Connecting containers on several hosts with Open vSwitch

LXC is great for starting up several containers on your laptop or on an EC2 host. But what if you want to fire up containers on multiple EC2 instances and have them talk to each other?

An easy way to support that is to use Open vSwitch. This script is a user-data script which you can use to fire up instances that are ready to connect containers. For instance, I would run

ami=`ubuntu-cloudimg-query precise`
ec2-run-instances -n 2 -f user-data-lxc-ovs.sh -k mykeypair $ami

This will fire off two Ubuntu precise instances which will run the script. Once the scripts are done (sudo status cloud-final will show stopped), you can look at the openvswitch bridge with

sudo ovs-vsctl show

You want to connect the bridges on each instance by adding a GRE tunnel. On each host, do

sudo ovs-vsctl add-port br0 gre0 -- set interface gre0 type=gre options:remote_ip=x.x.x.x

where x.x.x.x is the public ip address of the other instance. Now the tunnel is set up. You can simply fire up container p1 in each instance

sudo lxc-start -n p1
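For the tunnel step above, it can help to generate the command for each host so the two remote_ip values do not get mixed up. The following is only a sketch: the helper name gre_port_cmd and the 203.0.113.x documentation addresses are assumptions for illustration, and --may-exist is added so that re-running the printed command does not fail if the port is already there.

```shell
#!/bin/sh
# gre_port_cmd BRIDGE PORT REMOTE_IP
# Prints (does not run) the ovs-vsctl command that adds a GRE port.
gre_port_cmd() {
    bridge=$1
    port=$2
    remote_ip=$3
    printf 'ovs-vsctl --may-exist add-port %s %s -- set interface %s type=gre options:remote_ip=%s\n' \
        "$bridge" "$port" "$port" "$remote_ip"
}

# Hypothetical public addresses of the two instances.
HOST_A=203.0.113.10
HOST_B=203.0.113.20

echo "run on $HOST_A: sudo $(gre_port_cmd br0 gre0 "$HOST_B")"
echo "run on $HOST_B: sudo $(gre_port_cmd br0 gre0 "$HOST_A")"
```

Each host points its gre0 port at the other host's address, which is the same thing the single command above does by hand.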

Check out the /etc/lxc/lxc-ovs.conf file on each instance, which was the lxc configuration used to create the containers. It has two network sections (each started by lxc.network.type=veth). The first will be veth0, and will be connected to the lxcbr0 to connect the container to the internet. The second will be veth1, which is bridged with the openvswitch GRE tunnel. So the containers can ssh to each other’s veth1 addresses.
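As a rough picture of what such a two-section configuration can look like, here is a sketch that prints an example. It is not the actual contents of /etc/lxc/lxc-ovs.conf: the function name sample_lxc_conf is made up, the keys are the legacy lxc.network.* ones, and the assumption is that the second section is attached to the Open vSwitch bridge by the /etc/lxc/ovsup hook mentioned in the comments below.

```shell
#!/bin/sh
# Print an illustrative two-NIC LXC network configuration.
sample_lxc_conf() {
    cat <<'EOF'
# first veth pair: container eth0, host end on lxcbr0 (internet access)
lxc.network.type = veth
lxc.network.link = lxcbr0
lxc.network.flags = up

# second veth pair: container eth1, host end added to the OVS bridge by a hook
lxc.network.type = veth
lxc.network.flags = up
lxc.network.script.up = /etc/lxc/ovsup
EOF
}

sample_lxc_conf
```

Every lxc.network.type line starts a new network section, so the order of the sections decides which interface becomes eth0 and which becomes eth1 inside the container.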


22 Responses to Connecting containers on several hosts with Open vSwitch

  1. Pingback: Connecting containers on several hosts with Open vSwitch « thoughts…

  2. Dimitris says:

    Hi,
    There is one point I don’t understand. Are you running Open vSwitch inside the containers too? Where is the br0 bridge created: on the host or in the container?

    Cheers,
    Dimitris

    • s3hh says:

      br0 is on the host. The container just has two NICs, eth0 and eth1, which are endpoints of two separate veth tunnels. The host-side endpoint of one tunnel is connected to the ovs bridge, and that of the other to the lxcbr0 bridge.

  3. Thanks for the walk-through. I’m successfully able to communicate between LXC containers running on two VMs.

    Any advice for further reading – I’m trying to figure out how one would add a 3rd VM.

    • s3hh says:

      Sorry, the references seem pretty scarce out there. I’m pretty sure that to add a 3rd VM you need to add all the links. In other words, the number of links would scale pretty badly: 2 links for 2 VMs, 6 for 3 VMs, 12 for 4…

      Please do let me know if you find some good documentation.
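The numbers above follow from every host adding one GRE port per peer, i.e. n * (n - 1) ports for n hosts. A minimal sketch of that arithmetic (the function name full_mesh_ports is just for illustration):

```shell
#!/bin/sh
# Number of GRE ports to configure across all hosts in a full mesh:
# each of the n hosts adds one port for each of its n - 1 peers.
full_mesh_ports() {
    n=$1
    echo $(( n * (n - 1) ))
}

for n in 2 3 4; do
    echo "$n hosts: $(full_mesh_ports "$n") GRE ports"
done
```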

      • While you can add all the links (and it *might* help the controller determine more efficient routes) – I don’t think you have to add links between every pair. (the math from before)

        I’m pretty certain that as long as a path between hosts exists, you can traverse it even if there are multiple hops. Example time:

        launch 3 VMs. In each VM do everything except creating the GRE tunnels. Let’s name our VMs A,B,C.

        Then:
        * create a single GRE tunnel on A with the remote_ip of B
        * create a single GRE tunnel on C with the remote_ip of B
        * create two GRE tunnels on B with remote_ip of A and C.

        Then you should be able to lxc-console into the containers and ping between all pairs. Any packets going from container A to container C will have to travel through B, which on my setup means roughly double ping response time.
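The hub-and-spoke layout described here can be sketched as a small generator that prints which command to run on which host. This is an illustration only: the function name star_cmds, the port names and the 192.0.2.x documentation addresses are assumptions, and the commands are printed rather than executed.

```shell
#!/bin/sh
# star_cmds HUB_IP SPOKE_IP...
# For every spoke, print one command for the spoke (tunnel to the hub)
# and one command for the hub (tunnel back to that spoke).
star_cmds() {
    hub=$1
    shift
    i=0
    for spoke in "$@"; do
        echo "on $spoke: ovs-vsctl add-port br0 gre0 -- set interface gre0 type=gre options:remote_ip=$hub"
        echo "on $hub: ovs-vsctl add-port br0 gre$i -- set interface gre$i type=gre options:remote_ip=$spoke"
        i=$((i + 1))
    done
}

# B (192.0.2.2) is the hub; A (192.0.2.1) and C (192.0.2.3) are spokes.
star_cmds 192.0.2.2 192.0.2.1 192.0.2.3
```

With three hosts this prints four commands instead of the six needed for a full mesh, at the cost of the extra hop through B for traffic between A and C.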

  4. s3hh says:

    Thanks, Jesse, good to know!

  5. Anirup Dutta says:

    I have a question. Suppose I start an lxc container with a certain configuration, then shut it down and want to start it with a different configuration. For example, in this post we are assigning an IP address based on the AMI index. Suppose I want to change it later. How do I do it? I tried starting the container using lxc-start with the -f option but it didn’t work.

    • s3hh says:

      What do you mean by it didn’t work? I’d suggest sending an email with precisely what you did and what happened to the lxc-users mailing list, as what you want certainly should be possible.

  6. Hey, although I know that the script is just for testing purposes, this would be a great addition.
    Add the down line here (should be good beyond lxc 0.8.04 or something)
    """
    lxc.network.script.up=/etc/lxc/ovsup
    lxc.network.script.down=/etc/lxc/ovsdown
    """
    and the script
    """
    cat > /etc/lxc/ovsdown << EOF
    #!/bin/bash

    ifconfig \$5 0.0.0.0 down
    ovs-vsctl del-port br0 \$5
    EOF
    """

    • s3hh says:

      Hey, thanks for your comment. Now, by default, since the one veth is in the container’s namespace, what should happen is the veth pair should disappear when the container goes down. What you add could be worth adding just as instruction in case people want to add something else, and it won’t harm anything, but am I overlooking some reason why the particular lxc.network.script.down you have there would be needed?

      • Have a look at your ovs-vsctl output: the ports you add all pile up on that bridge. It would generally be good practice to clean those ports up from the bridge, especially if you plan to use it for more than one container instance. The bridge is persistent across reboots (both lxc and container). Every time you boot a container, a virtual port with the same name as the “veth” interface gets added, and these don’t go away on their own (ovs-vsctl uses a database to store all of this configuration). Luckily the 6 random characters give us a fairly big name space, but a clash could still happen the next time you boot if bad luck gives you a formerly used port name (the chance is very small, especially if you take down the bridge and recreate it every time). This becomes even more relevant in dynamic environments where you want to use GRE ports with new IP addresses without building up a huge number of grexxx ports.
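If ports really do pile up like this, one way to tidy the bridge after the fact might look like the sketch below. It assumes that a leftover port is one whose name starts with veth and no longer has a device under /sys/class/net; the function name stale_veth_ports is made up, and GRE ports are deliberately skipped because they may not show up as network devices at all.

```shell
#!/bin/sh
# stale_veth_ports: read port names on stdin (one per line) and print the
# veth* names that have no corresponding network device on this host.
stale_veth_ports() {
    while read -r port; do
        case $port in
            veth*) [ -e "/sys/class/net/$port" ] || echo "$port" ;;
        esac
    done
}

# Possible use on a host (needs root and a running Open vSwitch):
#   ovs-vsctl list-ports br0 | stale_veth_ports |
#       while read -r p; do ovs-vsctl --if-exists del-port br0 "$p"; done
```

The lxc.network.script.down hook above avoids the build-up in the first place; a sweep like this would only be for ports left behind by earlier runs.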

      • s3hh says:

        Thanks – if it’s true that the virtual port doesn’t go away, then yeah, it definitely should be cleaned up. I’ll have to test (but can’t right now) – thanks!

  7. Ahmed says:

    Hi,
    I am working in a cluster and I used OVS (1.10.0) to interconnect 10 physical nodes. On each node I did the following steps:
    I created an ovs bridge.
    I configured 9 GRE tunnels, one to each of the other nodes.
    I created my 10 containers and attached them to the ovs bridge.
    The problem is that I cannot ping the VMs, and the hosts themselves become slow and barely respond to ping. However, when I test this with just 2 physical nodes it works well.
    I think the number of GRE interfaces is what causes this trouble.
    Please, is there a solution?

  8. bmullan says:

    If I understand you right you said you are connecting 10 physical nodes… so you mean 10 physically separate computers. Then on each computer you create 1 container. Then on each computer you also create a GRE tunnel I assume to the OVS bridge. Do each of your containers have “unique” mac and IP addresses?

  9. s3hh says:

    @bmullan,

    no, in the example above (ec2-run-instances -n 2) I’m starting up 2 ‘physical nodes’ (really they are openstack or ec2 instances, so either kvm or xen). Then I have a script which I run to spin up a new container, which the script does round robin. However many physical nodes I start up, exactly one is the ‘master node’ – which means it runs a dnsmasq for the openvswitch network. So each container then gets a unique ip address on the openvswitch network from that single dnsmasq.

  10. s3hh says:

    Note I now prefer to use my juju charm to do this as I should be able to add new nodes easily.

    I intend to update it for saucy and to automatically use nested containers this Wednesday (and then use it to do a bunch of bug wrangling).

  11. rob says:

    Why use a GRE tunnel? I’m trying to make 2 containers communicate (each one is on a different ec2 instance) using netcat and iptables packet redirection. However, I can’t get this working. I can access my “listening” container from anywhere, but there is no way to access it from my other container…

    • s3hh says:

      Using GRE tunnel because it’s easy, flexible and works :) Now, Stéphane
      has pointed out that there is now the gretap which I could use instead of
      openvswitch GRE tunnels. I’ve shown before that openvswitch bridges are
      faster than linux bridges, so I’ll have to do a comparison of ovs GRE vs
      in-kernel implementation.

  12. robertsandersonRob says:

    I’m trying to get LXC working with OVS 1.10.2 which dropped brcompat, any ideas how to work around this?

    ----------------------------------
    lxc-start 1385579423.632 DEBUG lxc_conf - mac address of host interface 'veth7N53W1' changed to private fe:58:02:8d:ca:e2
    lxc-start 1385579423.632 ERROR lxc_conf - failed to attach 'veth7N53W1' to the bridge 'ovsbr0' : Operation not supported
    lxc-start 1385579423.653 ERROR lxc_conf - failed to create netdev
    lxc-start 1385579423.653 ERROR lxc_start - failed to create the network
    ----------------------------------

    I’ve seen a few patches which gives OVS native support but I’d rather not have to patch and compile.

    Cheers,
    Rob

    • s3hh says:

      I haven’t worked on this yet. It’s probably best to get an lxc patch (patches welcome :) to DTRT, but you can probably work around it using lxc.network.script.up.
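A sketch of what such an lxc.network.script.up workaround could look like follows. It rests on two assumptions that are not confirmed here: that the veth section leaves lxc.network.link unset, so lxc-start does not itself try the bridge attach that fails in the log above, and that LXC passes the host-side veth name as the fifth hook argument, as the ovsdown script earlier in the comments relies on. BRIDGE, OVS_VSCTL and attach_veth are illustrative names, not LXC or Open vSwitch settings.

```shell
#!/bin/sh
# Hypothetical hook body for lxc.network.script.up.
# LXC is assumed to call it as: <container> net up veth <host-side-veth>
BRIDGE=${BRIDGE:-ovsbr0}
OVS_VSCTL=${OVS_VSCTL:-ovs-vsctl}

attach_veth() {
    # $1: host-side veth name; --may-exist keeps a repeated call from failing.
    "$OVS_VSCTL" --may-exist add-port "$BRIDGE" "$1"
}

# When installed as the hook script, the final line would be:
#   attach_veth "$5"
```

Because the port is added with ovs-vsctl directly, this path does not go through the Linux bridge ioctl that brcompat used to emulate.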

      • robertsandersonRob says:

        Thanks for the reply, lxc.network.script.up does work with older versions of OVS which have brcompat support but this has now been dropped. Will have to look at getting a patch working next week.

        Cheers,
        Rob
