Announcing lxc-snapshot

In April, lxc-clone gained the ability to create overlayfs snapshot clones of directory-backed containers. In May, I wrote a little lxc-snap program based on that, which introduced simple ‘snapshots’ to enable incremental development of container images. But a standalone program is not only more tedious to discover and install, it will also tend to break when the lxc API changes.

Now (well, recently) the ability to make snapshots has been moved into the lxc API itself, and the program lxc-snapshot, based on that, is shipped with lxc. (Leaving lxc-snap happily deprecated.)

As an example, let’s say you have a container c1, and you want to test a change in its /lib/init/fstab. You can snapshot it,

sudo lxc-snapshot -n c1

test your change, and, if you don’t like the result, you can recreate the original container using

sudo lxc-snapshot -n c1 -r snap0

The snapshot is stored as a full snapshot-cloned container, and restoring is done as a copy-clone using the original container’s backing store type. If your original container was /var/lib/lxc/c1, then the first snapshot will be /var/lib/lxcsnaps/c1/snap0, the next will be /var/lib/lxcsnaps/c1/snap1, etc.
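
As a rough illustration of that naming scheme, here is a small sketch. The helper name snapshot_path and its optional third argument are hypothetical, made up for this example; lxc-snapshot itself provides nothing of the kind.

```shell
# Hypothetical helper: print where snapshot number $2 of container $1
# would live, following the layout described above.
# $3 optionally overrides the snapshot base directory.
snapshot_path() {
    printf '%s/%s/snap%d\n' "${3:-/var/lib/lxcsnaps}" "$1" "$2"
}

snapshot_path c1 0
snapshot_path c1 1
```

This only builds the path strings; it does not check that the snapshots exist.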

There are some complications. Restoring a container to its original name, as done in the above example, will work if you have a btrfs-backed container. But if your original container was directory-backed, then the snapshot will be overlayfs-based, and will depend on the original container’s rootfs existing. It therefore pins the original container, and you’ll need to restore the snapshot to a new name, for example

sudo lxc-snapshot -n c1 -r snap0 c2

If you want to see a list of snapshots for container c1, do

sudo lxc-snapshot -n c1 -L

If you want to store a comment with the snapshot, you can

echo "This is before my /lib/init/fstab change" >> comment1
sudo lxc-snapshot -n c1 -c comment1

And then when you do

sudo lxc-snapshot -n c1 -L -C

you’ll see the snapshot comments after each snapshot.

There is certainly room for lots of feature development in lxc-snapshot. It could add removal support, improve snapshot comment support, sort snapshots in the listing, and for that matter could work around the overlayfs shortcomings to allow restoring a container to its original name. So if someone is looking for something to do, here’s one of many things waiting for an owner :) Meanwhile it seems to me plenty useful as is.

Have fun!

libvirt defaults (and openvswitch bridge performance)

The libvirt-bin package in Ubuntu installs a default NATed virtual network,
virbr0. This isn’t the best choice for everyone, but it “just works”
everywhere. It also provides some simple protection – the VMs aren’t
exposed on the network for all attackers to see.

Two alternatives are sometimes suggested. One is to simply default to a
non-NATed bridge. The biggest reason we can’t do this is that it would break
users with wireless cards. Another issue is that instead of simply tacking
something new onto the network, we have to usurp the default network interface
into our new bridge. It’s impossible to guess all the ways users might have
already customized their network.

The other alternative is to use an openvswitch bridge. This actually has the
same problems as the linux bridge – you still can’t add a VM nic to an
openvswitch-bridged wireless NIC, and we still would be modifying the default
network.

However the suggestion did make me wonder – how would ovs bridges compare to
linux ones in terms of performance? I’d have expected them to be slower (as a
tradeoff for much greater flexibility), but I was surprised when I was told
that ovs bridges are expected to perform better. So I set up a quick test, and
sure enough!

I set up two laptops running saucy, connected over a physical link. On the one
I installed a saucy VM. Then I ran iperf over the physical link from the other
laptop to the VM. When the VM was attached using a linux bridge, I got:

830 M/s
757 M/s
755 M/s
827 M/s
821 M/s

When I instead used an openvswitch bridge, I got:

925 M/s
925 M/s
925 M/s
916 M/s
924 M/s
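
To compare the two sets of runs at a glance, the figures above can be averaged. This is only a sketch: mean is a name made up for this example, and five runs per bridge is too few to treat the result as more than a rough indication.

```shell
# Hypothetical helper: print the arithmetic mean of its arguments,
# rounded to one decimal place.
mean() {
    printf '%s\n' "$@" |
        awk '{ sum += $1 } END { if (NR > 0) printf "%.1f\n", sum / NR }'
}

echo "linux bridge: $(mean 830 757 755 827 821)"
echo "openvswitch:  $(mean 925 925 925 916 924)"
```

The averages work out to 798.0 for the linux bridge and 923.0 for openvswitch, so the openvswitch runs were roughly 16% faster in this test.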

So, if we’re going to go with a new default, using openvswitch seems like a
good way to go. I’m still loath to make changes to the default, but a
script (sitting next to libvirt-migrate-qemu-disks) which users can optionally
run to do the gruntwork for them might be workable.

Line buffering (and talking computers and irc meetings)

When I was a kid, I wanted computers to talk to me (and vice versa). Why, decades later, am I still squinting at the screen during hour-long irc meetings? Basta! So the last two weeks I’ve been listening to our hour-long team irc meeting.

There is an excellent screen reader integrated with unity (well, it needs some tweaking to speed up the voice, which you can do in /etc/speech-dispatcher/speechd.conf), but it’s designed for a different purpose: every time you foreground a window it will switch to reading from that window. I want something simpler – I want a stream of text to be converted to speech as it comes in, while I play on other windows and desktops without the speech being affected. So the way I’ve decided to do it is as follows. First I connect using ‘sic’ (the ‘simple irc client’ from suckless.org) to the irc server. Sic uses simple text files for input and output, with all output (for all channels) going to the same file. So I start it as:

   rm -f sic.in; touch sic.in; tail -f sic.in | sic -h irc.freenode.net -n lurker | tee -a sic.out

and in one window I have a loop into which I can type:

    while read l; do [ -z "$l" ] && break; echo "$l" >> sic.in; done

Then I use the following to make it speak:

    tail -n 2 -f ~/sic/sic.out | egrep --line-buffered -v 'ChanServ|TOPIC|JOIN|PART|QUIT|ubottu' | awk -F\  '{ $1=""; $2=""; $3=""; print; fflush(); }' | espeak --stdout -s 220 -ven-us | aplay 2>/dev/null

All this does is filter out some of the stuff I don’t want to hear. (Yes the grep is “superfluous”, deal with it :) But the part which I found interesting enough to blog about was the extra steps in egrep and awk. By default, when their output goes to a pipe rather than a terminal, they buffer it in larger blocks instead of line by line, because flushing after every line could harm performance in normal use. But for what I’m doing, I need every line to be flushed as soon as it is written, so that it triggers the next command in the pipeline.

In egrep, this is done simply using --line-buffered. In awk, I do it by adding a fflush() after the print command. I actually had to spend some time looking into this, so I thought I’d share my results.
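
The filtering stage can be pulled out into a function, which makes it easy to try on canned input before pointing it at a live channel. A minimal sketch: irc_filter is a hypothetical name, and grep -E is used here because recent GNU grep releases warn that egrep is obsolescent.

```shell
# Hypothetical helper: drop IRC housekeeping lines, flushing after
# every line so the next stage of a pipeline sees it immediately.
irc_filter() {
    grep -E --line-buffered -v 'ChanServ|TOPIC|JOIN|PART|QUIT|ubottu'
}

printf '%s\n' 'lurker JOIN #meeting' 'alice: hello everyone' | irc_filter
```

Only the second line survives the filter.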

As a simple example, try the following:

	tail -f /etc/hosts | while read line; do echo $line; done

Does what you’d expect. Now insert awk into the pipeline to print only the hostname:

	tail -f /etc/hosts | awk '{ print $2 }' | while read line; do echo $line; done

There’s no output. That’s because awk is buffering and hasn’t yet flushed its stdout. So the while loop has no input. You can fix this by forcing it to fflush(), as in:

	tail -f /etc/hosts | awk '{ print $2; fflush(); }' | while read line; do echo $line; done

Now there is output.
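
The field-dropping awk step from the speech pipeline can be written the same way. This is a sketch under assumptions: drop_fields is a hypothetical helper, and it additionally trims the separators left behind, which the original one-liner does not do.

```shell
# Hypothetical helper: blank out the first $1 whitespace-separated
# fields of every input line and flush after each line.
drop_fields() {
    awk -v n="$1" '{
        for (i = 1; i <= n; i++) $i = ""
        sub(/^ +/, "")   # remove the separators left by the blanked fields
        print
        fflush()         # push the line down the pipe right away
    }'
}

printf '%s\n' 'a b c hello world' | drop_fields 3
```

Because of the fflush(), this can sit in the middle of a tail -f pipeline without holding lines back.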

So there you go. If you want to use something like irssi (which I also have a script for, to listen to hilights window output), you just need to do some tweaking to the grep and awk commands which drop the cruft.

Have fun!

Creating and using containers – without privilege

Today I posted a (working but mainly POC) patchset against lxc which allows me to create and start ubuntu-cloud containers – completely as an unprivileged user. For more details see the introductory email to the patchset at http://sourceforge.net/mailarchive/forum.php?thread_name=1374246151-7069-9-git-send-email-serge.hallyn%40ubuntu.com&forum_name=lxc-devel

Glossing over prerequisites (which you can see in the email), the actual commands I used were:

lxc-create -t /home/serge/lxc-ubuntu-cloud -P /home/serge/lxcbase -n x3 -f default.conf -- -T precise.tar.gz
lxc-start -P /home/serge/lxcbase -n x3

There’s more work to be done:

  • unprivileged containers cannot (yet) be networked
  • something needs to set up per-user cgroups at boot or login
  • something needs to create a per-user lockdir under /run
  • user namespaces need to be enabled in the default kernels
  • template handling of caching and locking needs to be made saner with respect to configurable lxcpaths

These are pretty minor, though, compared to what we’ve already achieved:

  • User namespace support (minus XFS support) is in the kernel – thanks to a heroic effort by Eric Biederman (and sponsored by Canonical).
  • The work needed to enable subuids – also written by Eric – was accepted into our shadow package in saucy.
  • The basic patchset to enable use of user namespaces by privileged users has been in lxc for some time now.

Background on user namespaces:

When you create a new user namespace, it initially is unmapped. Your task has uid and gid -1. You can then map userids from the parent namespace onto userids in the new namespace by ranges. For instance if you are userid 1000, then you can map uid 1000 in the parent to uid 0 in the namespace. From the kernel’s point of view, you can only map uids which you have privilege over – either by being that uid, or having CAP_SYS_ADMIN in the parent.

This is where subuids come in. /etc/subuid and /etc/subgid list ranges of uids and gids which users are allowed to map. newuidmap and newgidmap are setuid-root programs which respect those files, allowing unprivileged users to map their allotted subuids and subgids.

Lxc uses these programs (indirectly through the ‘usernsexec’ program) to allow unprivileged users to map their allotted subuids to containers.
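
To see which ranges a user has been allotted, the subuid file can be read directly. The sketch below assumes the name:first_id:count line format used by the shadow package's /etc/subuid; subid_ranges is a hypothetical helper, and the example runs against a throwaway file rather than the real one.

```shell
# Hypothetical helper: print the inclusive id ranges allotted to user $1.
# $2 is the file to read (defaults to /etc/subuid).
subid_ranges() {
    awk -F: -v user="$1" \
        '$1 == user { printf "%d-%d\n", $2, $2 + $3 - 1 }' \
        "${2:-/etc/subuid}"
}

# Example with a temporary file standing in for /etc/subuid.
tmp=$(mktemp)
echo 'serge:100000:100000' > "$tmp"
subid_ranges serge "$tmp"
rm -f "$tmp"
```

For that sample entry it prints 100000-199999.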

Of note is that regular DAC and MAC remain unchanged. Therefore although I as user serge/uid 1000 may have 100000-199999 as my subuids, I do not own files owned by those subuids! To work around this, map the uids together into a namespace. For instance, if you are uid 1000 and want to create a file owned by uid 100000, you can

touch foo
usernsexec -m b:0:100000:1 -m b:1000:1000 -- /bin/chown 0 foo

This maps 100000 on the host to root in the new namespace, and 1000 on the host to 1000 in the namespace. So host uid 100000 actually has privilege over the namespace, including over host uid 1000. It is therefore allowed to chown a file owned by uid 1000 to uid 0 (which is host uid 100000). You end up with foo owned by uid 100000. You can do the same sort of games to clean up containers (and lxc-destroy will do so).
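
The arithmetic behind such a mapping can be sketched in a few lines. ns_to_host_uid is a hypothetical helper, and the ns_start:host_start:count triples are an assumed notation for this example (they are not usernsexec syntax); the second mapping is taken to have a count of 1.

```shell
# Hypothetical helper: translate namespace uid $1 to a host uid using
# the mappings given as ns_start:host_start:count in the remaining
# arguments. Returns non-zero if the uid is unmapped.
ns_to_host_uid() {
    uid=$1
    shift
    for m in "$@"; do
        ns=${m%%:*}
        rest=${m#*:}
        host=${rest%%:*}
        count=${rest#*:}
        if [ "$uid" -ge "$ns" ] && [ "$uid" -lt $((ns + count)) ]; then
            echo $((host + uid - ns))
            return 0
        fi
    done
    return 1
}

ns_to_host_uid 0 0:100000:1 1000:1000:1      # root in the namespace
ns_to_host_uid 1000 0:100000:1 1000:1000:1   # uid 1000 in the namespace
```

The first call prints 100000 and the second prints 1000, which is why chown 0 inside the namespace leaves foo owned by host uid 100000.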

RSS

When I saw the news that Google reader was going away, my first thought, like a lot of other people, was “woohoo, it’s going to be fun writing a new way to follow RSS.” This past weekend I looked at the calendar and, like most people, shouted “oh no, July 1 is just about here and I’ve done nothing.” I originally figured I would write a script to push links to pocket, which formats very nicely on the nook and in browsers. But then I ran into rss2email. I may hack it to send just one pocket-able url (though writing from scratch may be more fun) but for now, unchanged and sending full posts in email which I can read anywhere, it’s keeping me quite happy.

2013 Linux Security Summit CFP closing soon

Just a short reminder that if you were interested in submitting a talk for the linux security summit, the call for participation (at http://kernsec.org/wiki/index.php/Linux_Security_Summit_2013) will be closing tomorrow, Friday Jun 14.

The summit will be held September 19-20 in New Orleans, co-located with LinuxCon. Hope to see you there!

Introducing lxc-snap

lxc-snap: lxc container snapshot management tool

BACKGROUND

Lxc supports containers backed by overlayfs snapshots. The way this is
typically done is to create a container backed by a regular directory,
then create a new container which mounts the first container’s rootfs
as a read-only lower mount, with a new private delta directory as
its read-write upper mount. For instance, you could

sudo lxc-create -t ubuntu -n r0 # create a normal directory
sudo lxc-clone -B overlayfs -s r0 o1 # create overlayfs clone

The second container, o1, when started up will mount /var/lib/lxc/o1/delta0
as a writeable overlay on top of /var/lib/lxc/r0/rootfs, and use that as its
root filesystem.
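
For readers who want to see what such a mount looks like by hand, here is a sketch that only prints the command instead of running it. It uses the mainline kernel syntax (filesystem type overlay, which also needs a workdir); the Ubuntu kernels of the time used the type name overlayfs without a workdir. The helper name, the work directory and the mount point are hypothetical, and lxc performs the equivalent mount itself.

```shell
# Hypothetical helper: print (not run) an overlay mount command.
# $1 = read-only lower dir, $2 = writeable upper (delta) dir,
# $3 = work dir (required by mainline overlay), $4 = mount point.
overlay_mount_cmd() {
    printf 'mount -t overlay overlay -o lowerdir=%s,upperdir=%s,workdir=%s %s\n' \
        "$1" "$2" "$3" "$4"
}

overlay_mount_cmd /var/lib/lxc/r0/rootfs /var/lib/lxc/o1/delta0 \
    /var/lib/lxc/o1/work /mnt/o1
```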

From here you can clone o1 to a new container o2. This simply copies the
overlayfs delta from o1 to o2, and is done with

sudo lxc-clone -s o1 o2

LXC-SNAP

One of the obvious use cases of these snapshot clones is to support
incremental development of rootfs images. Make some changes, snapshot,
make some more changes, snapshot, revert…

lxc-snap is a small program using the lxc API to more easily support
this use case. You begin with an overlayfs-backed container, make some
changes, snapshot, make some changes, snapshot… This is a simpler
model than manually using clone because you continue developing the same
container, o1, while the snapshots are kept away until you need them.

EXAMPLE

Create your first container

sudo lxc-create -t ubuntu -n base
sudo lxc-clone -s -B overlayfs base mysql

Now make initial customizations, and snapshot:

sudo lxc-snap mysql

This will create a snapshot container /var/lib/lxcsnaps/mysql_0. You can actually
start it up if you like using ‘sudo lxc-start -P /var/lib/lxcsnaps -n mysql_0’.
(However, that is not recommended, as it will cause changes in the rootfs)

Next, make some more changes. Write a comment about the changes you made in this
version,

echo "Initial definition of table doomahicky" > /tmp/comment

sudo lxc-snap -c /tmp/comment mysql

Do this a few times. Now you realize you lost something you needed. You can
see the list of containers which have snapshots using

lxc-snap -l

and the list of versions of container mysql using

lxc-snap -l mysql

Note that it shows you the time when the snapshot was created, and any comments
you logged with the snapshot. You see that what you wanted was version 2, so
recover that snapshot. You can destroy container mysql and restore version 2
to it, or (I would recommend) use a different name to restore the snapshot to.

Use a different name with:

sudo lxc-snap -r mysql_2 mysql_tmp

or destroy mysql and restore the snapshot to it using

sudo lxc-destroy -n mysql
sudo lxc-snap -r mysql_2 mysql

When you’d like to export a container, you can clone it back to a directory
backed container and tar it up:

sudo lxc-clone -B dir mysql mysql_ship
sudo tar zcf /srv/mysql_ship.tar.gz /var/lib/lxc/mysql_ship

BUILD AND INSTALL

To use lxc-snap, you currently need to be using lxc from the ubuntu-lxc
daily ppa. On an ubuntu system (at least 12.04) you can

sudo add-apt-repository ppa:ubuntu-lxc/daily
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install lxc

lxc-snap will either become a part of the lxc package, or will become a
separate package. Currently it is available at
git://github.com/hallyn/lxc-snap. Fetch it using:

git clone git://github.com/hallyn/lxc-snap

Then build lxc-snap by typing ‘make’.

cd lxc-snap
make

Install into /usr/bin by typing

sudo DESTDIR=/usr make install

or install into /home/$USER/bin by typing

mkdir /home/$USER/bin
DESTDIR=/home/$USER make install

Note that lxc-snap is in very early development. Its usage may
change over time, and as it currently ships a copy of liblxc .h
files it needs, it may occasionally break and need to be updated
from git and rebuilt. Using a package (as soon as it becomes
available) is recommended.
