Full Ubuntu container confined in a user namespace

I’ve mentioned user namespaces here before, and shown how to play a bit with them. When a task is cloned into a new user namespace, the uids in the namespace can be mapped (one-to-one, in blocks) to uids on the host; for instance, uid 0 in the container could be uid 100000 on the host. The uids are translated at the kernel-userspace boundary (e.g. in stat), and capabilities for a namespaced task are only valid against objects owned by that namespace. The result is that root in a container is unprivileged on the host.
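To make the block mapping concrete, here is a minimal sketch of the arithmetic in shell. The function name map_uid is made up for this example, and the block size of 10000 ids is an assumption; the three-field layout (first id inside the namespace, first id on the host, number of ids) is the one the kernel shows in /proc/PID/uid_map.

```shell
#!/bin/sh
# map_uid NS_START HOST_START COUNT UID
# Translate a uid inside the namespace to the host uid for one mapping block.
# Prints the host uid, or returns 1 if UID falls outside the block.
# (map_uid and the block size used below are illustrative assumptions.)
map_uid() {
    ns_start=$1 host_start=$2 count=$3 uid=$4
    if [ "$uid" -lt "$ns_start" ] || [ "$uid" -ge $((ns_start + count)) ]; then
        return 1   # this uid is not covered by the block
    fi
    echo $((host_start + uid - ns_start))
}

map_uid 0 100000 10000 0      # container root
map_uid 0 100000 10000 1000   # first regular user in the container
```

With a block starting at host uid 100000, container root comes out as 100000 and container uid 1000 as 101000.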

Eric has been making great progress in moving the kernel functionality upstream. With the newest 3.7-based Ubuntu kernel, plus a few of his not-yet-merged patches, a milestone has been reached: it’s now possible to run a full Ubuntu container in a user namespace!

First start up a fresh, up-to-date quantal VM or instance. Add my user namespace ppa, install the kernel and nsexec packages from there, create a container, convert it to be namespaced, and reboot into the new kernel:

sudo add-apt-repository ppa:serge-hallyn/userns-natty
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install linux-image-3.7.0-0-generic nsexec lxc
sudo lxc-create -t ubuntu -n q1
sudo container-userns-convert q1 100000
sudo reboot

The ‘container-userns-convert’ script just shifts the user and group ids of file owners in the container rootfs, and adds two lines to the container configuration file to tell lxc to clone the new user namespace and set up the uid/gid mappings.
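As a rough illustration of the id shift, the sketch below walks a directory tree and prints the chown commands that would raise each file's owner and group by a fixed offset. It is not the actual container-userns-convert script: the function name print_shifted_chowns is invented for this example, it assumes GNU find, and it only prints commands instead of running them. The config-file step is left out entirely.

```shell
#!/bin/sh
# print_shifted_chowns ROOTFS OFFSET
# Dry run: print one chown command per file, with uid and gid raised by OFFSET.
# Sketch only: paths containing newlines or quote characters are not handled.
print_shifted_chowns() {
    rootfs=$1 offset=$2
    # GNU find: %U and %G are the numeric owner and group, %p is the path.
    find "$rootfs" -printf '%U %G %p\n' |
    while read -r uid gid path; do
        # -h changes a symlink itself rather than the file it points to.
        echo "chown -h $((uid + offset)):$((gid + offset)) '$path'"
    done
}

# Example (the rootfs path is an assumed default location):
#   print_shifted_chowns /var/lib/lxc/q1/rootfs 100000
```

Printing the commands first makes it easy to review the result before touching a rootfs; a real conversion would also need to restore setuid and setgid bits, which chown typically clears.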

Now you can start the container and attach to its console:

sudo lxc-start -n q1 -d
sudo lxc-console -n q1

Look around the container and run sudo bash; notice that it looks like a normal system, with ubuntu as uid 1000 and root as uid 0. But look from the host, and you see that root tasks in the container are actually running as uid 100000, and ubuntu ones as uid 101000.
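One way to take that look from the host is to filter the process list by uid. This is a sketch under assumptions rather than part of the setup above: the helper name filter_uid_range is invented, and the block size of 10000 is a guess at how many ids the container was given.

```shell
#!/bin/sh
# filter_uid_range BASE COUNT
# Read "UID PID COMMAND" lines on stdin and keep those whose uid lies in
# [BASE, BASE + COUNT), i.e. tasks running inside the mapped block.
# (filter_uid_range and the block size are illustrative assumptions.)
filter_uid_range() {
    awk -v base="$1" -v count="$2" '$1 >= base && $1 < base + count'
}

# On the host, with the container running:
#   ps -eo uid=,pid=,comm= | filter_uid_range 100000 10000
```

The container's init and its root shells should show up with the base uid, and processes owned by the container's ubuntu user with the base plus 1000.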

There are a few oddities (you can sudo on ttys 1-4, but sometimes it fails on /dev/console, and shutdown in the container does not kill init); the lxc package needs a few more changes (the cgroup setup needs to be moved to the container parent); and plenty of things are not yet allowed by the kernel (for example, mounting an ext4 filesystem).

But this is a full Ubuntu image, confined by a private user namespace!

After working out some kinks, we’ll next want to look into container startup by unprivileged users.


6 Responses to Full Ubuntu container confined in a user namespace

  1. This is awesome^infinity! I hope issues with saucy/XFS-or-whatever-blocking-now are resolved so user namespaces are usable in 13.10. It’s been difficult to keep up-to-date with the kernel team, but hopefully no patched kernel will be needed in saucy out-of-the-box. Great work, very exciting!

    • s3hh says:

      Dwight Engen has gotten the xfs patches accepted into the xfs tree. Now we just need the xfs tree to be merged into Linus’ tree. It won’t be enabled in saucy, as that kernel has been chosen, but at the next cycle.

  2. Mahmood says:

    I just tried this with a recompiled kernel 3.11 and it is awesome! However, I have trouble with a non-confined apparmor profile:


    ubuntu@ip-10-148-179-246:~$ sudo lxc-start -n q1
    lxc-start: No such file or directory - failed to change apparmor profile to lxc-container-default
    lxc-start: invalid sequence number 1. expected 4
    lxc-start: failed to spawn 'q1'
    lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/cpuset/lxc/q1-14'
    lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/cpu/lxc/q1-14'
    lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/cpuacct/lxc/q1-14'
    lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/memory/lxc/q1-14'
    lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/devices/lxc/q1-14'
    lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/freezer/lxc/q1-14'

    • s3hh says:

      Which lxc package are you working with? (trusty package? ppa:ubuntu-lxc/daily enabled?) Exactly how did you create the container?

      It’s probably best to open a bug in launchpad for this so we can collect all the data in one place.

      My guess is that somehow the proc fs was not remounted, and the ‘No such file or directory’ is from the attempt to open /proc//attr/current to enact the profile change.

  3. erkules says:

    Why is a reboot needed?
