Full Ubuntu container confined in a user namespace

I’ve mentioned user namespaces here before, and shown how to play a bit with them. When a task is cloned into a new user namespace, the uids in the namespace can be mapped (1-1, in blocks) to uids on the host – for instance uid 0 in the container could be uid 100000 on the host. The uids are translated at the kernel-userspace boundary (i.e. stat, etc), and capabilities for a namespaced task are only valid against objects owned by that namespace. The result is that root in a container is unprivileged on the host.
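
To make the mapping arithmetic concrete, here is a minimal sketch in shell. The helper name map_uid is hypothetical (it is not part of lxc or nsexec), and the block size of 10000 ids is an assumption chosen for illustration. The three mapping fields follow the layout of a line in /proc/PID/uid_map: first id inside the namespace, first id outside, and number of ids in the block.

```shell
#!/bin/sh
# map_uid: hypothetical helper that translates a uid inside the namespace
# to the uid seen on the host, using a single mapping block.
# Arguments: <uid> <first uid inside> <first uid outside> <count>
map_uid() {
    uid=$1 inside=$2 outside=$3 count=$4
    # A uid outside the block has no mapping at all.
    if [ "$uid" -lt "$inside" ] || [ "$uid" -ge $((inside + count)) ]; then
        echo "unmapped"
        return 1
    fi
    # Inside the block the translation is a fixed offset (1-1 mapping).
    echo $((outside + uid - inside))
}

map_uid 0 0 100000 10000      # root in the container
map_uid 1000 0 100000 10000   # uid 1000 in the container
```

With a block starting at host uid 100000, container root comes out as 100000 and container uid 1000 as 101000.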

Eric has been making great progress in moving the kernel functionality upstream. With the newest 3.7-based Ubuntu kernel, plus a few of his not-yet-merged patches, a milestone has been reached – it’s now possible to run a full Ubuntu container in a user namespace!

First start up a fresh, up-to-date quantal VM or instance. Install my user namespace PPA, install the kernel and nsexec packages from there, create a container, and convert it to be namespaced:

sudo add-apt-repository ppa:serge-hallyn/userns-natty
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install linux-image-3.7.0-0-generic nsexec lxc
sudo lxc-create -t ubuntu -n q1
sudo container-userns-convert q1 100000
sudo reboot

The ‘container-userns-convert’ script just shifts the user and group ids of file owners in the container rootfs, and adds two lines to the container configuration file to tell lxc to clone the new user namespace and set up the uid/gid mappings.
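
The id-shifting half of that step can be sketched as follows. This is not the actual container-userns-convert script: shift_owners is a hypothetical dry-run helper that only prints the chown commands a shift by a given offset would need, it assumes GNU stat, and it does not handle paths containing newlines. The configuration-file half is not shown.

```shell
#!/bin/sh
# shift_owners: hypothetical dry-run helper, NOT the real conversion script.
# Prints one chown command per file under DIR, with the owner and group ids
# moved up by OFFSET. Nothing on disk is changed.
# Arguments: <dir> <offset>
shift_owners() {
    dir=$1 offset=$2
    find "$dir" -print | while IFS= read -r path; do
        # GNU stat: %u and %g are the numeric owner and group ids.
        set -- $(stat -c '%u %g' "$path")
        # -h acts on a symlink itself rather than on its target.
        echo "chown -h $(( $1 + offset )):$(( $2 + offset )) $path"
    done
}
```

Running it as shift_owners /var/lib/lxc/q1/rootfs 100000 would list the changes for the q1 container, assuming its rootfs lives at that path.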

Now you can start the container:

sudo lxc-start -n q1 -d
sudo lxc-console -n q1

Look around the container and run sudo bash; notice that it looks like a normal system, with ubuntu as uid 1000 and root as uid 0. But look from the host, and you see that root tasks in the container are actually running as uid 100000, and ubuntu ones as uid 101000.
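
One way to look from the host is to filter the process list down to the mapped uid range. The sketch below is only an illustration: filter_mapped is a hypothetical helper, and the range size of 10000 is an assumption. It reads lines of "uid pid command" on stdin and prefixes each matching line with the uid as seen inside the container.

```shell
#!/bin/sh
# filter_mapped: hypothetical helper. Keeps only lines whose first field
# (the host uid) falls in [BASE, BASE+COUNT), and prepends the uid that
# the task has inside the container.
# Arguments: <base> <count>; input on stdin: "uid pid command" per line
filter_mapped() {
    awk -v base="$1" -v count="$2" \
        '$1 >= base && $1 < base + count { print $1 - base, $0 }'
}
```

On the host it could be fed from ps, for example: ps -e -o uid= -o pid= -o comm= | filter_mapped 100000 10000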

There are a few oddities (you can sudo on ttys 1-4, but sometimes it fails on /dev/console, and shutdown in the container does not kill init); the lxc package needs a few more changes (the cgroup setup needs to be moved to the container parent); and plenty of things are not yet allowed by the kernel (for example, mounting an ext4 filesystem).

But this is a full Ubuntu image, confined by a private user namespace!

After working out some kinks, we’ll next want to look into container startup by unprivileged users.

9 Responses to Full Ubuntu container confined in a user namespace

  1. This is awesome^infinity! I hope issues with saucy/XFS-or-whatever-blocking-now are resolved so user namespaces are usable in 13.10. It’s been difficult to keep up-to-date with the kernel team, but hopefully no patched kernel will be needed in saucy out-of-the-box. Great work, very exciting!

    • s3hh says:

      Dwight Engen has gotten the xfs patches accepted into the xfs tree. Now we just need the xfs tree to be merged into Linus’ tree. It won’t be enabled in saucy, as that kernel has been chosen, but at the next cycle.

  2. Mahmood says:

    I just tried this with a recompiled kernel 3.11 and it is awesome! However, I have trouble with a non-confined apparmor profile:


    ubuntu@ip-10-148-179-246:~$ sudo lxc-start -n q1
    lxc-start: No such file or directory - failed to change apparmor profile to lxc-container-default
    lxc-start: invalid sequence number 1. expected 4
    lxc-start: failed to spawn 'q1'
    lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/cpuset/lxc/q1-14'
    lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/cpu/lxc/q1-14'
    lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/cpuacct/lxc/q1-14'
    lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/memory/lxc/q1-14'
    lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/devices/lxc/q1-14'
    lxc-start: Device or resource busy - failed to remove cgroup '/sys/fs/cgroup/freezer/lxc/q1-14'

    • s3hh says:

      Which lxc package are you working with? (trusty package? ppa:ubuntu-lxc/daily enabled?) Exactly how did you create the container?

      It’s probably best to open a bug in launchpad for this so we can collect all the data in one place.

      My guess is that somehow the proc fs was not remounted, and the ‘No such file or directory’ is from the attempt to open /proc//attr/current to enact the profile change.

  3. erkules says:

    Why is a reboot needed?

    • s3hh says:

      Because you’ve installed a new kernel.

      • lolcat says:

        \o/ at reboot Q&A :)
        Anyway… so I get word that lxc now sounds like it’s tremendously close to production quality… Great work. How is the network SETUID in terms of security contra the usual (host) network stack? If equally strong/weak to attacks, then it [lxc] is ready for production servers?

      • s3hh says:

        Hi, thanks. I’m not quite sure whether you mean something a bit different by ‘contra the network stack’, but I would say,

        Containers at this point, using apparmor/selinux, seccomp, cgroups, and user namespaces, are as secure as containers will get. The remaining attack surface – any unknown-to-us attacks against syscalls – can’t be further mitigated so long as we share a kernel. Note seccomp is not configured by default, but if you know your workload I’d heavily recommend using a seccomp blacklist (v2 policy) to further reduce the attack surface.

        To give a strong answer: providing root in an unprivileged container is no more dangerous than providing a regular user shell account (by definition, since any unprivileged user can create a new user namespace in which he is root). And, if you are sufficiently paranoid, any network facing service has a chance of an exploit allowing escape to at least an unprivileged user shell, which again is equivalent to an unprivileged container.

        If I were going to wait for anything, I’d simply give it some time for any more of the (implementation and design) bugs to be shaken out. We’ve already run into some, and the interactions are complex and subtle enough that we shouldn’t be too surprised to learn about more.

  4. lolcat says:

    Ye.. it’s brilliant, Serge. I dropped by last year some time, commending how much legwork you guys have put in, and said I was looking forward to what is obviously today. And in all honesty, I was actually thinking it would be closer to end of 2014, so I am very happy and impressed with how much and how far you guys have come on this. I am now delving into the LXC world, preparing to go ‘live’ as you say, with perhaps some more bugs having been hammered out and so forth. Thank you for the tips. Policies will be some work. Also need to test more on container-systemd potential oddities/bugs. (Am principally an Arch user since unity made me say bye bye to Ubu :))

    I did grab a dev release of Ubuntu now though to make sure all the patches and so on were in place. I re-compiled the Arch kernel (which didn’t have user namespaces set) but think some of the patches to login and/or shadow might be lacking. Will find out soon.

    Still, huge applause for the work you guys have been throwing at this! :)
