User namespaces are a really neat feature, but some of their subtleties can make them perplexing when you first play with them. Here I’m going to show a few things you can do with them, with an eye to explaining the things which might otherwise be confusing.
First, you’ll need a bleeding edge kernel. A 3.9 kernel hand-compiled with user namespace support should be fine (some of the latest missing patches aren’t needed for these games as we won’t be creating full system containers). But for simplicity, you can just fire up a new raring box and do:
sudo add-apt-repository ppa:ubuntu-lxc/kernel
sudo apt-get update
sudo apt-get dist-upgrade
Now get a few tools from my ppa – you can of course get the source for all from either the ppa, or from my bzr trees.
sudo add-apt-repository ppa:serge-hallyn/user-natty
sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install nsexec uidmap
Now let’s try a first experiment. Run the following program from the nsexec package:
usernsselfmap
This is a simple program which forks a child which runs as root in a new user namespace. Here a brief reminder of how user namespaces are designed is in order. When a new user namespace is created, the task populating it starts as userid -1, nobody. At this point it has full privileges (POSIX capabilities), but those capabilities can only be used toward resources owned by the new namespace. Furthermore, the privileges will be lost as soon as the task runs exec(3) of a normal file. See the capabilities(7) manpage for an explanation.
At this point, userids from the parent namespace may be mapped into the child. For instance, one might map userids 0-9999 in the child to userids 100000-109999 on the host. This is done by writing values to /proc/pid/uid_map (and analogously to /proc/pid/gid_map). The task writing to the map files must have privilege over the parent uids being mapped in.
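To make the mapping arithmetic concrete, here is a minimal sketch of how a uid inside the namespace translates to a host uid. It assumes the map is held in a file using the uid_map line format (first uid in the namespace, first uid on the host, count). The helper name ns_to_host_uid is made up for illustration and is not part of nsexec or uidmap.

```shell
# Translate a uid inside a user namespace to the host uid, using map
# lines in the uid_map format: "<first-ns-uid> <first-host-uid> <count>".
# ns_to_host_uid is a hypothetical helper, not part of nsexec or uidmap.
ns_to_host_uid() {
    # $1 = uid inside the namespace, $2 = file holding the map lines
    awk -v uid="$1" '
        uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
        END { if (!found) exit 1 }   # no extent covers this uid
    ' "$2"
}

# The mapping from the text: uids 0-9999 in the child, 100000-109999 on the host.
map_file=$(mktemp)
echo "0 100000 10000" > "$map_file"

ns_to_host_uid 0 "$map_file"       # prints 100000
ns_to_host_uid 9999 "$map_file"    # prints 109999
ns_to_host_uid 10000 "$map_file" || echo "uid 10000 is unmapped"
rm -f "$map_file"
```

A uid that falls outside every extent has no host equivalent, which is why the last call reports it as unmapped.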
This is where usernsselfmap comes in. You currently do not have privilege over userids on the host – except your own. usernsselfmap simply maps uid 0 in the container to your own userid on the host. Then it changes to gid and uid 0, and finally executes a shell.
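As a rough illustration of the one-entry maps described here, the sketch below only prints the lines that would map id 0 in the namespace to your own uid and gid. It does not create a namespace or write to any map file, and selfmap_lines is a hypothetical name, not the actual usernsselfmap code.

```shell
# selfmap_lines is a hypothetical helper: it prints the single-entry maps
# that would send id 0 in the namespace to the caller's own uid and gid.
# Format of each entry: "<first-ns-id> <first-host-id> <count>".
selfmap_lines() {
    echo "uid_map: 0 $(id -u) 1"
    echo "gid_map: 0 $(id -g) 1"
}

selfmap_lines
```

A count of 1 means only root exists inside the namespace; every other id stays unmapped.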
Now look around this shell
ifconfig eth0 down
Note that even though you have CAP_SYS_ADMIN, you cannot change the host’s network settings. However, you can now unshare a new network namespace (still without having privilege on the host) and create network devices in that namespace:
nsexec -cmn /bin/bash
ip link add type veth
ifconfig veth0 10.0.0.1 up
Note also that you can’t read under /root. But you can unshare a new mount namespace and bind-mount your $HOME onto /root:
# permission denied
nsexec -m /bin/bash
mount --bind $HOME /root
# homedir contents
Now, in addition to the kernel implementation of user namespaces, Eric Biederman has also provided a patchset against shadow to add a concept of subuids and subgids. Briefly, you can modify login.defs to say that every new user should be allocated 10000 (unique) uids and gids above 100000. Then when you add a new user, it will automatically receive a set of 10000 unique subuids. These allocations are stored in /etc/subuid and /etc/subgid, and two new setuid-root binaries, newuidmap and newgidmap (which are shipped in the uidmap binary package, generated from the shadow source package) may be used by an unprivileged user to map userids in a child user namespace to his allocated subuids on the host.
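For a feel of what these allocations look like, here is a small sketch that lists a user's ranges from a file in the /etc/subuid format, where each line is user, first id, and count separated by colons. The helper subid_ranges and the users alice and bob are invented for the example, and it reads a temporary sample file rather than the real /etc/subuid.

```shell
# List the subordinate id ranges allocated to a user in a file using the
# /etc/subuid format: "<user>:<first-id>:<count>", one range per line.
# subid_ranges is a hypothetical helper for illustration.
subid_ranges() {
    # $1 = user name, $2 = file to read (e.g. /etc/subuid)
    awk -F: -v user="$1" '$1 == user { printf "%d-%d\n", $2, $2 + $3 - 1 }' "$2"
}

# Sample file with made-up users; real allocations would live in
# /etc/subuid and /etc/subgid.
sample=$(mktemp)
printf '%s\n' 'alice:100000:10000' 'bob:110000:10000' > "$sample"

subid_ranges bob "$sample"    # prints 110000-119999
rm -f "$sample"
```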
To conclude this post, here is an example of using the new shadow package along with nsexec to manually create a user namespace with more than one userid. First, use usermod to allocate some subuids and subgids for your user (who I’ll assume is user ‘ubuntu’ on an ec2 host) since it likely was created before subuids were configured:
sudo usermod ubuntu -v 110000-120000 -w 110000-120000
Now open two terminals as user ubuntu (or a split byobu screen). In one, run
nsexec -UW -s 0 -S 0 /bin/bash
about to unshare with 10000000
Press any key to exec (I am 5358)
You’ve asked nsexec to unshare its user namespace (-U), to wait for a keypress before executing /bin/bash (-W), and to switch to userid 0 (-s 0) and groupid 0 (-S 0) before starting that shell. In this example nsexec tells you it is process id 5358, so that you can map userids to it. So from the other shell do:
newuidmap 5358 0 110000 10000
newgidmap 5358 0 110000 10000
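Since an unprivileged user may only map to subuids allocated to them, the requested host range has to fit inside the allocation. The sketch below shows that kind of bounds check using the numbers from this example. It is an assumption about the shape of the check, not the actual newuidmap code, and range_allowed is a hypothetical name.

```shell
# range_allowed HOST_START COUNT ALLOC_FIRST ALLOC_LAST
# Succeeds if the host range [HOST_START, HOST_START+COUNT-1] lies inside
# the allocation [ALLOC_FIRST, ALLOC_LAST]. Hypothetical helper; this is
# only an illustration of the idea, not how newuidmap is implemented.
range_allowed() {
    host_start=$1 count=$2 alloc_first=$3 alloc_last=$4
    host_last=$((host_start + count - 1))
    [ "$count" -gt 0 ] &&
        [ "$host_start" -ge "$alloc_first" ] &&
        [ "$host_last" -le "$alloc_last" ]
}

# The map requested above: 10000 ids starting at host uid 110000,
# against the 110000-120000 allocation made with usermod.
range_allowed 110000 10000 110000 120000 && echo "ok: 110000-119999 is inside the allocation"
range_allowed 100000 10000 110000 120000 || echo "refused: 100000-109999 is outside it"

# With this map, uid 1001 in the namespace is this uid on the host:
echo $((110000 + 1001))    # prints 111001
```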
Now hit return in the nsexec window, and you will see something like:
Now you can play around as above, but unlike above, you can also switch to userids other than root.
root@server:~# newuidshell 1001
But since we’ve not set up a proper container (or chroot), and since our userid maps to 111001, which is not 1001, we can’t actually write to ubuntu2’s files or read any files which are not world readable.
This then will be the basis of ongoing and upcoming work to facilitate unprivileged users creating and using containers. Exciting!
(One note: I am here using an old toy ‘nsexec’ for manipulating namespaces. This will eventually be deprecated in favor of the new programs in upstream util-linux. However there has not yet been a release of util-linux with those patches, so they are not yet in the ubuntu package.)
The source tree for the modified shadow package is at lp:~serge-hallyn/ubuntu/raring/shadow/shadow-userns and source for utilities in the nsexec package is at lp:~serge-hallyn/+junk/nsexec.