Last year around this time, we were announcing the availability of cgmanager, a daemon allowing users and programs to easily administer and delegate cgroups over a dbus interface. It was key to supporting nested containers and unprivileged users.
While its dbus interface turned out to have tremendous benefits (I wasn’t sold at first), there are programs which want to continue using the cgroup file interface. To support use of these in a container with the same delegation benefits of cgmanager, there is now lxcfs.
Lxcfs is a FUSE filesystem mainly designed for use by lxc containers. On an Ubuntu 15.04 system, it will be used by default to provide two things: first, a virtualized view of some /proc files; and second, filtered access to the host’s cgroup filesystems.
The proc files filtered by lxcfs are cpuinfo, meminfo, stat, and uptime. These are filtered using cgroup information to show only the cpus and memory which are available to the reading task. They can be seen on the host under /var/lib/lxcfs/proc, and by default each container bind-mounts them over its own /proc files. There have been several attempts to push this virtualization into /proc itself, but those have been rejected. The proposed alternative was to write a library which all userspace would use to get filtered /proc information. Unfortunately no such effort seems to be taking off, and even if one took off now it wouldn’t help with legacy containers. In contrast, lxcfs works perfectly with 12.04 and 14.04 containers.
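As a rough illustration of the meminfo filtering, the sketch below prints MemTotal from the kernel’s /proc/meminfo and from the lxcfs copy side by side. The helper name mem_total_kb is made up for this example; the lxcfs path is the one given above. A task with no memory limit in its cgroup will most likely see the same number in both files, so the difference is easiest to observe from inside a memory-limited cgroup.

```shell
#!/bin/sh
# Hypothetical helper (not part of lxcfs): print the MemTotal value, in kB,
# from a meminfo-style file given as the first argument.
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' "$1"
}

# Compare the kernel's view with the lxcfs view, skipping files that are absent.
for f in /proc/meminfo /var/lib/lxcfs/proc/meminfo; do
    if [ -r "$f" ]; then
        printf '%s: %s kB\n' "$f" "$(mem_total_kb "$f")"
    else
        printf '%s: not readable (is lxcfs running?)\n' "$f"
    fi
done
```

The same approach should work for cpuinfo, stat, and uptime, with the awk pattern changed to match the field of interest.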
The cgroups are mounted per-host-mounted-hierarchy under /var/lib/lxcfs/cgroup/. When a container is started, each filtered hierarchy will be bind-mounted under /sys/fs/cgroup/* in the container. The container cannot see any information for ancestor cgroups, so for instance /var/lib/lxcfs/cgroup/freezer will contain only a directory called ‘lxc’ or ‘user.slice’.
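To make the layout concrete, here is a small sketch that walks the hierarchies lxcfs exposes on the host and prints where each one would land inside a container. The helper cgroup_target is hypothetical, and the assumption that a hierarchy keeps its directory name under /sys/fs/cgroup is inferred from the description above rather than taken from the lxc source.

```shell
#!/bin/sh
# Hypothetical helper: given an lxcfs hierarchy directory on the host, print
# the path it is assumed to be bind-mounted at inside the container.
cgroup_target() {
    printf '/sys/fs/cgroup/%s\n' "$(basename "$1")"
}

# List every hierarchy lxcfs currently exposes, with its assumed target.
for d in /var/lib/lxcfs/cgroup/*/; do
    [ -d "$d" ] || continue   # glob did not match: lxcfs is not mounted
    printf '%s -> %s\n' "${d%/}" "$(cgroup_target "$d")"
done
```

On a host without lxcfs the loop simply prints nothing.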
Lxcfs was instrumental in allowing us to boot systemd containers, both privileged and unprivileged. Through its proc filtering, it also answers a frequent, years-old request. We do hope that kernel support for cgroup namespaces will eventually allow us to drop the cgroup part of lxcfs. Since we’ll need to support LTS containers for some time, that will definitely require cgroup namespace support for non-unified hierarchies, but that’s not outside the realm of possibility.
In summary, on a 15.04 host, you can now create a container the usual way,
lxc-create -t download -n v1 -- -d ubuntu -r vivid -a amd64
The resulting container will have “correct” results for uptime, top, etc.
03:09:08 up 0 min, 0 users, load average: 0.02, 0.13, 0.12
It will get cgroup hierarchies under /sys/fs/cgroup:
root@v1:~# find /sys/fs/cgroup/freezer/
And, it can run systemd as init.