Introducing lxcfs

Last year around this time, we were announcing the availability of cgmanager, a daemon allowing users and programs to easily administer and delegate cgroups over a dbus interface. It was key to supporting nested containers and unprivileged users.

While its dbus interface turned out to have tremendous benefits (I wasn’t sold at first), there are programs which want to continue using the cgroup file interface. To support these programs in a container, with the same delegation benefits as cgmanager, there is now lxcfs.

Lxcfs is a fuse filesystem mainly designed for use by lxc containers. On an Ubuntu 15.04 system, it will be used by default to provide two things: first, a virtualized view of some /proc files; and second, filtered access to the host’s cgroup filesystems.

The proc files filtered by lxcfs are cpuinfo, meminfo, stat, and uptime. They are filtered using cgroup information so that they show only the CPUs and memory available to the reading task. They can be seen on the host under /var/lib/lxcfs/proc, and by default containers bind-mount them over their own /proc files. There have been several attempts to push this virtualization into /proc itself, but those have been rejected. The proposed alternative was to write a library that all userspace would use to get filtered /proc information. Unfortunately, no such effort seems to be taking off, and even if one took off now, it would not help legacy containers. In contrast, lxcfs works perfectly with 12.04 and 14.04 containers.
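To make the filtering idea concrete, here is a minimal Python sketch for cpuinfo and meminfo. It is not lxcfs’s code (lxcfs is a C FUSE daemon); it assumes the caller has already read the reading task’s cpuset.cpus and memory.limit_in_bytes values from its cgroups, and that each /proc/cpuinfo block begins with a “processor” line, as it does on x86. The function names are hypothetical.

```python
# Hypothetical sketch of cgroup-based /proc filtering in the spirit of lxcfs.
# Assumes cpuset.cpus and memory.limit_in_bytes were already read for the
# reading task; this is not lxcfs's actual (C) implementation.


def parse_cpuset(spec: str) -> set[int]:
    """Parse a cpuset list such as "0-3,6" into {0, 1, 2, 3, 6}."""
    cpus: set[int] = set()
    for part in spec.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def filter_cpuinfo(cpuinfo: str, allowed: set[int]) -> str:
    """Keep only the processor blocks in `allowed`, renumbered from 0.

    Assumes each block starts with a "processor : N" line (as on x86).
    """
    kept: list[str] = []
    for block in cpuinfo.strip().split("\n\n"):
        lines = block.splitlines()
        number = int(lines[0].split(":", 1)[1])
        if number in allowed:
            lines[0] = f"processor\t: {len(kept)}"
            kept.append("\n".join(lines))
    return "\n\n".join(kept) + "\n"


def mem_total_kb(host_mem_total_kb: int, limit_in_bytes: int) -> int:
    """Report the smaller of host RAM and the memory cgroup limit, in kB."""
    return min(host_mem_total_kb, limit_in_bytes // 1024)


if __name__ == "__main__":
    host = "".join(
        f"processor\t: {n}\nmodel name\t: Example CPU\n\n" for n in range(4)
    )
    print(filter_cpuinfo(host, parse_cpuset("1-2")), end="")
    print("MemTotal:", mem_total_kb(16_384_000, 512 * 1024 * 1024), "kB")
```

In this sketch the kept processors are renumbered from 0 so that tools counting “processor” lines agree with the cpuset size. A memory cgroup with no limit reports a very large limit_in_bytes, so taking the minimum falls back to the host’s MemTotal.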

Under /var/lib/lxcfs/cgroup/, lxcfs provides one directory for each cgroup hierarchy mounted on the host. When a container is started, each filtered hierarchy is bind-mounted under /sys/fs/cgroup/* in the container. The container cannot see any information about ancestor cgroups, so, for instance, the top level of /var/lib/lxcfs/cgroup/freezer will contain only a directory called ‘lxc’ or ‘user.slice’.
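The ancestor-hiding rule can be modeled in a few lines. The following Python sketch is a simplified assumption, not lxcfs’s actual algorithm: it takes the task’s cgroup path from /proc/&lt;pid&gt;/cgroup (whose lines have the form hierarchy-ID:controller-list:cgroup-path) and decides which entries of a hierarchy directory that task may see. The function names are hypothetical.

```python
# Hypothetical, simplified model of how a filtered cgroup view can hide
# ancestor cgroups; not lxcfs's actual algorithm. Paths are relative to a
# hierarchy root, e.g. "/user.slice/user-1000.slice/session-1.scope/v1".
from pathlib import PurePosixPath


def task_cgroup(proc_cgroup: str, controller: str) -> str:
    """Return the task's cgroup path for `controller`.

    `proc_cgroup` is the text of /proc/<pid>/cgroup, whose lines have the
    form "hierarchy-ID:controller-list:cgroup-path".
    """
    for line in proc_cgroup.splitlines():
        if not line:
            continue
        _, controllers, path = line.split(":", 2)
        if controller in controllers.split(","):
            return path
    raise KeyError(controller)


def visible_children(directory: str, children: list[str], task_path: str) -> list[str]:
    """Filter the entries of `directory` down to what the task may see."""
    d = PurePosixPath(directory)
    t = PurePosixPath(task_path)
    # At or below the task's own cgroup: everything is visible.
    if d == t or t in d.parents:
        return sorted(children)
    # In an ancestor: only the next component on the way to the task.
    if d in t.parents:
        next_component = t.relative_to(d).parts[0]
        return [c for c in children if c == next_component]
    # Unrelated branch (e.g. a sibling container): nothing is visible.
    return []
```

For a task in /user.slice/user-1000.slice/session-1.scope/v1, listing the freezer hierarchy’s root returns only user.slice, each ancestor directory exposes only the next directory on the way down, and the task’s own cgroup shows all of its files.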

Lxcfs was instrumental in allowing us to boot systemd containers, both privileged and unprivileged. Through its /proc filtering, it also answers a frequent, years-old request. We do hope that kernel support for cgroup namespaces will eventually allow us to drop the cgroup part of lxcfs. Since we’ll need to support LTS containers for some time, that will definitely require cgroup namespace support for non-unified hierarchies, but that’s not out of the realm of possibility.

Lxcfs is packaged in Ubuntu 15.04, the source is hosted at github.com/lxc/lxcfs, and news can be tracked at linuxcontainers.org/lxcfs.

In summary, on a 15.04 host, you can now create a container the usual way:

lxc-create -t download -n v1 -- -d ubuntu -r vivid -a amd64

The resulting container will have “correct” results for uptime, top, etc.

root@v1:~# uptime
03:09:08 up 0 min, 0 users, load average: 0.02, 0.13, 0.12

It will get cgroup hierarchies under /sys/fs/cgroup:

root@v1:~# find /sys/fs/cgroup/freezer/
/sys/fs/cgroup/freezer/
/sys/fs/cgroup/freezer/user.slice
/sys/fs/cgroup/freezer/user.slice/user-1000.slice
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope/v1
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope/v1/tasks
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope/v1/cgroup.procs
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope/v1/freezer.state
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope/v1/cgroup.clone_children
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope/v1/freezer.parent_freezing
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope/v1/notify_on_release
/sys/fs/cgroup/freezer/user.slice/user-1000.slice/session-1.scope/v1/freezer.self_freezing
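Because these hierarchies are ordinary files, a program inside the container can keep using the cgroup file interface directly. As a hypothetical Python sketch, assuming the freezer hierarchy is mounted at /sys/fs/cgroup/freezer and the cgroup path is already known (for instance from /proc/self/cgroup), the following reads freezer.state and the tasks file. The helper names are illustrative, not part of any lxcfs API.

```python
# Hypothetical sketch: a program inside the container using the plain cgroup
# file interface that lxcfs exposes under /sys/fs/cgroup.
# Assumes `root` is the mounted freezer hierarchy (normally
# /sys/fs/cgroup/freezer) and `cgroup_path` is the task's cgroup path.
from pathlib import Path


def cgroup_dir(root: Path, cgroup_path: str) -> Path:
    """Map a cgroup path like "/user.slice/.../v1" onto the mounted hierarchy."""
    return root / cgroup_path.lstrip("/")


def freezer_state(root: Path, cgroup_path: str) -> str:
    """Return the cgroup's freezer.state (THAWED, FREEZING or FROZEN)."""
    return (cgroup_dir(root, cgroup_path) / "freezer.state").read_text().strip()


def task_ids(root: Path, cgroup_path: str) -> list[int]:
    """Return the thread IDs listed in the cgroup's tasks file."""
    text = (cgroup_dir(root, cgroup_path) / "tasks").read_text()
    return [int(tok) for tok in text.split()]
```

With the listing shown above, calling freezer_state(Path("/sys/fs/cgroup/freezer"), "/user.slice/user-1000.slice/session-1.scope/v1") would read that directory’s freezer.state.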

And it can run systemd as init.


7 Responses to Introducing lxcfs

  1. Ikem Krueger says:

    First we had procfs to manage cgroups. Now we add lxcfs on top. That doesn’t sound right to me.

    • s3hh says:

      If/when a sufficient implementation of “cgroup namespaces” makes it into the kernel, we’ll switch to that. This is a means to an end.

      But the /proc/{meminfo,etc} parts are likely to be used for quite some time, until either the kernel virtualizes /proc (not likely) or most userspace software starts using a libvirtproc-type library.

  2. Matt Helsley says:

    I’m curious: What sold you on the DBus interface for cgmanager?

    • s3hh says:

      The ease with which projects written in various languages, as well as scripts, could talk to it. If I’d gone with a simple protocol over a Unix socket, that integration would have required a lot more custom work for each project.

  3. opotonil says:

    I am trying cgmanager and lxcfs on Arch Linux ARM using these systemd units:

    /usr/lib/systemd/system/cgmanager.service

    [Unit]
    Description=Cgroup management daemon
    ConditionVirtualization=!container
    Before=cgproxy.service
    After=local-fs.target

    [Service]
    Type=simple
    ExecStart=/usr/bin/cgmanager -m name=systemd
    KillMode=process
    Restart=on-failure

    [Install]
    WantedBy=multi-user.target

    /usr/lib/systemd/system/lxcfs.service

    [Unit]
    Description=FUSE filesystem for LXC
    ConditionVirtualization=!container
    Before=lxc.service
    After=cgmanager.service
    Requires=cgmanager.service

    [Service]
    ExecStart=/usr/bin/lxcfs -f -s -o allow_other /var/lib/lxcfs
    KillMode=process
    Restart=on-failure
    ExecStopPost=-/bin/fusermount -u /var/lib/lxcfs

    [Install]
    WantedBy=multi-user.target

    But when I start lxcfs.service, cgmanager reports the following error:
    Could not get password database information for UID of current process: User “???” unknown or no memory to allocate password entry

    What am I doing wrong? Thanks.
