Containers are most commonly distributed in two ways:
1. ‘Image based’: LXC and LXD distribute their container images as full images, each a simple representation of a root filesystem plus some configuration info.
2. OCI: based on the original Docker format, this has become an open standard for publishing not only container images but arbitrary artifacts.
Our products are created, distributed, and installed as OCI images. All services run as containers. Each container rootfs is re-created from its OCI image at every start. A physical machine’s rootfs is also shipped as an OCI image, and is re-created on every boot. A system representation therefore consists of a manifest specifying the OCI references for the services to run. To make this secure,
1. Images must be verifiable. A filesystem extraction step, such as un-tarring, prevents us from verifying the result on the next boot without re-extracting. Therefore we distribute OCI layers as squashfs instead of tarballs, and mount them using overlayfs.
2. Squashfs layers ship with their dm-verity root hash in the image manifest.
3. The system manifest which lists the content-addressed OCI images is signed with a product key.
4. The certificate for a product’s manifest signing public key is stored with the system manifest. All product manifest signing certificates are signed by one manifest signing CA.
5. The manifest signing CA certificate is stored in initrd.
6. The initrd, ‘smooshed’ together with the kernel and kernel command line into one kernel.efi, is signed with a kernel signing key.
7. The TPM keys for the root filesystems and the unique machine-identifying key are unlocked only for the PCR7 value resulting from (our shim and) a kernel signed with the right kernel signing key certificate.
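To illustrate why a single root hash in the image manifest (step 2) is enough to verify an entire squashfs layer, here is a minimal sketch of a hash tree over fixed-size blocks. It is a simplification and an assumption on our part, not the dm-verity on-disk format: it omits the salt and superblock, so its output will not match what veritysetup prints. The 4096-byte block size and SHA-256 mirror common veritysetup defaults; the function names are hypothetical.

```python
import hashlib

BLOCK_SIZE = 4096                              # assumed data and hash block size
DIGEST_SIZE = hashlib.sha256().digest_size     # 32 bytes for SHA-256
FANOUT = BLOCK_SIZE // DIGEST_SIZE             # digests that fit in one hash block


def _blocks(data: bytes, size: int):
    """Yield fixed-size blocks, zero-padding the final one."""
    for offset in range(0, len(data), size):
        yield data[offset:offset + size].ljust(size, b"\0")


def root_hash(image: bytes) -> str:
    """Return the hex root of a simplified hash tree over `image`.

    Level 0 hashes every data block. Each higher level packs the digests
    of the level below into blocks and hashes those, until one digest
    remains. Changing any data block changes the root.
    """
    if not image:
        raise ValueError("cannot hash an empty image")
    level = [hashlib.sha256(block).digest() for block in _blocks(image, BLOCK_SIZE)]
    while len(level) > 1:
        packed = b"".join(level)
        level = [hashlib.sha256(block).digest() for block in _blocks(packed, BLOCK_SIZE)]
    return level[0].hex()
```

Because the tree is checked block by block as data is read, a mounted layer can be verified on every access rather than once at extraction time, which is the property the tarball approach lacks.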
In this way, we can ship a single ‘kernel.efi’ for all TPM-enabled hardware and VM products. To protect different groups’ products from each other, products are provisioned with a product ID, which must match the product ID in the product manifest signing certificate. Each machine is also provisioned with a unique keypair, supporting secure cluster bringup and remote attestation.
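The product ID check, combined with the content addressing of the images listed in the system manifest, might look roughly like the sketch below. Everything here is an assumption for illustration: the class and field names are hypothetical, the certificate is reduced to the one field being compared, and signature verification of the manifest and certificate chain is deliberately left out. The only detail taken from the OCI specification is the `sha256:<hex>` digest form.

```python
import hashlib
from dataclasses import dataclass


class VerificationError(Exception):
    """Raised when a manifest check fails."""


@dataclass(frozen=True)
class SigningCert:
    product_id: str      # hypothetical: where the cert carries the product ID


@dataclass(frozen=True)
class ImageRef:
    name: str            # service name, for error messages only
    digest: str          # OCI-style digest, e.g. "sha256:<hex>"


def check_product_id(provisioned_id: str, cert: SigningCert) -> None:
    """Refuse a manifest signed for a different product."""
    if provisioned_id != cert.product_id:
        raise VerificationError(
            f"machine provisioned for {provisioned_id!r}, "
            f"certificate issued for {cert.product_id!r}"
        )


def check_blob(ref: ImageRef, blob: bytes) -> None:
    """Confirm that fetched content matches the digest the manifest names."""
    algorithm, _, expected = ref.digest.partition(":")
    if algorithm != "sha256":
        raise VerificationError(f"unsupported digest algorithm {algorithm!r}")
    actual = hashlib.sha256(blob).hexdigest()
    if actual != expected:
        raise VerificationError(f"digest mismatch for {ref.name}")
```

A caller would run `check_product_id` once per manifest, after the signature checks, and `check_blob` once per fetched blob; either failure should abort the boot or service start.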
This allows us to use OCI as the source for (verifiably) securely installed and booted products. We can install the OS on a host in the traditional way, or we can PXE-boot, specifying on the kernel command line an OCI URL to a layer containing the manifest to boot into.
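As a rough sketch of the PXE path, an initrd could pick the OCI URL out of the kernel command line as shown here. The parameter name `oci_manifest` and the example URL are hypothetical; the actual parameter is not named in this abstract.

```python
import shlex
from typing import Optional

MANIFEST_PARAM = "oci_manifest"   # hypothetical parameter name


def find_param(cmdline: str, key: str) -> Optional[str]:
    """Return the value of `key=value` from a kernel command line.

    Tokens are split shell-style so quoted values survive. If the key
    appears more than once, the last occurrence wins. Returns None when
    the key is absent or appears without a value.
    """
    value = None
    for token in shlex.split(cmdline):
        name, sep, val = token.partition("=")
        if sep and name == key:
            value = val
    return value
```

On a running Linux system the command line can be read from `/proc/cmdline` and passed to `find_param` with `MANIFEST_PARAM`. Since the command line is part of the signed kernel.efi in the design described above, how a PXE-supplied value is combined with it is outside the scope of this sketch.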
We hope to present the full solution (with source) at FOSDEM 2023.
1. For more details on the OCI specification, see https://github.com/opencontainers/image-spec/blob/main/spec.md.
2. The code for generating and mounting squashfs-based OCI images is at https://github.com/project-stacker/stacker and https://github.com/project-stacker/stacker/tree/master/atomfs.
3. The in-development replacement for atomfs is puzzlefs, at https://github.com/anuvu/puzzlefs and https://github.com/anuvu/puzzlefs/blob/master/doc/index.md.
4. The TPM-based unattended encrypted filesystem solution was presented in full at LSS 2021: ‘Securing TPM secrets in the datacenter’: https://www.youtube.com/watch?v=wfJDmfPP1OA.