
Rootless on Bottlerocket failed with failed to mount /run/user/1000/containerd-mount3852074643: operation not permitted (buildkit issue, closed)

AhmadMS1988 commented on June 8, 2024

Rootless on Bottlerocket failed with failed to mount /run/user/1000/containerd-mount3852074643: operation not permitted

Comments (18)

AkihiroSuda commented on June 8, 2024

> One question comes to mind: since we use the OCI worker by default, what is this containerd mount?

OCI mode still consumes containerd as a library.

vtgspk commented on June 8, 2024

As mentioned by @bcressey, Bottlerocket mounts its local storage with the “nosuid” and “nodev” flags as a hardening step, and those flags are among the ones that must be passed again in any subsequent bind mount of that storage.

Here is a workaround that uses a persistent volume (EBS CSI driver in EKS) instead of an emptyDir, which would otherwise be backed by Bottlerocket's local storage.

Pod: set fsGroup to 1000 so that the volume is mounted within the pod with access for user 1000 and group 1000.

Pod YAML: https://github.com/vtgspk/buildkit-rootless/blob/main/pod.yml
PersistentVolumeClaim: https://github.com/vtgspk/buildkit-rootless/blob/main/persistent-claim.yml
StorageClass: https://github.com/vtgspk/buildkit-rootless/blob/main/storage-class.yml

This way, I am able to get the buildkitd pod up and running and build images successfully in it, using the EBS volume instead of Bottlerocket's local storage.

AkihiroSuda commented on June 8, 2024

Does it work if you specify securityContext.privileged?

AkihiroSuda commented on June 8, 2024

Does https://raw.githubusercontent.com/moby/buildkit/master/examples/kubernetes/job.rootless.yaml work?

AhmadMS1988 commented on June 8, 2024

We do not want to run it in privileged mode.

AkihiroSuda commented on June 8, 2024

> We do not want to run it in privileged mode.

I'm asking for diagnostic purposes.

AhmadMS1988 commented on June 8, 2024

It did work, but the goal is still to run it without privileged mode.

AhmadMS1988 commented on June 8, 2024

One question comes to mind: since we use the OCI worker by default, what is this containerd mount?

AhmadMS1988 commented on June 8, 2024

Are there any logs or commands I can run to help investigate further?

AkihiroSuda commented on June 8, 2024

> Are there any logs or commands I can run to help investigate further?

Run cat /proc/mounts in the buildkitd container and compare the result with the output on Ubuntu nodes, etc.
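A small Go helper can make that comparison easier by listing, for each mount, whether nosuid and nodev are set. This is a hypothetical sketch (`parseMountLine` and `mountEntry` are not part of any library) that assumes the standard six-field /proc/mounts format; it would be run inside the buildkitd container on each node type and the outputs diffed.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// mountEntry keeps the /proc/mounts fields worth comparing between nodes.
type mountEntry struct {
	Source, Target, FSType string
	Options                map[string]bool
}

// parseMountLine splits one /proc/mounts line (source, target, fstype,
// options, dump, pass). Caveats: escapes such as \040 are not decoded,
// and a quoted SELinux context="...,c287" value is split at its inner
// comma, which is harmless when only checking nosuid/nodev.
func parseMountLine(line string) (mountEntry, error) {
	fields := strings.Fields(line)
	if len(fields) < 4 {
		return mountEntry{}, fmt.Errorf("malformed mount line: %q", line)
	}
	opts := make(map[string]bool)
	for _, o := range strings.Split(fields[3], ",") {
		opts[o] = true
	}
	return mountEntry{Source: fields[0], Target: fields[1], FSType: fields[2], Options: opts}, nil
}

func main() {
	f, err := os.Open("/proc/mounts")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read mounts:", err)
		return
	}
	defer f.Close()

	sc := bufio.NewScanner(f)
	for sc.Scan() {
		m, err := parseMountLine(sc.Text())
		if err != nil {
			continue // skip lines that cannot be parsed
		}
		fmt.Printf("%-45s %-8s nosuid=%t nodev=%t\n",
			m.Target, m.FSType, m.Options["nosuid"], m.Options["nodev"])
	}
}
```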


AhmadMS1988 commented on June 8, 2024

It worked as expected on both Amazon Linux 2 and the Ubuntu 20.04-based EKS-optimized images.
The output of /proc/mounts is:

```
overlay / overlay rw,context="system_u:object_r:data_t:s0:c208,c287",relatime,lowerdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/71/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/59/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/55/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/50/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/45/fs:/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/40/fs,upperdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/425/fs,workdir=/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs/snapshots/425/work 0 0
proc /proc proc rw,nosuid,nodev,noexec,relatime 0 0
tmpfs /dev tmpfs rw,context="system_u:object_r:data_t:s0:c208,c287",nosuid,size=65536k,mode=755 0 0
devpts /dev/pts devpts rw,context="system_u:object_r:data_t:s0:c208,c287",nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0
mqueue /dev/mqueue mqueue rw,seclabel,nosuid,nodev,noexec,relatime 0 0
sysfs /sys sysfs ro,seclabel,nosuid,nodev,noexec,relatime 0 0
cgroup /sys/fs/cgroup cgroup2 ro,seclabel,nosuid,nodev,noexec,relatime 0 0
/dev/nvme1n1p1 /etc/hosts xfs rw,seclabel,nosuid,nodev,noatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
/dev/nvme1n1p1 /dev/termination-log xfs rw,seclabel,nosuid,nodev,noatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
/dev/nvme1n1p1 /etc/hostname xfs rw,seclabel,nosuid,nodev,noatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
/dev/nvme1n1p1 /etc/resolv.conf xfs rw,seclabel,nosuid,nodev,noatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
shm /dev/shm tmpfs rw,seclabel,nosuid,nodev,noexec,relatime,size=65536k 0 0
/dev/nvme1n1p1 /home/user/.local/share/buildkit xfs rw,seclabel,nosuid,nodev,noatime,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
tmpfs /run/secrets/kubernetes.io/serviceaccount tmpfs ro,seclabel,relatime,size=6931992k 0 0
proc /proc/bus proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/fs proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/irq proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/sys proc ro,nosuid,nodev,noexec,relatime 0 0
proc /proc/sysrq-trigger proc ro,nosuid,nodev,noexec,relatime 0 0
tmpfs /proc/acpi tmpfs ro,context="system_u:object_r:data_t:s0:c208,c287",relatime 0 0
tmpfs /proc/kcore tmpfs rw,context="system_u:object_r:data_t:s0:c208,c287",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/keys tmpfs rw,context="system_u:object_r:data_t:s0:c208,c287",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/latency_stats tmpfs rw,context="system_u:object_r:data_t:s0:c208,c287",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/timer_list tmpfs rw,context="system_u:object_r:data_t:s0:c208,c287",nosuid,size=65536k,mode=755 0 0
tmpfs /proc/scsi tmpfs ro,context="system_u:object_r:data_t:s0:c208,c287",relatime 0 0
tmpfs /sys/firmware tmpfs ro,context="system_u:object_r:data_t:s0:c208,c287",relatime 0 0
```

Can you please take a look and provide feedback, so that I can open a ticket with the Bottlerocket team with the details?
Thanks

AkihiroSuda commented on June 8, 2024

Could this be related to SELinux? Does this work?

```yaml
securityContext:
  seLinuxOptions:
    level: s0
    type: spc_t
```

AhmadMS1988 commented on June 8, 2024

Unfortunately, it did not work; I got the same error.

bcressey commented on June 8, 2024

As far as I can tell, this is the same error that was fixed in #3697, but at a different stage in the process.

Running mountsnoop from bcc, I can see that the initial set of bind mounts succeeds:

```
buildkitd        210370  210738  4026533418  mount("/home/user/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/2/fs", "/home/user/.local/tmp/buildkit-mount276192057", "bind", MS_RDONLY|MS_NOSUID|MS_NODEV|MS_NOATIME|MS_BIND|MS_REC, "") = 0
buildkitd        210370  210738  4026533418  mount("", "/home/user/.local/tmp/buildkit-mount276192057", "", MS_RDONLY|MS_NOSUID|MS_NODEV|MS_REMOUNT|MS_NOATIME|MS_BIND|MS_REC, "") = 0
...
```

However, the operation ultimately fails in the call to overlay.WriteUpperdir:

```
2024-05-06T01:07:57.845201317Z stderr F time="2024-05-06T01:07:57Z" level=warning msg="failed to compute blob by overlay differ (ok=false): failed to write compressed diff: failed to mount /home/user/.local/tmp/containerd-mount1074778686: operation not permitted" span="export layers" spanID=0f5a00d506b35262 traceID=32ade31627d6b338d5e3051b59dea3e2
```

From the related mountsnoop output, we can see that the nosuid and nodev flags were not passed:

```
buildkitd        210370  210739  4026533418  mount("/home/user/.local/share/buildkit/runc-overlayfs/snapshots/snapshots/4/fs", "/home/user/.local/tmp/containerd-mount1074778686", "bind", MS_RDONLY|MS_BIND|MS_REC, "") = 0
buildkitd        210370  210739  4026533418  mount("", "/home/user/.local/tmp/containerd-mount1074778686", "", MS_RDONLY|MS_REMOUNT|MS_BIND|MS_REC, "") = -EPERM
```
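Decoding the two remount masks side by side makes the difference explicit. This Go sketch is illustrative: the names and values are the Linux MS_* constants, `describe` is a hypothetical helper, and the masks are copied from the successful buildkit-mount276192057 remount and the failing containerd-mount1074778686 remount traced in this comment. Besides MS_NOSUID and MS_NODEV, the diff also shows MS_NOATIME missing from the failing call.

```go
package main

import (
	"fmt"
	"strings"
)

// Linux MS_* values from <sys/mount.h>.
const (
	msRdonly  uintptr = 0x1
	msNosuid  uintptr = 0x2
	msNodev   uintptr = 0x4
	msNoexec  uintptr = 0x8
	msRemount uintptr = 0x20
	msNoatime uintptr = 0x400
	msBind    uintptr = 0x1000
	msRec     uintptr = 0x4000
)

// flagNames is ordered by bit value so the output is deterministic.
var flagNames = []struct {
	bit  uintptr
	name string
}{
	{msRdonly, "MS_RDONLY"}, {msNosuid, "MS_NOSUID"}, {msNodev, "MS_NODEV"},
	{msNoexec, "MS_NOEXEC"}, {msRemount, "MS_REMOUNT"}, {msNoatime, "MS_NOATIME"},
	{msBind, "MS_BIND"}, {msRec, "MS_REC"},
}

// describe renders a flag mask in the same style mountsnoop prints it.
func describe(mask uintptr) string {
	var names []string
	for _, f := range flagNames {
		if mask&f.bit != 0 {
			names = append(names, f.name)
		}
	}
	return strings.Join(names, "|")
}

func main() {
	// Remount of buildkit-mount276192057: returned 0.
	succeeded := msRdonly | msNosuid | msNodev | msRemount | msNoatime | msBind | msRec
	// Remount of containerd-mount1074778686: returned -EPERM.
	failed := msRdonly | msRemount | msBind | msRec

	fmt.Println("dropped:", describe(succeeded&^failed))
	// prints: dropped: MS_NOSUID|MS_NODEV|MS_NOATIME
}
```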

overlay.WriteUpperdir calls into mount.WithTempMount, which uses the containerd mount library. It looks like we end up here and then the remount fails because it doesn't have the equivalent of the UnprivilegedMountFlags logic.
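The missing logic amounts to OR-ing the source mount's locked flags into the read-only remount request. A minimal sketch, assuming the flags of the source mount are already known (for example, parsed from /proc/mounts); `readonlyRemountFlags` is a hypothetical name and this is not the actual containerd or buildkit patch.

```go
package main

import "fmt"

// Linux MS_* values from <sys/mount.h>.
const (
	msRdonly     uintptr = 0x1
	msNosuid     uintptr = 0x2
	msNodev      uintptr = 0x4
	msNoexec     uintptr = 0x8
	msRemount    uintptr = 0x20
	msNoatime    uintptr = 0x400
	msNodiratime uintptr = 0x800
	msBind       uintptr = 0x1000
	msRec        uintptr = 0x4000
	msRelatime   uintptr = 0x200000
)

// lockable approximates the flags the kernel refuses to change on a mount
// inherited by an unprivileged (user-namespaced) mount namespace.
const lockable = msRdonly | msNosuid | msNodev | msNoexec | msNoatime | msNodiratime | msRelatime

// readonlyRemountFlags (hypothetical) builds the flags for making a bind
// mount read-only, keeping any lockable flags the source mount already has
// instead of silently dropping them.
func readonlyRemountFlags(sourceFlags uintptr) uintptr {
	base := msRdonly | msRemount | msBind | msRec
	return base | (sourceFlags & lockable)
}

func main() {
	// Source mount from the trace: nosuid, nodev, noatime.
	src := msNosuid | msNodev | msNoatime
	fmt.Printf("without preservation: %#x\n", msRdonly|msRemount|msBind|msRec) // 0x5021
	fmt.Printf("with preservation:    %#x\n", readonlyRemountFlags(src))       // 0x5427
}
```

With preservation, the resulting mask is exactly the one used by the successful buildkit-mount remount in the trace.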


AkihiroSuda commented on June 8, 2024

> overlay.WriteUpperdir calls into mount.WithTempMount, which uses the containerd mount library. It looks like we end up here and then the remount fails because it doesn't have the equivalent of the UnprivilegedMountFlags logic.

@bcressey Thanks for the analysis. Would you be interested in submitting a PR?

swagatbora90 commented on June 8, 2024

@bcressey If that's okay, I can work on a fix for this.

bcressey commented on June 8, 2024

@swagatbora90 that'd be great! Let me know if I can help with setting up a test environment, or with testing a change when it's ready.

swagatbora90 commented on June 8, 2024

@bcressey @AkihiroSuda I added a PR that checks for and preserves the unprivileged flags before remounting a bind mount read-only. However, that change alone was not sufficient: I also had to update the pod spec to mount the tmp directory (/home/user/.local/tmp) from the host.

pod.spec

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: buildkitd
spec:
  containers:
    - name: buildkitd
      image: public.ecr.aws/e5v3s6y4/buildkit-rootless:rootless
      args:
        - --addr
        - tcp://0.0.0.0:1234
        - --oci-worker-no-process-sandbox
        - --debug
      securityContext:
        seccompProfile:
          type: Unconfined
        runAsUser: 1000
        runAsGroup: 1000
      volumeMounts:
        # The first mount is not needed, but makes it explicit that there
        # is a VOLUME here which shows up as a separate mount, which is why
        # buildkit is able to find the unprivileged mount flags it needs to
        # preserve.
        - mountPath: /home/user/.local/share/buildkit
          name: buildkitd-1
        # The second mount is needed, because otherwise there's no explicit
        # mount to inspect for mount options, and the underlying filesystem's
        # mount flags are obscured by the overlayfs used for the container's
        # rootfs.
        - mountPath: /home/user/.local/tmp
          name: buildkitd-2
      env:
        # This is required to align the temporary directory created by buildkit
        # with the volume mount for that directory.
        - name: XDG_RUNTIME_DIR
          value: /home/user/.local/tmp
    - name: runner
      image: moby/buildkit:rootless
      command: [ "/bin/sh", "-c", "--" ]
      args: [ "while true; do sleep 30; done;" ]
      env:
        - name: BUILDKIT_HOST
          value: tcp://localhost:1234
  volumes:
    - name: buildkitd-1
      emptyDir: {}
    - name: buildkitd-2
      emptyDir: {}
```

Exposing the tmp directory as a separate mount in the container is required; otherwise the directory lives in the container's root filesystem, its actual mount flags are hidden by overlayfs, and the check for unprivileged flags no longer works. To make this work, we need both:
1) an update to the containerd mount library to preserve the nosuid and nodev flags, and
2) a pod spec update to mount the tmp directory as its own volume.
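The overlayfs point can be illustrated with a longest-prefix lookup: the flags that apply to a path are those of the mount whose target is its longest prefix. In this hypothetical Go sketch (`containingMount` is not a library function), the tmp path resolves to the overlay root, which shows no nosuid/nodev, until the tmp directory gets its own volume. The options given for that extra volume are an illustrative assumption.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// mount is a pared-down /proc/mounts entry: target path and option string.
type mount struct {
	Target  string
	Options string
}

// containingMount (hypothetical) returns the entry whose target is the
// longest path prefix of p, i.e. the mount whose flags apply to p.
// It ignores symlinks and stacked mounts on the same target.
func containingMount(mounts []mount, p string) (mount, bool) {
	p = path.Clean(p)
	var best mount
	found := false
	for _, m := range mounts {
		t := path.Clean(m.Target)
		covers := t == "/" || p == t || strings.HasPrefix(p, t+"/")
		if covers && (!found || len(t) > len(path.Clean(best.Target))) {
			best, found = m, true
		}
	}
	return best, found
}

func main() {
	tmp := "/home/user/.local/tmp/containerd-mount1074778686"

	// Abridged from the /proc/mounts output in this thread
	// (overlay context/lowerdir/upperdir/workdir options omitted).
	mounts := []mount{
		{"/", "rw,relatime"},
		{"/home/user/.local/share/buildkit", "rw,seclabel,nosuid,nodev,noatime"},
	}
	m, _ := containingMount(mounts, tmp)
	fmt.Printf("no tmp volume:   %s (%s)\n", m.Target, m.Options)

	// Assumed options for the extra emptyDir volume from the pod spec.
	mounts = append(mounts, mount{"/home/user/.local/tmp", "rw,seclabel,nosuid,nodev,noatime"})
	m, _ = containingMount(mounts, tmp)
	fmt.Printf("with tmp volume: %s (%s)\n", m.Target, m.Options)
}
```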

Let me know if this makes sense. I am also wondering whether we still need #3697, since we already check for the flags downstream in containerd. I will test this out next.
