Restricting Root: Using SELinux to Limit Access to the Container Engine Socket in Kubernetes/Openshift

Tyler Lisowski
Feb 6, 2022

Every Kubernetes node runs a container engine (cri-o, containerd, etc.) that the kubelet interacts with to run and manage containers on the node. The kubelet talks to the container engine over a UNIX socket that the engine creates at startup. Read/write access to this socket grants admin-level control over every container on the host, including the ability to start and stop containers, exec into running containers, and modify components inside them. Containerized workloads cannot access this socket by default, since it exists only at the node level, typically under the /var/run directory. But if a malicious process manages to break out of container isolation and reach that socket, it gains admin privileges over the container engine. This story outlines some additional SELinux policies that can be applied to prevent this attack scenario. An Openshift 4 cluster will be used as the basis for the experiment.
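To make the stakes concrete: anything that can reach the socket can drive the full container lifecycle through the CRI API. As a quick illustration (assuming the crictl CLI is available on the host), listing and stopping arbitrary containers is a one-liner each:

# List every container the engine manages
crictl --runtime-endpoint unix:///var/run/crio/crio.sock ps -a

# Stop an arbitrary container by ID (taken from the listing above)
crictl --runtime-endpoint unix:///var/run/crio/crio.sock stop <container-id>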

First, let's schedule a pretend malicious process that has broken out of container isolation. To do this, we can run oc debug node/NODENAME to schedule a privileged process on the specified Openshift node.

oc get node
NAME             STATUS   ROLES           AGE     VERSION
10.184.253.185   Ready    master,worker   5h58m   v1.22.3+e790d7f
10.184.253.199   Ready    master,worker   5h58m   v1.22.3+e790d7f
10.184.253.200   Ready    master,worker   5h58m   v1.22.3+e790d7f
oc debug node/10.184.253.185
Creating debug namespace/openshift-debug-node-2q4ln ...
Starting pod/10184253185-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.184.253.185
sh-4.4# nsenter -t 1 -m -u -i -n -p -- sudo su
[root@test-c7vs99p207rj7vnihghg-bpvg1644151-default-000003cc /]#

At this point, we have scheduled a sample malicious process and simulated breaking out of container isolation to root on the host. The output below shows that the process can communicate with the cri-o socket (the HTTP 400 response proves the connection itself succeeded) and therefore has admin control over the container engine.

[root@test-c7vs99p207rj7vnihghg-bpvg1644151-default-000003cc /]# printf GET | socat UNIX-CONNECT:/var/run/crio/crio.sock -
HTTP/1.1 400 Bad Request
Content-Type: text/plain; charset=utf-8
Connection: close
400 Bad Request

A custom SELinux policy is going to be used to restrict this further. We will start by defining the policy below:

mkdir -p /tmp/restrictedcontainerenginesock
cat <<'EOF' >/tmp/restrictedcontainerenginesock/restrictedcontainerenginesock.te
policy_module(restrictedcontainerenginesock, 1.0)

gen_require(`
    type unconfined_service_t;
    type container_runtime_t;
    type container_var_run_t;
    type var_run_t;
    type spc_t;
')

# New restricted type for the container engine socket; it must be able to
# live on tmpfs since /var/run is a tmpfs mount
type restricted_containerengine_sockets_t;
fs_associate_tmpfs(restricted_containerengine_sockets_t)

# Label any socket cri-o creates named crio.sock with the restricted type
type_transition container_runtime_t container_var_run_t:sock_file restricted_containerengine_sockets_t "crio.sock";

# Permit the one-time relabel of the existing socket to the restricted type
allow spc_t container_var_run_t: sock_file { relabelfrom };
allow spc_t restricted_containerengine_sockets_t: sock_file { relabelto };

# Only systemd services (e.g. the kubelet) and cri-o itself may use the socket
allow unconfined_service_t restricted_containerengine_sockets_t: sock_file { append getattr open read write };
allow container_runtime_t restricted_containerengine_sockets_t: sock_file { append getattr open read write };
EOF

The policy defines the type restricted_containerengine_sockets_t and allows only the unconfined_service_t and container_runtime_t domains to interact with it. Once loaded, this means only systemd units (like the kubelet) and the cri-o container runtime itself will be able to interact with the socket. Note the two additional rules, relabelfrom and relabelto, which allow the spc_t domain (the domain privileged containers such as our debug pod run in) to perform the initial relabel from the regular cri-o socket SELinux type to the restricted type. Once applied, this will block our "malicious" process from accessing the socket. To illustrate this, the policy needs to be built and activated on the node. The commands below execute those steps on a RHEL 7 or 8 operating system.

yum install setools-console selinux-policy-devel -y
cd /tmp/restrictedcontainerenginesock
make -f /usr/share/selinux/devel/Makefile restrictedcontainerenginesock.pp
semodule -i restrictedcontainerenginesock.pp
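To confirm the module actually loaded, and to inspect the rules it added, the setools-console utilities installed above can be used:

# The module should appear in the list of loaded policy modules
semodule -l | grep restrictedcontainerenginesock

# Show the allow rules that reference the new socket type
sesearch -A -t restricted_containerengine_sockets_t -c sock_file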

Now that the policy is built and activated, the cri-o socket needs to be relabeled in SELinux to the restricted_containerengine_sockets_t type:

semanage fcontext -a -t restricted_containerengine_sockets_t /var/run/crio/crio.sock
chcon -t restricted_containerengine_sockets_t /var/run/crio/crio.sock
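The relabel can be verified by checking the socket's SELinux context; the type field should now show the restricted type:

ls -Z /var/run/crio/crio.sock
# expected type field: ...:restricted_containerengine_sockets_t:s0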

Now that the socket has been relabeled, the malicious process (even with root privileges) is denied when it tries to communicate with the container engine socket:

[root@test-c7vs99p207rj7vnihghg-bpvg1644151-default-000003cc restrictedcontainerenginesock]# printf GET | socat UNIX-CONNECT:/var/run/crio/crio.sock -
2022/02/06 13:46:57 socat[10623] E connect(5, AF=1 "/var/run/crio/crio.sock", 25): Permission denied
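The blocked access is also recorded as an AVC denial in the audit log, which doubles as a detection signal. Assuming auditd is running on the node, recent denials can be pulled with:

# Show recent SELinux AVC denials, including the blocked socket access
ausearch -m AVC -ts recent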

Consequences of the policy

The policy outlined above does have some consequences when installed. Many Kubernetes/Openshift monitoring and logging solutions use the container engine socket to run read-only queries for image and container metadata to tag metrics and logs with. Once this policy is applied, all of those operations will fail, since those components do not run in the SELinux domains allowed to communicate with the socket.
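If a particular monitoring agent must keep working, the policy can be extended with a narrow exception rather than abandoned. Below is a minimal sketch assuming a hypothetical monitoring_agent_t domain (find the real domain with ps -eZ). Note that even "read-only" API calls require write permission on the socket file, plus connectto on the runtime's listening socket:

# Hypothetical exception: monitoring_agent_t is a placeholder domain
gen_require(`
    type monitoring_agent_t;
')
allow monitoring_agent_t restricted_containerengine_sockets_t: sock_file { getattr open read write };
allow monitoring_agent_t container_runtime_t: unix_stream_socket connectto;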

Additionally, a process running as root can load further policy modules that grant more access to this type. However, those policy loads are logged, which provides a way to detect the malicious activity. A reboot can also be made a prerequisite for activating any policy changes by running setsebool secure_mode_policyload on, which adds an additional layer of detection for malicious changes.
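For example, enabling the boolean persistently blocks further policy loads until the node reboots, and policy-load events can be reviewed in the audit log when hunting for tampering:

# Block further policy loads until the next reboot
setsebool -P secure_mode_policyload on

# Review policy-load audit events (each semodule -i shows up here)
ausearch -m MAC_POLICY_LOAD -ts today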

Tyler Lisowski

IBM Cloud Satellite Lead Architect. Proud member of Bills Mafia.