Demystifying the Life of a Kubernetes Network Packet with Calico
It’s often useful to follow network traffic within a Kubernetes cluster when debugging networking problems. This guide traces, hop by hop, the network traffic of a pod communicating with another pod’s HTTPS endpoint in a Kubernetes cluster. An IBM Cloud Red Hat OpenShift MZR cluster with Calico as the SDN provider is used for the experiments. We start with an existing VPC IBM Cloud Red Hat OpenShift cluster in the us-east region. In our experiment, we will send an HTTPS request from a pod in us-east-3 to a pod in us-east-1. The selected pods and nodes are shown below.
SOURCE POD
==========
$ kubectl get pods -n openshift-ingress -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
router-default-77949c967-4t8f7 1/1 Running 0 4h2m 172.17.25.79 10.240.128.12 <none> <none>

$ kubectl get node 10.240.128.12 --show-labels
NAME STATUS ROLES AGE VERSION LABELS
10.240.128.12 Ready master,worker 3d8h v1.21.6+bb8d50a arch=amd64,beta.kubernetes.io/arch=amd64,...,topology.kubernetes.io/zone=us-east-3

DESTINATION POD
==========
$ kubectl get pods -n openshift-ingress -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
router-default-77949c967-tgth7 1/1 Running 0 4h8m 172.17.25.239 10.240.0.9 <none> <none>

$ kubectl get node 10.240.0.9 --show-labels
NAME STATUS ROLES AGE VERSION LABELS
10.240.0.9 Ready master,worker 3d8h v1.21.6+bb8d50a arch=amd64,beta.kubernetes.io/arch=amd64,...,topology.kubernetes.io/zone=us-east-1
To visualize the traffic, we are going to run tcpdump commands on the nodes hosting the source and destination pods. The oc debug node/NODE_NAME command opens a shell on a node that allows us to do that. The example commands for this cluster are shown below:
oc debug node/10.240.0.9
Creating debug namespace/openshift-debug-node-865g6 ...
Starting pod/1024009-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.240.0.9
If you don't see a command prompt, try pressing enter.
sh-4.4# nsenter -t 1 -m -u -i -n -p -- sudo su
[root@kube-brog4mmw0beep0m011p0-vpcoceast-default-00023971 /]#

oc debug node/10.240.128.12
Creating debug namespace/openshift-debug-node-tbm65 ...
Starting pod/1024012812-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10.240.128.12
If you don't see a command prompt, try pressing enter.
sh-4.4# nsenter -t 1 -m -u -i -n -p -- sudo su
[root@kube-brog4mmw0beep0m011p0-vpcoceast-default-00023b09 /]#
Request Traffic Flow
Now we are ready to trace the traffic from source to destination. The request starts on the node that hosts the source pod. Every pod that is a part of the SDN gets its own unique Calico interface on its node, so the interface assigned to the source pod needs to be located first. It can be found by scanning the ip route command output for the IP of the source pod. The steps for this cluster are shown below:
[root@kube-brog4mmw0beep0m011p0-vpcoceast-default-00023b09 /]# ip route
default via 10.240.128.1 dev eth0
10.240.128.0/24 dev eth0 proto kernel scope link src 10.240.128.12
172.17.25.66 dev calib1a96c2cc82 scope link
...
172.17.25.79 dev cali41f14107a8d scope link
...
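The same lookup can be scripted rather than done by eye. The following is a minimal sketch, assuming the host routes keep the "<pod IP> dev <interface> scope link" layout shown in this output; find_pod_iface is a hypothetical helper name, and the sample routes are copied from the output in this article.

```shell
#!/bin/sh
# Print the interface that the host route for a given pod IP points at.
# Reads `ip route` style output on stdin.
# Assumed line format: "<pod IP> dev <interface> scope link".
find_pod_iface() {
    awk -v ip="$1" '$1 == ip && $2 == "dev" { print $3; exit }'
}

# Sample routes copied from the ip route output in this article.
sample_routes='default via 10.240.128.1 dev eth0
10.240.128.0/24 dev eth0 proto kernel scope link src 10.240.128.12
172.17.25.66 dev calib1a96c2cc82 scope link
172.17.25.79 dev cali41f14107a8d scope link'

# Prints: cali41f14107a8d
printf '%s\n' "$sample_routes" | find_pod_iface 172.17.25.79
```

On a node, the helper would be fed live data instead of the sample, for example: ip route | find_pod_iface 172.17.25.79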
The ip route output above shows that Calico interface cali41f14107a8d is assigned to the source pod. When traffic leaves the source pod, it passes through this interface. We can visualize this by executing an HTTPS request to the destination pod and using tcpdump to analyze the traffic leaving the interface. The HTTPS request is issued by opening a shell in the source pod with oc exec and running a curl command against the destination pod IP. In parallel, a tcpdump command is launched to trace network traffic on the Calico interface.
INITIATING CURL REQUEST
=========================
oc exec -it -n openshift-ingress router-default-77949c967-4t8f7 bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
bash-4.4$ curl -k https://172.17.25.239
<html>
...
TRACING CURL REQUEST WITH TCPDUMP
=========================
[root@kube-brog4mmw0beep0m011p0-vpcoceast-default-00023b09 /]# tcpdump -nni cali41f14107a8d dst 172.17.25.239 and port 443
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cali41f14107a8d, link-type EN10MB (Ethernet), capture size 262144 bytes
15:24:08.636733 IP 172.17.25.79.38362 > 172.17.25.239.443: Flags [S], seq 694140744, win 28800, options [mss 1440,sackOK,TS val 291613257 ecr 0,nop,wscale 9], length 0
15:24:08.637881 IP 172.17.25.79.38362 > 172.17.25.239.443: Flags [.], ack 2893467653, win 57, options [nop,nop,TS val 291613258 ecr 291623097], length 0
The traffic can be seen above: the packets travel from the source pod to the destination pod’s address and port. The ip route command can be used to determine the next hop of the traffic.
[root@kube-brog4mmw0beep0m011p0-vpcoceast-default-00023b09 /]# ip route
default via 10.240.128.1 dev eth0
...
172.17.25.192/26 via 10.240.0.9 dev tunl0 proto bird onlink
...
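The 172.17.25.192/26 route applies because the destination pod IP, 172.17.25.239, falls inside that /26 block. The sketch below checks that arithmetic. The helper names ip_to_int and in_cidr are hypothetical, and the check covers prefix containment only, not the kernel's full longest-prefix route selection.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a 32-bit integer.
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed (exit status 0) if address $1 falls inside CIDR block $2.
in_cidr() (
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")   # network part, before the slash
    bits=${2#*/}                 # prefix length, after the slash
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
)

# The destination pod IP is inside the block routed through tunl0.
if in_cidr 172.17.25.239 172.17.25.192/26; then
    echo "172.17.25.239 is in 172.17.25.192/26"
fi

# The source pod IP is not, which is consistent with it being local.
if ! in_cidr 172.17.25.79 172.17.25.192/26; then
    echo "172.17.25.79 is not in 172.17.25.192/26"
fi
```

On a live node, ip route get 172.17.25.239 asks the kernel directly which route it would select for that address.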
The ip route output shows that the next hop for the traffic is Calico’s tunnel interface, tunl0, on the source node, with the destination node 10.240.0.9 as the gateway. Packets routed into tunl0 are IP-in-IP encapsulated by the node’s kernel, using routes that Calico programs, so that they can flow to the destination node over the IBM Cloud network. The traffic flow into the tunnel interface is analyzed with tcpdump below.
[root@kube-brog4mmw0beep0m011p0-vpcoceast-default-00023b09 /]# tcpdump -nni tunl0 dst 172.17.25.239 and port 443
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
15:36:33.752293 IP 172.17.25.79.54366 > 172.17.25.239.443: Flags [S], seq 3776230360, win 28800, options [mss 1440,sackOK,TS val 292358373 ecr 0,nop,wscale 9], length 0
15:36:33.753395 IP 172.17.25.79.54366 > 172.17.25.239.443: Flags [.], ack 455994723, win 57, options [nop,nop,TS val 292358374 ecr 292368213], length 0
Once the traffic has entered the tunnel interface, it is encapsulated and sent to the address of the node hosting the destination pod. The encapsulated traffic can be traced by running tcpdump on the default interface of the source node.
[root@kube-brog4mmw0beep0m011p0-vpcoceast-default-00023b09 /]# tcpdump -nni eth0 'proto 4 and host 10.240.0.9 and (ip[36:4]==0xac1119ef)'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:44:29.424368 IP 10.240.128.12 > 10.240.0.9: IP 172.17.25.79.47102 > 172.17.25.239.443: Flags [.], ack 4097, win 73, options [nop,nop,TS val 292834045 ecr 292843884], length 0 (ipip-proto-4)
15:44:29.425525 IP 10.240.128.12 > 10.240.0.9: IP 172.17.25.79.47102 > 172.17.25.239.443: Flags [.], ack 4812, win 78, options [nop,nop,TS val 292834046 ecr 292843885], length 0 (ipip-proto-4)
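The filter in the eth0 capture above matches on raw bytes of the encapsulated packet: 0xac1119ef is the destination pod IP 172.17.25.239 written in hex, and ip[36:4] reads four bytes at offset 36 from the start of the outer IP header. The sketch below rebuilds such a filter. It assumes a 20-byte outer IPv4 header with no options, which places the inner source address at byte 32 and the inner destination address at byte 36. The helper names ip_to_hex and ipip_filter are hypothetical.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to the 0x-prefixed hex form used in a
# tcpdump byte comparison, e.g. 172.17.25.239 -> 0xac1119ef.
ip_to_hex() (
    IFS=.
    set -- $1
    printf '0x%02x%02x%02x%02x\n' "$1" "$2" "$3" "$4"
)

# Build an IPIP capture filter for traffic involving node $1 whose inner
# packet has pod IP $2 as either source or destination.
# Offsets assume a 20-byte outer IPv4 header (no IP options):
#   inner source address      = 20 + 12 = byte 32
#   inner destination address = 20 + 16 = byte 36
ipip_filter() (
    hex=$(ip_to_hex "$2")
    printf 'proto 4 and host %s and (ip[32:4]==%s || ip[36:4]==%s)\n' \
        "$1" "$hex" "$hex"
)

# Prints: 0xac1119ef
ip_to_hex 172.17.25.239

# Prints: proto 4 and host 10.240.128.12 and (ip[32:4]==0xac1119ef || ip[36:4]==0xac1119ef)
ipip_filter 10.240.128.12 172.17.25.239
```

The generated string can be passed to tcpdump as a single quoted argument, for example: tcpdump -nni eth0 "$(ipip_filter 10.240.128.12 172.17.25.239)"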
Note that the Calico deployment in this cluster uses IPIP encapsulation. Calico also offers VXLAN encapsulation, which would require a different tcpdump command to monitor (not covered in this article). At this point the network traffic has left the node and is traversing IBM Cloud’s network. The traffic can be seen arriving at the destination node by running tcpdump on the default interface of the destination node.
[root@kube-brog4mmw0beep0m011p0-vpcoceast-default-00023971 /]# tcpdump -nni eth0 'proto 4 and host 10.240.128.12 and (ip[32:4]==0xac1119ef || ip[36:4]==0xac1119ef)'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
18:13:37.849725 IP 10.240.128.12 > 10.240.0.9: IP 172.17.25.79.38570 > 172.17.25.239.443: Flags [.], ack 1, win 57, options [nop,nop,TS val 301782470 ecr 301792309], length 0 (ipip-proto-4)
18:13:37.855892 IP 10.240.128.12 > 10.240.0.9: IP 172.17.25.79.38570 > 172.17.25.239.443: Flags [P.], seq 1:518, ack 1, win 57, options [nop,nop,TS val 301782476 ecr 301792309], length 517 (ipip-proto-4)
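Each line of the IPIP captures carries two address pairs: the outer pair identifies the nodes, and the inner pair identifies the pods and their ports. The sketch below separates the two. It assumes the tcpdump line layout seen in this article ("TIME IP outer-src > outer-dst: IP inner-src > inner-dst: ..."), and split_ipip_line is a hypothetical helper name.

```shell
#!/bin/sh
# Split tcpdump IPIP lines (read from stdin) into outer (node) and inner
# (pod) address pairs. Only lines tagged ipip-proto-4 are processed.
split_ipip_line() {
    awk '/ipip-proto-4/ {
        outer_src = $3
        outer_dst = $5; sub(/:$/, "", outer_dst)   # drop trailing colon
        inner_src = $7
        inner_dst = $9; sub(/:$/, "", inner_dst)   # drop trailing colon
        printf "outer %s -> %s, inner %s -> %s\n",
            outer_src, outer_dst, inner_src, inner_dst
    }'
}

# Sample line copied from the destination node capture in this article.
sample_line='18:13:37.849725 IP 10.240.128.12 > 10.240.0.9: IP 172.17.25.79.38570 > 172.17.25.239.443: Flags [.], ack 1, win 57, options [nop,nop,TS val 301782470 ecr 301792309], length 0 (ipip-proto-4)'

# Prints: outer 10.240.128.12 -> 10.240.0.9, inner 172.17.25.79.38570 -> 172.17.25.239.443
printf '%s\n' "$sample_line" | split_ipip_line
```

The outer addresses are the node IPs and the inner addresses are the pod IPs with their TCP ports, which matches the encapsulation described above.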
Once at the destination node, the traffic flows to Calico’s tunnel interface on that node, where it is decapsulated and then forwarded to the Calico interface of the destination pod. The decapsulated traffic can be seen on the tunnel interface below.
[root@kube-brog4mmw0beep0m011p0-vpcoceast-default-00023971 /]# tcpdump -nni tunl0 -e host 172.17.25.239 and host 172.17.25.79
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
18:15:09.656398 ip: 172.17.25.79.44642 > 172.17.25.239.443: Flags [.], ack 1, win 57, options [nop,nop,TS val 301874270 ecr 301884108], length 0
18:15:09.659949 ip: 172.17.25.79.44642 > 172.17.25.239.443: Flags [P.], seq 1:518, ack 1, win 57, options [nop,nop,TS val 301874280 ecr 301884108], length 517
The traffic finally flows to the destination pod’s Calico interface. That next hop can be looked up with ip route, and the traffic flow visualized with tcpdump.
[root@kube-brog4mmw0beep0m011p0-vpcoceast-default-00023971 /]# ip route
default via 10.240.0.1 dev eth0
...
172.17.25.239 dev calic25e805a467 scope link
...

[root@kube-brog4mmw0beep0m011p0-vpcoceast-default-00023971 /]# tcpdump -nni calic25e805a467 host 172.17.25.239 and host 172.17.25.79
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on calic25e805a467, link-type EN10MB (Ethernet), capture size 262144 bytes
18:17:43.663824 IP 172.17.25.79.57590 > 172.17.25.239.443: Flags [.], ack 8005, win 95, options [nop,nop,TS val 302028284 ecr 302038111], length 0
18:17:43.664032 IP 172.17.25.79.57590 > 172.17.25.239.443: Flags [P.], seq 697:721, ack 8006, win 95, options [nop,nop,TS val 302028284 ecr 302038112], length 24
The destination pod has now received the network traffic and will proceed to generate a response. The path from the source to the destination has now been fully traced.
Response Traffic Flow
The response path can now be traced. The response traffic is first sent from the destination application through the destination pod’s Calico interface.
[root@kube-brog4mmw0beep0m011p0-vpcoceast-default-00023971 /]# tcpdump -nni calic25e805a467 host 172.17.25.79
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on calic25e805a467, link-type EN10MB (Ethernet), capture size 262144 bytes
19:52:13.509080 IP 172.17.25.239.443 > 172.17.25.79.48084: Flags [.], ack 518, win 58, options [nop,nop,TS val 307707969 ecr 307698129], length 0
19:52:13.509803 IP 172.17.25.239.443 > 172.17.25.79.48084: Flags [P.], seq 1:4097, ack 518, win 58, options [nop,nop,TS val 307707970 ecr 307698129], length 4096
That traffic then flows to the tunnel interface of the node hosting the destination pod, where it is encapsulated.
[root@kube-brog4mmw0beep0m011p0-vpcoceast-default-00023971 /]# tcpdump -nni tunl0 host 172.17.25.79
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
19:54:38.071945 IP 172.17.25.239.443 > 172.17.25.79.53320: Flags [P.], seq 1:4812, ack 518, win 58, options [nop,nop,TS val 307852532 ecr 307842690], length 4811
The encapsulated traffic flows out of the destination node’s default interface, destined for the address of the node hosting the source pod.
[root@kube-brog4mmw0beep0m011p0-vpcoceast-default-00023971 /]# tcpdump -nni eth0 'proto 4 and host 10.240.128.12 and (ip[32:4]==0xac1119ef || ip[36:4]==0xac1119ef)'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
20:00:00.708455 IP 10.240.0.9 > 10.240.128.12: IP 172.17.25.239.443 > 172.17.25.79.35438: Flags [.], ack 722, win 58, options [nop,nop,TS val 308175168 ecr 308165328], length 0 (ipip-proto-4)
The traffic flows across IBM Cloud’s networking infrastructure to the default interface of the source node holding the source pod.
[root@kube-brog4mmw0beep0m011p0-vpcoceast-default-00023b09 /]# tcpdump -nni eth0 'proto 4 and host 10.240.128.12 and (ip[32:4]==0xac1119ef || ip[36:4]==0xac1119ef)'
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
20:02:54.551260 IP 10.240.0.9 > 10.240.128.12: IP 172.17.25.239.443 > 172.17.25.79.48516: Flags [S.], seq 2835657314, ack 3611070988, win 28560, options [mss 1440,sackOK,TS val 308349011 ecr 308339170,nop,wscale 9], length 0 (ipip-proto-4)
Then that traffic is sent to the tunnel interface of the source node to get decapsulated and processed by Calico.
[root@kube-brog4mmw0beep0m011p0-vpcoceast-default-00023b09 /]# tcpdump -nni tunl0 host 172.17.25.239
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
20:06:30.119185 IP 172.17.25.239.443 > 172.17.25.79.60852: Flags [.], seq 5322:6750, ack 697, win 58, options [nop,nop,TS val 308564578 ecr 308554739], length 1428
Calico will ultimately forward that traffic to the source pod’s Calico interface at which point the application will receive and process the response traffic.
[root@kube-brog4mmw0beep0m011p0-vpcoceast-default-00023b09 /]# tcpdump -nni cali41f14107a8d src 172.17.25.239 and port 443
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on cali41f14107a8d, link-type EN10MB (Ethernet), capture size 262144 bytes
20:08:27.468773 IP 172.17.25.239.443 > 172.17.25.79.40266: Flags [S.], seq 3755735872, ack 838715937, win 28560, options [mss 1440,sackOK,TS val 308681928 ecr 308672088,nop,wscale 9], length 0
At this point we have successfully traced both request and response traffic across a multizone IBM Cloud OpenShift cluster. The same steps apply regardless of the cloud environment hosting the Kubernetes/OpenShift cluster, and they can help users diagnose and find the root cause of networking problems in a customer’s Kubernetes/OpenShift cluster.