Kubernetes — Networking & Load-Balancing
Overview
Kubernetes networking… …software-defined networking (SDN)
- …network plane spread across a cluster of machines
- …flat network structure
- …components connect without relying on hardware
Bird's-eye view of Kubernetes networking…
- Primary goal…
- …pod-to-pod communication between all cluster nodes
- …each pod has an IP address …routed within the cluster
- …eliminates the need to map ports to pods
- …no need to NAT cluster-internal communication
 
- Secondary goals:
- IP address management (IPAM) …allocate IPs to pods
- Port mapping …expose Pods to the outside world
- Bandwidth control …egress/ingress traffic rates
- Source NAT …for traffic leaving the cluster
 
Basic Communication
Four basic types of network communication:
- Container-to-container
- …smallest unit in a Kubernetes network
- …containers in the same pod share the same network namespace
- …containers communicate within a single pod through localhost
- …containers share IP address and port space
 
- Pod-to-pod
- …each pod in a cluster has its own unique IP address
- …direct communication between pods on all cluster nodes
- …pod IP addresses are ephemeral …recreation of a pod changes the IP
 
- Pod-to-service & external-to-service …service abstraction
- …facilitate both pod-to-service & external-to-service communication
- …enables external traffic exposure to cluster internal applications
- …provides load balancing & service discovery for logical sets of pods
 
Traffic Patterns
Jargon for network traffic patterns…
- Ingress …defines rules for incoming traffic to pods
- Egress …defines rules for outgoing traffic from pods
- North-South
- North traffic (incoming traffic)
- …typically handled by a load balancer
- …external IP address to forward traffic
 
- South traffic (outgoing traffic)
- …return a response …call external services
- …allowed by default …can be restricted by egress rules in a NetworkPolicy
 
 
- East-West traffic (internal traffic) …service-to-service communication
Container Network Interface
Container Network Interface (CNI) …under the governance of the CNCF
What do CNI plugins do?
- Connectivity
- …create network namespaces
- …assign IP addresses
- …set up network routes
 
- Reachability
- …enable pod-to-pod communication
- …within the same node and across nodes
 
Specification & libraries to write plugins for container networking…
- CNI providers implement plugin as binary executable
- …invoked by the container engine via the Kubelet process
- …container runtime creates a network namespace before invoking the CNI plugin
- …plugin responsible for connecting the network interface
 
- Kubernetes provides a default CNI…
- …third-party plugins include Cilium, Calico, Flannel, Istio…
- …differ in their approach to overlay networks, direct routing, etc.
 
Reachability
Basic Terminology for networks:
- Underlay network — Physical infrastructure
- Enables IP packet forwarding…
- …cables, switches, routers
 
- OSI transport layer works as transition layer
- Overlay network — Software-driven transportation
- …abstracts low-level details for traffic forwarding
- …overlay implements virtual networks
- …create multiple logical networks over the underlay
 
Connection between nodes depends on underlying layer 2 network…
- Shared — Nodes share a layer 2 network
- …connectivity by static routes or full mesh
 
- Decoupled — Nodes connected to different layer 2 networks
- …encapsulation in the overlay (e.g. VXLAN)
- …orchestrating the underlay (e.g. BGP)
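To make the encapsulation idea concrete, here is a minimal sketch of the 8-byte VXLAN header as defined in RFC 7348. The function names are hypothetical, and the outer UDP/IP headers that carry the result between nodes are left out.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag from RFC 7348: the VNI field is valid


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit into 24 bits")
    # Byte 0: flags, bytes 1-3: reserved, bytes 4-6: VNI, byte 7: reserved
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prefix an inner Ethernet frame with a VXLAN header.

    The result would travel as the payload of a UDP datagram between nodes.
    """
    return vxlan_header(vni) + inner_frame


def decapsulate(payload: bytes) -> tuple[int, bytes]:
    """Split a VXLAN payload into (VNI, inner frame)."""
    flags_word, vni_word = struct.unpack("!II", payload[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8, payload[8:]
```

The underlay only sees node-to-node UDP traffic; the pod addresses stay inside the encapsulated frame.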
 
Implementation
Reference implementations of CNI plugins [1] include…
- bridge …creates a bridge network and attaches pods
- vlan …allocates a VLAN device
- host-device …attaches to an existing host device
- ptp …creates a virtual Ethernet (veth) pair
CNI plugin implementations…
- Kindnet [2]
- Reachability …one static route per peer node
- Connectivity …mix of reference CNI plugins
- …ptp to create veth links
- …host-local to allocate IPs
- …portmap for port mapping
- …kindnetd daemon generates configuration files
 
- Flannel
- Reachability …managed by flanneld daemon
- …generates a host-local IPAM configuration
- …creates a VXLAN interface flannel.1
- …discovers VXLAN information from other nodes
- …builds a local unicast head-end replication (HER) table
 
- Connectivity …generates a bridge
 
- Calico [3]
- Connectivity…
- …creates veth link
- …sets up a host route pointing to the veth link
- …egress link set up with proxy_arp
 
- Reachability…
- Static route & overlay mode …supports VXLAN
- BGP mode …BGP speaker on every node
 
- Cilium
- Connectivity…
- …creates a veth link
- …eBPF program performs traffic forwarding
 
- Reachability…
- …tunnel mode …VXLAN interfaces to forward traffic
- …native-routing mode …provided by underlay …static routes or BGP
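The "one static route per peer node" approach listed for Kindnet can be sketched roughly as follows. This is a simplified illustration, not the actual implementation: the node names, node IPs and pod CIDRs are made up, and the approach assumes all nodes share a layer 2 network.

```python
import ipaddress

# Hypothetical cluster state: node name -> (node IP, pod CIDR of that node)
NODES = {
    "worker-1": ("172.18.0.3", "10.244.1.0/24"),
    "worker-2": ("172.18.0.4", "10.244.2.0/24"),
    "worker-3": ("172.18.0.5", "10.244.3.0/24"),
}


def peer_routes(local_node: str, nodes: dict) -> list[str]:
    """Return one 'ip route' command per peer node.

    Each route sends traffic for a peer's pod CIDR to that peer's node IP.
    """
    routes = []
    for name, (node_ip, pod_cidr) in sorted(nodes.items()):
        if name == local_node:
            continue  # local pods are reached directly, no route needed
        # Validate the values before rendering them into a command
        network = ipaddress.ip_network(pod_cidr)
        gateway = ipaddress.ip_address(node_ip)
        routes.append(f"ip route add {network} via {gateway}")
    return routes


for command in peer_routes("worker-1", NODES):
    print(command)
```

The number of routes grows with the number of nodes, which is one reason larger clusters tend to use an overlay or BGP instead.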
Network Policies
Network policies …filter traffic from/to pods
- …labels & selectors specify which policy applies to a pod
- …define and manage security policies for network communication
- Default pod communication within cluster not secured…
- …if a cluster is not using network policies
- …pods by default do not filter incoming traffic
- …no firewall rules for inter-pod communication
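A rough model of how labels and selectors decide whether a policy applies, and of the default-allow behaviour described above. This is a deliberately simplified sketch: namespaces, ports and egress rules are ignored, and the dictionary layout and the example policy are hypothetical.

```python
def selects(selector: dict, labels: dict) -> bool:
    """True if every key/value pair of the selector is present in the labels.

    An empty selector matches every pod.
    """
    return all(labels.get(key) == value for key, value in selector.items())


def ingress_allowed(policies: list, target_labels: dict, source_labels: dict) -> bool:
    """Decide whether traffic from a source pod may reach a target pod."""
    applicable = [p for p in policies if selects(p["pod_selector"], target_labels)]
    if not applicable:
        return True  # no policy selects the pod: all incoming traffic is allowed
    # Once a pod is selected, only explicitly allowed sources get through
    return any(
        selects(allowed, source_labels)
        for policy in applicable
        for allowed in policy["allow_from"]
    )


# Hypothetical policy: only pods labelled role=frontend may talk to app=website
POLICIES = [{"pod_selector": {"app": "website"}, "allow_from": [{"role": "frontend"}]}]

print(ingress_allowed(POLICIES, {"app": "website"}, {"role": "batch"}))
```

Note that enforcement depends on the CNI plugin; a policy object alone filters nothing if the plugin does not implement it.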
 
IP Addresses
Pods have a unique IP from a PodCIDR range…
- …CIDR range assigned to a node during kubelet configuration
- …nodes are not aware of CIDRs assigned to other nodes
Non-overlapping IP addresses for pods, services & nodes
>>> kubectl get configmaps -n kube-system kubeadm-config -o yaml | grep -i subnet
      podSubnet: 10.244.0.0/16
      serviceSubnet: 10.96.0.0/16
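The non-overlap requirement and the per-node split can be checked with Python's ipaddress module. This sketch uses the two subnets from the output above; the node network and the /24 per-node size are assumptions chosen for illustration.

```python
import ipaddress

# Values from the kubeadm output above; the node network is an assumption
pod_subnet = ipaddress.ip_network("10.244.0.0/16")
service_subnet = ipaddress.ip_network("10.96.0.0/16")
node_subnet = ipaddress.ip_network("172.18.0.0/16")  # hypothetical

# The three ranges must not overlap
ranges = [pod_subnet, service_subnet, node_subnet]
for i, first in enumerate(ranges):
    for second in ranges[i + 1:]:
        assert not first.overlaps(second), f"{first} overlaps {second}"

# Carve the pod subnet into per-node /24 ranges (assumed size)
per_node = pod_subnet.subnets(new_prefix=24)
node_cidrs = {f"node-{n}": next(per_node) for n in range(3)}
for name, cidr in node_cidrs.items():
    print(name, cidr)
```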
# list per node IP address range
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
# show pod with IP addresses
kubectl get pod -o wide
# show only the IP address
kubectl get pod $pod_name --template '{{.status.podIP}}'
DNS Service
Kubernetes cluster automatically provides a DNS service…
- Assigns readable names…
- …lightweight mechanism for service discovery
- …in addition to the pod IP address assignment
- …ephemeral IP addresses are not reliable endpoints for communication
- …service consumers should avoid using IP address
 
- Kubernetes DNS services [4] for pods…
- …have at least one corresponding A/AAAA DNS record
- …format depends on the type of the service
- …some services may have SRV and PTR records
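As an illustration of the record formats, the following sketch builds the names of the A/AAAA and SRV records for a regular service, following the naming scheme of the Kubernetes DNS specification. The helper names and the example service, namespace and port name are hypothetical.

```python
def service_a_record(service: str, namespace: str, zone: str = "cluster.local") -> str:
    """Name of the A/AAAA record of a service."""
    return f"{service}.{namespace}.svc.{zone}"


def service_srv_record(port_name: str, protocol: str, service: str,
                       namespace: str, zone: str = "cluster.local") -> str:
    """Name of the SRV record for a named service port."""
    return f"_{port_name}._{protocol}.{service_a_record(service, namespace, zone)}"


print(service_a_record("website", "default"))
print(service_srv_record("http", "tcp", "website", "default"))
```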
 
CoreDNS
CoreDNS implements the Kubernetes DNS spec [5]:
- …compiled to a static binary …deployed into the Kubernetes cluster
- Service discovery…
- Server-side
- …exposed as a ClusterIP service
- …DNS service inside cluster …based on network forwarding rules
- …stores Kubernetes service, pod and endpoint objects
- …acts as DNS proxy for all internal domains
 
- Client-side
- …controlled by spec.dnsPolicy [6] (per-pod basis)
- …by default Kubelet configures cluster DNS IP in resolv.conf
- …internal DNS takes precedence over external DNS
 
- Kubernetes nodes can run a local DNS cache [7]
Example
Set up an example in the default namespace…
# Create a website deployment…
kubectl create deployment website --replicas=3 --image=httpd
# …and a service
kubectl expose deployment website --port=80
# Start a client…
kubectl run -it client --image busybox
# …later clean up 
kubectl delete pod/client service/website deploy/website
Find the IP address of the cluster DNS server…
>>> kubectl get service/kube-dns -n kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                  AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP,9153/TCP   178m
DNS resolution in a pod…
- 10.96.0.10 is the address of the DNS server
- Default internal domain name for a cluster is cluster.local
- …subdomain per namespace $namespace.svc.cluster.local
- …service for example website.default.svc.cluster.local.
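How a short name such as website ends up as a fully qualified service name can be modelled with a small sketch of the resolver's search-list logic. It assumes a search list and ndots:5 as typically configured in a pod of the default namespace, and it simplifies the real resolver behaviour.

```python
def candidate_names(name: str, search: list[str], ndots: int = 5) -> list[str]:
    """Return the fully qualified names a resolver would try, in order.

    Simplified model of the glibc-style search behaviour; real resolvers
    differ in details.
    """
    if name.endswith("."):
        return [name]  # already fully qualified, search list is skipped
    expanded = [f"{name}.{domain}." for domain in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        return [absolute] + expanded  # enough dots: try the name as-is first
    return expanded + [absolute]  # few dots: walk the search list first


# Search list as it typically appears in a pod of the default namespace
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]

for candidate in candidate_names("website.default", SEARCH):
    print(candidate)
```

With ndots:5 almost every external name is first tried against the search list, which costs a few extra lookups; a trailing dot skips them.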
 
# Linux resolver configuration in a pod
>>> cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local localdomain
nameserver 10.96.0.10
options ndots:5
Send a request by referencing the service name
>>> wget -qO - website
<html><body><h1>It works!</h1></body></html>
>>> wget -qO - website.default
<html><body><h1>It works!</h1></body></html>
>>> wget -qO - website.foo
wget: bad address 'website.foo'
External DNS
Required to discover…
- …external resources used from the Kubernetes cluster
- …external load-balancing service
- …ingress and gateway services
Two options to integrate external DNS resolution…
- ExternalDNS [8]
- k8s_gateway [9]
Services Abstraction
Multi-pod service abstraction groups similar pods & load-balances traffic to them
# list all services in a cluster
kubectl get service
# create a load-balancer service object 
kubectl expose $object $name --type=LoadBalancer --name=$name
# remove a service object
kubectl delete service $name
Why use a service?
- Pod groups
- …all pods with a matching label represent a service
- …incoming traffic is load-balanced to all pods in a service
 
- Service exposure …either cluster internal and/or external
- Route external connections
- …clients do not need to know individual pods
- …single constant IP-address for a service
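The grouping by label and the load balancing over the resulting pods can be sketched as follows. Pod names, IPs and labels are hypothetical, and round-robin is used only for illustration; the actual selection strategy depends on the proxy implementation.

```python
from itertools import cycle

# Hypothetical pods: name -> (IP, labels)
PODS = {
    "website-a": ("10.244.1.133", {"app": "website"}),
    "website-b": ("10.244.1.139", {"app": "website"}),
    "website-c": ("10.244.2.89", {"app": "website"}),
    "client": ("10.244.2.90", {"run": "client"}),
}


def endpoints(selector: dict, pods: dict) -> list[str]:
    """Collect the IPs of all pods whose labels match the service selector."""
    return sorted(
        ip
        for ip, labels in pods.values()
        if all(labels.get(k) == v for k, v in selector.items())
    )


# The service keeps one stable name while the set of backends may change
backends = cycle(endpoints({"app": "website"}, PODS))
for _ in range(4):
    print("forward request to", next(backends))
```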
 
Overview
Kubernetes provides several services to facilitate external traffic into a cluster…
- Headless …simplest load-balancing (round-robin) by DNS
- ExternalName …access to a service by external DNS name
- ClusterIP …default for internal communications
- NodePort …exposes a service on a static port on each node’s IP
- …makes the service accessible outside of the cluster
- …most basic way to perform external-to-service networking
 
- LoadBalancer …standard for external-service networking
- …assigns a public IP address to the service
- …external load balancer then directs traffic to the backend pods
 
- Ingress …collection of routing rules surrounding external access to services
| | ClusterIP | NodePort | LoadBalancer | Ingress |
|---|---|---|---|---|
| Native | yes | yes | yes (with CNI) | yes (with CNI) |
| OSI | layer 4 | layer 4 | layer 4 and below | layer 7 (only HTTP & HTTPS) |
| Multiple services per IP | no | no | yes (multiple ports) | yes |
| Expose outside the cluster | no | yes | yes (1 service) | yes (multiple services) |
ClusterIP
ClusterIP — Reserve a static virtual IP address
- …internal IP …reachable only within the cluster
- …maintains the security boundaries of the cluster
- Pod-to-service communication…
- …used for internal communications between pods and services
- …traffic load-balanced within the cluster
 
Example
Create some pods via a deployment
>>> kubectl create deployment website --replicas=3 --image=httpd
# all pods have the `app=website` label
>>> kubectl get pod --show-labels
NAME                       READY   STATUS    RESTARTS   AGE    LABELS
website-5d755d9996-2h4c2   1/1     Running   0          3m1s   app=website,pod-template-hash=5d755d9996
website-5d755d9996-2vpdm   1/1     Running   0          3m1s   app=website,pod-template-hash=5d755d9996
website-5d755d9996-fck9z   1/1     Running   0          3m1s   app=website,pod-template-hash=5d755d9996
Create a ClusterIP service
>>> cat > service.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: website
spec:
  ports:
  - port: 80
    name: http
  selector:
    app: website
EOF
>>> kubectl apply -f service.yaml
# list the service including IP (only an internal IP)
>>> kubectl get service website       
NAME      TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)   AGE
website   ClusterIP   10.96.194.111   <none>        80/TCP    2m11s
# IP address allocated from a predefined range
>>> kubectl cluster-info dump | grep -m 1 service-cluster-ip-range
                            "--service-cluster-ip-range=10.96.0.0/16"
>>> kubectl get endpointslice | grep ^website         
website-5x9x7   IPv4          80      10.244.2.89,10.244.1.133,10.244.1.139 8m41s
Query the ClusterIP from a client pod:
# Start a client pod…
>>> kubectl run -it client --image busybox
# …send a GET request to the ClusterIP
>>> wget -qO - http://10.96.194.111
<html><body><h1>It works!</h1></body></html>
Check the service logs for the request
kubectl logs -l app=website | grep GET
Observe changes after:
# modify the replication scale
kubectl scale deployment website --replicas=2
# remove a pod
kubectl delete pod website-$name
Clean up…
kubectl delete pod/client service/website deploy/website
Node Port
NodePort — Forward traffic to specific port
- Accessible from outside the cluster via node IP address
- Each node forwards traffic to a specific port
Example
NodePort service manifest…
service.yaml
apiVersion: v1
kind: Service
metadata:
  name: website
spec:
  type: NodePort
  ports:
  - port: 80
    nodePort: 30080
    name: http
  selector:
    app: website
kubectl apply -f service.yaml
kubectl create deployment website --replicas=3 --image=httpd
# later clean up
kubectl delete deployment/website service/website
Verify by connecting to the node port:
# identify the workers hosting the pods
>>> kubectl get pods -o wide    
NAME                       READY   STATUS    RESTARTS   AGE    IP                 NODE            NOMINATED NODE   READINESS GATES
website-5d755d9996-4dcr2   1/1     Running   0          112s   10.244.2.45        delta-worker2   <none>           <none>
website-5d755d9996-jnd8z   1/1     Running   0          112s   10.244.1.37        delta-worker    <none>           <none>
website-5d755d9996-tcswb   1/1     Running   0          112s   10.244.1.157       delta-worker    <none>           <none>
# select one of the nodes and get the node IP-address
>>> kubectl get nodes delta-worker -o wide
NAME           STATUS   ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP OS-IMAGE                         KERNEL-VERSION           CONTAINER-RUNTIME
delta-worker   Ready    <none>   28m   v1.33.1   172.18.0.4    <none>      Debian GNU/Linux 12 (bookworm)   6.14.6-200.fc41.x86_64   containerd://2.1.1
# send a GET request to the node port
>>> curl -s 172.18.0.4:30080
<html><body><h1>It works!</h1></body></html>
Load Balancer
Requires an external load balancer with public IP
- Accessible from outside via load balancer IP address
- For production …distributes traffic over nodes
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: LoadBalancer
  selector:
    app: my-app          # identify pods for this service
  ports:
    - port: 8080         # service port
      targetPort: 8080   # forward to container port
Configure session affinity for a service…
- …clients are redirected to the same pod every time
- …defaults to None if not specified
spec:
  sessionAffinity: ClientIP
Pods backing a service can not see the actual client IP-address…
- …packet source IP changed for cluster internal routing
- …SNAT (Source Network Address Translation) performed on each packet
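The effect of session affinity by client IP, as configured above, can be illustrated with a small sticky balancer. This is a conceptual sketch with hypothetical names, not how any particular proxy implements it; the default timeout of 10800 seconds mirrors the documented default for ClientIP affinity.

```python
import itertools


class StickyBalancer:
    """Round-robin balancer that pins each client IP to its first backend."""

    def __init__(self, backends: list[str], timeout: float = 10800.0):
        self._next_backend = itertools.cycle(backends)
        self._timeout = timeout  # seconds an idle client stays pinned
        self._pinned = {}  # client IP -> (backend, time of last request)

    def pick(self, client_ip: str, now: float) -> str:
        """Return the backend for a request arriving at time 'now' (seconds)."""
        backend, last_seen = self._pinned.get(client_ip, (None, 0.0))
        if backend is None or now - last_seen > self._timeout:
            backend = next(self._next_backend)  # new or expired client
        self._pinned[client_ip] = (backend, now)
        return backend


balancer = StickyBalancer(["pod-a", "pod-b", "pod-c"])
print(balancer.pick("192.0.2.10", now=0.0))
print(balancer.pick("192.0.2.10", now=60.0))  # same client, same pod
```

Affinity keyed on the client IP only works as intended if that address survives until the balancing decision, which is why source NAT matters here.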
TODO Orchestration
Kubernetes orchestration of external load-balancer depends on environment:
Cloud-based cluster…
- Network Load Balancer (NLB) for Amazon Elastic Kubernetes Service (EKS)
- Standard Load Balancer for Azure Kubernetes Service (AKS)
- Cloud Load Balancer for Google Kubernetes Engine (GKE)
- LBaaS plugin for OpenStack
- NSX ALB for VMware
- …in-cluster component called cloud-controller-manager
On-prem cluster…
- …existing load-balancer appliances
- …direct interaction with the physical network by cluster add-ons
- ARP (for L2 integration)
- BGP (for L3 integration)
 
- Kubernetes hosted load-balancers…
- MetalLB
- ARP and BGP modes …custom user-space implementation
- Configured with ConfigMap and custom CRD-based operator
 
- OpenELB (part of KubeSphere)
- ARP and BGP modes
- Configuration via CRDs
 
- Kube-vip
- ARP and BGP modes
- Configured via flags, environment variables and ConfigMap
 
- ServiceLB [10] (aka Klipper, integrated with K3s)
- Exposes LoadBalancer as host ports on all cluster nodes
Ingress
Ingress — Associate a URL with a backend service
- Operates on the application layer of the network stack (HTTP)
- Works in conjunction with Kubernetes Services and Endpoints
- Why use ingress?
- Path- & Host-based routing …typically a URL path
- Multiple services can share a single IP-address
- Manage SSL certificates and terminate SSL connections
- Manage authentication and authorization
 
- Terminology…
- Ingress Resource — Routing rules directing external traffic to services
- Ingress Controller — Implements the rules defined in the ingress resource
- Backend Services — Services receiving the traffic directed by Ingress
 
Reverse Proxy
Ingress serves as a reverse proxy
- …intermediary between client and server that forwards requests
- Ingress Controllers are pods running within the Kubernetes cluster…
- …enforcing the rules set in the Ingress resources
- Control flow of inbound requests and direct them to the appropriate service
Ingress Rules
Ingress resources contain one or more Ingress rules
- Component of the Ingress resource that specifies the actual routing logic
- Each Ingress rule specifies a set of conditions (like host and path) and the corresponding backend service to which the traffic should be directed
- Path-Based Routing
- Routing directs traffic based on the URL path
- example.org/hello & example.org/world route to different services
 
- Host-Based Routing
- Routing traffic based on the hostname (or domain)
- hello.example.org & world.example.org route to different services
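Both routing styles can be combined in one rule table. The following sketch matches a request's host and path against a list of rules and picks the longest matching path prefix; the rule tuples and service names are hypothetical, and real Ingress controllers support more match types than shown here.

```python
# Hypothetical rules: (host, path prefix, backend service)
RULES = [
    ("example.org", "/hello", "hello-service"),
    ("example.org", "/world", "world-service"),
    ("hello.example.org", "/", "hello-service"),
    ("world.example.org", "/", "world-service"),
]


def path_matches(prefix: str, path: str) -> bool:
    """Prefix match on whole path segments, so /hello does not match /hellofoo."""
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


def route(host: str, path: str, rules: list = RULES):
    """Return the backend of the matching rule with the longest path prefix."""
    matches = [
        (len(prefix), backend)
        for rule_host, prefix, backend in rules
        if rule_host == host and path_matches(prefix, path)
    ]
    return max(matches)[1] if matches else None


print(route("example.org", "/hello"))
print(route("world.example.org", "/index.html"))
```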
 
Port Forward
Connect to a specific pod without going through a service…
- …typically for debugging & testing individual pods
- …notation is local port, colon followed by port in the pod
# forward a local network port to a port in the pod
kubectl port-forward $pod_name 30080:80   # local:remote
# containers in a pod share one network namespace …the port identifies the container
kubectl port-forward $pod_name 38080:8080
# multiple ports
kubectl port-forward $pod_name 30080:80 30443:443
SSH tunneling to a node with access to a Kubernetes cluster
ssh -L 38080:localhost:38080 $user@$node
kubectl port-forward $pod_name 38080:8080
Use proxy management like kubefwd [14]
Footnotes
[1] CNI Network Plugins
 https://github.com/containernetworking/plugins
[2] Kindnet, CNI Plugin
 https://kindnet.es
 https://github.com/aojea/kindnet
[3] Calico, CNI Plugin
 https://docs.tigera.io
[4] DNS for Services and Pods, Kubernetes Documentation
 https://kubernetes.io/docs/concepts/services-networking/dns-pod-service
[5] Kubernetes DNS, CoreDNS Documentation
 https://coredns.io/plugins/kubernetes
[6] Pod’s DNS Policy, Kubernetes Documentation
 https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy
[7] Using NodeLocal DNSCache in Kubernetes Clusters
 https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns
[8] ExternalDNS, GitHub
 https://github.com/kubernetes-sigs/external-dns
[9] k8s_gateway, GitHub
 https://github.com/ori-edge/k8s_gateway
[10] ServiceLB, GitHub
 https://github.com/k3s-io/klipper-lb
[11] NGINX Ingress Controller
 https://docs.nginx.com/nginx-ingress-controller
[12] Traefik Proxy
 https://doc.traefik.io/traefik
[13] HAProxy Ingress Controller
 https://www.haproxy.com/documentation/kubernetes-ingress
[14] kubefwd Project, GitHub
 https://github.com/txn2/kubefwd