Kubernetes — Networking & Load-Balancing
Overview
Kubernetes networking… …software-defined networking (SDN)
- …network plane spread across a cluster of machines
- …flat network structure
- …components connect without relying on hardware
Bird's-eye view of Kubernetes networking…
- Primary goal…
- …pod-to-pod communication between all cluster nodes
- …each pod has an IP address …routed within the cluster
- …eliminates the need to map ports to pods
- …no need to NAT cluster internal communication
- Secondary goals:
- IP address management (IPAM) …allocate IPs to pods
- Port mapping …expose Pods to the outside world
- Bandwidth control …egress/ingress traffic rates
- Source NAT …for traffic leaving the cluster
Basic Communication
Four basic types of network communication:
- Container-to-container
- …smallest unit in a Kubernetes network
- …containers in the same pod share the same network namespace
- …containers communicate within a single pod through localhost
- …containers share IP address and port space
- Pod-to-pod
- …each pod in a cluster has its own unique IP address
- …direct communication between pods on all cluster nodes
- …pod IP addresses are ephemeral …recreation of a pod changes the IP
- Pod-to-service …service abstraction
- …provides load balancing & service discovery for logical sets of pods
- External-to-service …same service abstraction
- …exposes cluster-internal applications to external traffic
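Container-to-container communication over localhost can be tried with a two-container pod. The following is a minimal sketch, not a reference setup: the pod name `localhost-demo`, the container names `web` and `probe`, and the file name `pod-localhost.yaml` are hypothetical.

```shell
# Write a pod manifest with two containers that share one network namespace
cat > pod-localhost.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: localhost-demo
spec:
  containers:
  - name: web
    image: httpd
  - name: probe
    image: busybox
    # same network namespace as "web", so localhost:80 reaches httpd
    command: ["sh", "-c", "sleep 5; wget -qO - http://localhost:80; sleep 3600"]
EOF
```

After `kubectl apply -f pod-localhost.yaml`, the output of `kubectl logs localhost-demo -c probe` should contain the httpd start page if the request succeeded.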
Traffic Patterns
Jargon for network traffic patterns…
- Ingress …defines rules for incoming traffic to pods
- Egress …defines rules for outgoing traffic from pods
- North-South
- North traffic (incoming traffic)
- …typically handled by a load balancer
- …external IP address to forward traffic
- South traffic (outgoing traffic)
- …return a response …call external services
- …can be restricted with egress rules (e.g. in a network policy)
- East-West traffic (internal traffic) …services to service communication
Container Network Interface
Container Network Interface (CNI) …under the governance of the CNCF
What do CNI plugins do?
- Connectivity
- …create network namespaces
- …assign IP addresses
- …set up network routes
- Reachability
- …enable pod-to-pod communication
- …within the same node and across nodes
Specification & libraries to write plugins for container networking…
- CNI providers implement plugin as binary executable
- …invoked by the container runtime when the kubelet sets up a pod
- …container runtime creates a network namespace before invoking the CNI plugin
- …plugin responsible for connecting the network interface
- Kubernetes itself ships no CNI plugin …distributions usually bundle a default
- …third-party plugins include Cilium, Calico, Flannel, Istio…
- …differ in their approach to overlay networks, direct routing, etc.
Reachability
Basic Terminology for networks:
- Underlay network — Physical infrastructure
- Enables IP packet forwarding…
- …cables, switches, routers
- OSI transport layer works as transition layer
- Overlay network — Software-driven transportation
- …abstracts low-level details for traffic forwarding
- …overlay implements virtual networks
- …create multiple logical networks over the underlay
Connection between nodes depends on underlying layer 2 network…
- Shared — Nodes share a layer 2 network
- …connectivity by static routes or full mesh
- Decoupled — Nodes connected to different layer 2 networks
- …encapsulation in the overlay (e.g. VXLAN)
- …orchestrating the underlay (e.g. BGP)
Implementation
Reference implementations of CNI plugins [1] include…
- bridge …creates a bridge network and attaches pods
- vlan …allocates a VLAN device
- host-device …attaches to an existing host device
- ptp …creates a virtual Ethernet (veth) pair
CNI plugin implementations…
- Kindnet [2]
- Reachability …one static route per peer node
- Connectivity …mix of reference CNI plugins
- …ptp to create veth links
- …host-local to allocate IPs
- …portmap for port mapping
- …kindnetd daemon generates configuration files
- Flannel
- Reachability …managed by flanneld daemon
- …generates a host-local IPAM configuration
- …creates a VXLAN interface flannel.1
- …discovers VXLAN information from other nodes
- …builds a local unicast head-end replication (HER) table
- Connectivity …generates a bridge
- Calico [3]
- Connectivity…
- …creates veth link
- …sets up host route pointing to veth link
- …egress link set up with proxy_arp
- Reachability…
- Static route & overlay mode …supports VXLAN
- BGP mode …BGP speaker on every node
- Cilium
- Connectivity…
- …creates a veth link
- …eBPF program performs traffic forwarding
- Reachability…
- …tunnel mode …VXLAN interfaces to forward traffic
- …native-routing mode …provided by underlay …static routes or BGP
Network Policies
Network policies …filter traffic from/to pods
- …labels & selectors specify which policy applies to a pod
- …define and manage security policies for network communication
- Default pod communication within cluster not secured…
- …if a cluster is not using network policies
- …pods by default do not filter incoming traffic
- …no firewall rules for inter-pod communication
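A label-based policy could look like the sketch below. It is an illustration under assumptions: the policy names and the `role=client` label are hypothetical, while `app=website` matches the example deployment used elsewhere in these notes. Policies only take effect if the installed CNI plugin implements network policy enforcement.

```shell
# Write two policies: a namespace-wide default deny, plus one explicit allow rule
cat > networkpolicy.yaml <<EOF
# deny all incoming traffic to every pod in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress
---
# allow pods labelled role=client to reach app=website pods on TCP port 80
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: website-allow-client
spec:
  podSelector:
    matchLabels:
      app: website
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: client
    ports:
    - protocol: TCP
      port: 80
EOF
```

Apply with `kubectl apply -f networkpolicy.yaml` and list the result with `kubectl get networkpolicy`.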
IP Addresses
Pods have a unique IP from a PodCIDR range…
- …CIDR range assigned to a node during kubelet configuration
- …nodes are not aware of CIDRs assigned to other nodes
Non-overlapping IP addresses for pods, services & nodes
>>> kubectl get configmaps -n kube-system kubeadm-config -o yaml | grep -i subnet
podSubnet: 10.244.0.0/16
serviceSubnet: 10.96.0.0/16
# list per node IP address range
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'
# show pod with IP addresses
kubectl get pod -o wide
# show only the IP address
kubectl get pod $pod_name --template '{{.status.podIP}}'
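The non-overlap requirement can be checked with plain shell arithmetic. This is a rough sketch: the helper functions are hypothetical, the pod and service ranges are the ones shown above, and the node range `172.18.0.0/16` is an assumption that depends on the environment.

```shell
# Convert a dotted-quad IPv4 address to an integer
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# First address of a CIDR block, as an integer
cidr_first() (
  len=${1#*/}
  base=$(ip_to_int "${1%/*}")
  echo $(( base & ((0xFFFFFFFF << (32 - len)) & 0xFFFFFFFF) ))
)

# Last address of a CIDR block, as an integer
cidr_last() (
  len=${1#*/}
  echo $(( $(cidr_first "$1") | (0xFFFFFFFF >> len) ))
)

# Succeeds (exit status 0) if the two blocks share at least one address
cidrs_overlap() {
  [ "$(cidr_first "$1")" -le "$(cidr_last "$2")" ] &&
  [ "$(cidr_first "$2")" -le "$(cidr_last "$1")" ]
}

pods=10.244.0.0/16
services=10.96.0.0/16
nodes=172.18.0.0/16   # assumption, adjust to the actual node network

for pair in "$pods $services" "$pods $nodes" "$services $nodes"; do
  set -- $pair
  if cidrs_overlap "$1" "$2"; then echo "overlap: $1 $2"; else echo "ok: $1 $2"; fi
done
```

The same check flags a per-node range such as `10.244.1.0/24` as lying inside the pod subnet, which is expected because node ranges are carved out of it.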
DNS Service
Kubernetes cluster automatically provides a DNS service…
- Assigns readable names…
- …lightweight mechanism for service discovery
- …in addition to the pod IP address assignment
- …ephemeral IP addresses are not reliable endpoints for communication
- …service consumers should avoid using IP address
- Kubernetes DNS for services & pods [4]…
- …have at least one corresponding A/AAAA DNS record
- …format depends on the type of the service
- …some services may have SRV and PTR records
CoreDNS
CoreDNS implements the Kubernetes DNS spec [5]:
- …compiled to a static binary …deployed into the Kubernetes cluster
- Service discovery…
- Server-side
- …exposed as a ClusterIP service
- …DNS service inside cluster …based on network forwarding rules
- …stores Kubernetes service, pods and endpoint objects
- …acts as DNS proxy for all internal domains
- Client-side
- …controlled by spec.dnsPolicy [6] (per-pod basis)
- …by default Kubelet configures cluster DNS IP in resolv.conf
- …internal DNS has precedence over external DNS
- Kubernetes nodes can run a local DNS cache [7]
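The client-side settings can be overridden per pod. The manifest below is a sketch under assumptions: the pod name `dns-demo` and the file name are hypothetical, the nameserver address is the cluster DNS IP from the example cluster in these notes, and `ndots: 2` is an arbitrary illustrative value.

```shell
# Write a pod manifest that replaces the kubelet-generated resolver configuration
cat > pod-dns.yaml <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: dns-demo
spec:
  dnsPolicy: "None"        # ignore the cluster defaults, use dnsConfig only
  dnsConfig:
    nameservers:
    - 10.96.0.10           # assumption: ClusterIP of the cluster DNS service
    searches:
    - default.svc.cluster.local
    - svc.cluster.local
    - cluster.local
    options:
    - name: ndots
      value: "2"
  containers:
  - name: client
    image: busybox
    command: ["sleep", "3600"]
EOF
```

After `kubectl apply -f pod-dns.yaml`, the effective configuration can be inspected with `kubectl exec dns-demo -- cat /etc/resolv.conf`.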
Example
Set up an example in the default namespace…
# Create a website deployment…
kubectl create deployment website --replicas=3 --image=httpd
# …and a service
kubectl expose deployment website --port=80
# Start a client…
kubectl run -it client --image busybox
# …later clean up
kubectl delete pod/client service/website deploy/website
Find the IP address of the cluster DNS server…
>>> kubectl get service/kube-dns -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 178m
DNS resolution in a pod…
- 10.96.0.10 is the address of the DNS server
- Default internal domain name for a cluster is cluster.local
- …subdomain per namespace $namespace.svc.cluster.local
- …service for example website.default.svc.cluster.local.
# Linux resolver configuration in a pod
>>> cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local localdomain
nameserver 10.96.0.10
options ndots:5
Send a request by referencing the service name
>>> wget -qO - website
<html><body><h1>It works!</h1></body></html>
>>> wget -qO - website.default
<html><body><h1>It works!</h1></body></html>
>>> wget -qO - website.foo
wget: bad address 'website.foo'
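Why `website` and `website.default` resolve while `website.foo` does not follows from how the resolver combines the `search` list with `ndots`. The function below is a simplified model of glibc-style behaviour, not an actual resolver; `candidates` is a hypothetical helper, and other resolver implementations may differ in detail.

```shell
# Print the fully qualified names a resolver would try, in order
# usage: candidates NAME NDOTS SEARCH_DOMAIN...
candidates() (
  name=$1 ndots=$2
  shift 2
  # count the dots in the name
  dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))
  if [ "${name%.}" != "$name" ]; then
    echo "$name"                          # trailing dot: already absolute
  elif [ "$dots" -ge "$ndots" ]; then
    echo "$name."                         # enough dots: try as-is first
    for d in "$@"; do echo "$name.$d."; done
  else
    for d in "$@"; do echo "$name.$d."; done
    echo "$name."                         # bare name is tried last
  fi
)

# the second candidate printed here is the service's full DNS name
candidates website.default 5 default.svc.cluster.local svc.cluster.local cluster.local localdomain
```

For `website.foo` none of the candidates exists, because there is no `foo` namespace with a `website` service, so the lookup fails.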
External DNS
Required to discover…
- …external resources used from the Kubernetes cluster
- …external load-balancing service
- …ingress and gateway services
Two options to integrate external DNS resolution…
- ExternalDNS [8]
- k8s_gateway [9]
Services Abstraction
Multi-pod service abstraction groups similar pods & load-balances traffic to them
# list all services in a cluster
kubectl get service
# create a load-balancer service object
kubectl expose $object $name --type=LoadBalancer --name=$name
# remove a service object
kubectl delete service $name
Why use a service?
- Pod groups
- …all pods with a matching label represent a service
- …incoming traffic is load-balanced to all pods in a service
- Service exposure …either cluster internal and/or external
- Route external connections
- …clients do not need to know individual pods
- …single constant IP-address for a service
Overview
Kubernetes provides several service types, some of which bring external traffic into a cluster…
- Headless …simplest load-balancing (round-robin) by DNS
- ExternalName …access to a service by external DNS name
- ClusterIP …default for internal communications
- NodePort …exposes a service on a static port on each node’s IP
- …makes the service accessible outside of the cluster
- …most basic way to perform external-to-service networking
- LoadBalancer …standard for external-service networking
- …assigns a public IP address to the service
- …external load balancer then directs traffic to the backend pods
- Ingress …collection of routing rules surrounding external access to services
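The two types without a dedicated example in these notes, headless and ExternalName, can be sketched as follows. The service names `website-headless` and `external-db` and the target `db.example.org` are hypothetical; `app=website` matches the example deployment used in these notes.

```shell
# Write a headless service and an ExternalName service into one manifest
cat > services.yaml <<EOF
# headless: no virtual IP, DNS returns the pod IPs directly
apiVersion: v1
kind: Service
metadata:
  name: website-headless
spec:
  clusterIP: None
  selector:
    app: website
  ports:
  - port: 80
---
# ExternalName: DNS alias (CNAME) for a name outside the cluster
apiVersion: v1
kind: Service
metadata:
  name: external-db
spec:
  type: ExternalName
  externalName: db.example.org
EOF
```

After `kubectl apply -f services.yaml`, a lookup of `website-headless` from a client pod should return one address per backing pod instead of a single ClusterIP.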
ClusterIP
ClusterIP — Reserve a static virtual IP address
- …internal IP …reachable only within the cluster
- …maintains the security boundaries of the cluster
- Pod-to-service communication…
- …used for internal communications between pods and services
- …traffic load-balanced within the cluster
Example
Create some pods via a deployment
>>> kubectl create deployment website --replicas=3 --image=httpd
# all pods have the `app=website` label
>>> kubectl get pod --show-labels
NAME READY STATUS RESTARTS AGE LABELS
website-5d755d9996-2h4c2 1/1 Running 0 3m1s app=website,pod-template-hash=5d755d9996
website-5d755d9996-2vpdm 1/1 Running 0 3m1s app=website,pod-template-hash=5d755d9996
website-5d755d9996-fck9z 1/1 Running 0 3m1s app=website,pod-template-hash=5d755d9996
Create a ClusterIP service
>>> cat > service.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: website
spec:
  ports:
  - port: 80
    name: http
  selector:
    app: website
EOF
>>> kubectl apply -f service.yaml
# list the service including IP (only an internal IP)
>>> kubectl get service website
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
website ClusterIP 10.96.194.111 <none> 80/TCP 2m11s
# IP address allocated from a predefined range
>>> kubectl cluster-info dump | grep -m 1 service-cluster-ip-range
"--service-cluster-ip-range=10.96.0.0/16"
>>> kubectl get endpointslice | grep ^website
website-5x9x7 IPv4 80 10.244.2.89,10.244.1.133,10.244.1.139 8m41s
Query the ClusterIP from a client pod:
# Start a client pod…
>>> kubectl run -it client --image busybox
# …send a GET request to the ClusterIP
>>> wget -qO - http://10.96.194.111
<html><body><h1>It works!</h1></body></html>
Check the service logs for the request
kubectl logs -l app=website | grep GET
Observe changes after:
# modify the replication scale
kubectl scale deployment website --replicas=2
# remove a pod
kubectl delete pod website-$name
Clean up…
kubectl delete pod/client service/website deploy/website
Node Port
NodePort — Forward traffic to specific port
- Accessible from outside the cluster via node IP address
- Each node forwards traffic to a specific port
Example
NodePort service manifest service.yaml…
apiVersion: v1
kind: Service
metadata:
  name: website
spec:
  type: NodePort
  ports:
  - port: 80
    nodePort: 30080
    name: http
  selector:
    app: website
kubectl apply -f service.yaml
kubectl create deployment website --replicas=3 --image=httpd
# later clean up
kubectl delete deployment/website service/website
Verify by connecting to the node port:
# identify the workers hosting the pods
>>> kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
website-5d755d9996-4dcr2 1/1 Running 0 112s 10.244.2.45 delta-worker2 <none> <none>
website-5d755d9996-jnd8z 1/1 Running 0 112s 10.244.1.37 delta-worker <none> <none>
website-5d755d9996-tcswb 1/1 Running 0 112s 10.244.1.157 delta-worker <none> <none>
# select one of the nodes and get the node IP-address
>>> kubectl get nodes delta-worker -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
delta-worker Ready <none> 28m v1.33.1 172.18.0.4 <none> Debian GNU/Linux 12 (bookworm) 6.14.6-200.fc41.x86_64 containerd://2.1.1
# send a GET request to the node port
>>> curl -s 172.18.0.4:30080
<html><body><h1>It works!</h1></body></html>
Load Balancer
Requires an external load balancer with public IP
- Accessible from outside via load balancer IP address
- For production …distributes traffic over nodes
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: LoadBalancer
  selector:
    app: my-app # identify pods for this service
  ports:
  - port: 8080 # service port
    targetPort: 8080 # forward to container port
Configure session affinity for a service…
- …clients are directed to the same pod every time
- …defaults to None if not specified
spec:
  sessionAffinity: ClientIP
Pods backing a service cannot see the actual client IP address…
- …packet source IP changed for cluster-internal routing
- …SNAT (Source Network Address Translation) performed on each packet
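Session affinity and client IP visibility are both set in the service spec. The sketch below reuses the `my-service` example. `externalTrafficPolicy: Local` is not mentioned in the notes above; it is added here as an assumption about how the SNAT hop is commonly avoided, at the cost of only routing to pods on the node that received the traffic.

```shell
# Write a LoadBalancer service with session affinity and preserved client source IP
cat > service-lb.yaml <<EOF
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: LoadBalancer
  sessionAffinity: ClientIP        # same client IP is sent to the same pod
  externalTrafficPolicy: Local     # no second hop to other nodes, source IP is kept
  selector:
    app: my-app
  ports:
  - port: 8080
    targetPort: 8080
EOF
```

Apply with `kubectl apply -f service-lb.yaml`; whether an external IP is assigned depends on the load-balancer integration available in the cluster.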
Orchestration
Kubernetes orchestration of external load-balancer depends on environment:
Cloud-based cluster…
- Network Load Balancer (NLB) for Amazon Elastic Kubernetes Service (EKS)
- Standard Load Balancer for Azure Kubernetes Service (AKS)
- Cloud Load Balancer for Google Kubernetes Engine (GKE)
- LBaaS plugin for OpenStack
- NSX ALB for VMware
- …in-cluster component called cloud-controller-manager
On-prem cluster…
- …existing load-balancer appliances
- …direct interaction with the physical network by cluster add-ons
- ARP (for L2 integration)
- BGP (for L3 integration)
- Kubernetes hosted load-balancers…
- MetalLB
- ARP and BGP modes …custom user-space implementation
- Configured with ConfigMap and custom CRD based operator
- OpenELB (part of KubeSphere)
- ARP and BGP modes
- Configuration via CRDs
- Kube-vip
- ARP and BGP modes
- Configured via flags, environment variables and ConfigMap
- ServiceLB [10] (aka Klipper, integrated with K3s)
- Exposes LoadBalancer as host ports on all cluster nodes
Ingress
Ingress — Associate a URL with a backend service
- Operates on the application layer of the network stack (HTTP)
- Works in conjunction with Kubernetes Services and Endpoints
- Why use ingress?
- Path- & Host-based routing …typically a URL path
- Multiple services can share a single IP-address
- Manage SSL certificates and terminate SSL connections
- Manage authentication and authorization
- Terminology…
- Ingress Resource — Routing rules directing external traffic to services
- Ingress Controller — Implements the rules defined in the ingress resource (e.g. NGINX [11], Traefik [12], HAProxy [13])
- Backend Services — Services receiving the traffic directed by Ingress
Reverse Proxy
Ingress serves as a reverse proxy
- …intermediary between client and server that forwards requests
- Ingress Controllers are pods running within the Kubernetes cluster…
- …enforcing the rules set in the Ingress resources
- Control flow of inbound requests and direct them to the appropriate service
Ingress Rules
Ingress resources contain one or more Ingress rules
- Component of the Ingress resource that specifies the actual routing logic
- Each Ingress rule specifies a set of conditions (like host and path) and the corresponding backend service to which the traffic should be directed
- Path-Based Routing
- Routing directs traffic based on the URL path
- …example.org/hello & example.org/world route to different services
- Host-Based Routing
- Routing directs traffic based on the hostname (or domain)
- …hello.example.org & world.example.org route to different services
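Both routing styles can be combined in one Ingress resource. This is a sketch under assumptions: the backend services `hello` and `world`, the resource name `example`, and the ingress class `nginx` are hypothetical and depend on which controller is installed.

```shell
# Write an Ingress with path-based and host-based rules
cat > ingress.yaml <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example
spec:
  ingressClassName: nginx        # assumption: a controller serving this class exists
  rules:
  - host: example.org            # path-based routing
    http:
      paths:
      - path: /hello
        pathType: Prefix
        backend:
          service:
            name: hello
            port:
              number: 80
      - path: /world
        pathType: Prefix
        backend:
          service:
            name: world
            port:
              number: 80
  - host: hello.example.org      # host-based routing
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: hello
            port:
              number: 80
EOF
```

Apply with `kubectl apply -f ingress.yaml` and check the result with `kubectl get ingress example`.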
Port Forward
Connect to a specific pod without going through a service…
- …typically for debugging & testing individual pods
- …notation is local port, colon followed by port in the pod
# forward a local network port to a port in the pod
kubectl port-forward $pod_name 30080:80 # local:remote
# ports belong to the pod as a whole (containers share one network namespace),
# so pick a container by the port it listens on
kubectl port-forward $pod_name 38080:8080
# multiple ports (space-separated)
kubectl port-forward $pod_name 30080:80 30443:443
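SSH tunneling to a node with access to a Kubernetes cluster
# on the local machine: tunnel local port 38080 to port 38080 on the node
ssh -L 38080:localhost:38080 $user@$node
# on the node: forward port 38080 to port 8080 in the pod
kubectl port-forward $pod_name 38080:8080
Use proxy management like kubefwd [14]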
Footnotes
[1] CNI Network Plugins, https://github.com/containernetworking/plugins
[2] Kindnet, CNI Plugin, https://kindnet.es, https://github.com/aojea/kindnet
[3] Calico, CNI Plugin, https://docs.tigera.io
[4] DNS for Services and Pods, Kubernetes Documentation, https://kubernetes.io/docs/concepts/services-networking/dns-pod-service
[5] Kubernetes DNS, CoreDNS Documentation, https://coredns.io/plugins/kubernetes
[6] Pod’s DNS Policy, Kubernetes Documentation, https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pod-s-dns-policy
[7] Using NodeLocal DNSCache in Kubernetes Clusters, https://kubernetes.io/docs/tasks/administer-cluster/nodelocaldns
[8] ExternalDNS, GitHub, https://github.com/kubernetes-sigs/external-dns
[9] k8s_gateway, GitHub, https://github.com/ori-edge/k8s_gateway
[10] ServiceLB, GitHub, https://github.com/k3s-io/klipper-lb
[11] NGINX Ingress Controller, https://docs.nginx.com/nginx-ingress-controller
[12] Traefik Proxy, https://doc.traefik.io/traefik
[13] HAProxy Ingress Controller, https://www.haproxy.com/documentation/kubernetes-ingress
[14] kubefwd Project, GitHub, https://github.com/txn2/kubefwd