Kubernetes Pragmatic

Networking Primitives That Bite

Ravinder · 7 min read
Kubernetes · Cloud Native · DevOps · Networking · CNI · NetworkPolicy

Kubernetes networking has a reputation for being opaque. That reputation is earned. The abstractions are stacked — Pod network, Service network, Ingress, NetworkPolicy, CNI — and when something breaks, the failure message is usually "connection refused" or a timeout, which tells you almost nothing about which layer is the problem.

I have spent more time than I care to admit staring at kubectl exec -- curl output trying to determine whether a service has no endpoints, the CNI has a route misconfiguration, or a NetworkPolicy is silently dropping packets. This post is about building enough of a mental model that you can get to the right layer fast.

The Four Layers and What They Actually Do

graph TB
    subgraph External["External Traffic"]
        LB[Load Balancer / Ingress]
    end
    subgraph Cluster["Cluster Network"]
        SVC[Service - ClusterIP / NodePort / LoadBalancer]
        NP[NetworkPolicy - enforced by CNI]
        POD1[Pod A]
        POD2[Pod B]
        POD3[Pod C]
    end
    LB --> SVC
    SVC --> NP
    NP -->|Allowed| POD1
    NP -->|Allowed| POD2
    NP -->|Blocked| POD3

Pod network: Every pod gets an IP from a CIDR managed by the CNI. Pods can talk to each other directly — but only if no NetworkPolicy is in the way.

Services: A stable virtual IP (ClusterIP) in front of a selector-matched set of pods. kube-proxy (or the CNI's eBPF datapath, if it replaces kube-proxy) programs iptables rules or BPF maps to DNAT traffic from the ClusterIP to a real pod IP. In the default iptables mode, the Service IP never appears on any network interface; it exists only as a translation rule.
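
To see the mapping concretely, compare the virtual IP with the endpoints behind it (a quick sketch, reusing the payments-api Service from the examples below; the iptables check assumes kube-proxy in iptables mode and shell access to a node):

# The ClusterIP is a virtual address; the endpoints are the real pod IPs behind it
kubectl get svc payments-api -n payments -o wide
kubectl get endpoints payments-api -n payments
 
# On a node running kube-proxy in iptables mode, the DNAT rules live in the KUBE-SERVICES chain
sudo iptables -t nat -L KUBE-SERVICES -n | grep payments-api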

Ingress: An HTTP(S) routing rule. It requires an Ingress controller (Nginx, AWS LB Controller, Traefik) to actually function. An Ingress object without a controller is a YAML document that does nothing.
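
An Ingress that "does nothing" is usually missing its controller, so check for that before reading routing rules (a sanity check; the ingress-nginx namespace is the default install location for the community Nginx controller and may differ in your cluster):

# Is any controller registered as an IngressClass?
kubectl get ingressclass
 
# Is the controller actually running? (namespace assumes a standard ingress-nginx install)
kubectl get pods -n ingress-nginx
 
# Does each Ingress reference an existing class?
kubectl get ingress -A -o custom-columns=NAME:.metadata.name,CLASS:.spec.ingressClassName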

NetworkPolicy: A firewall rule enforced by the CNI. With no NetworkPolicy, all pods can reach all other pods. Once at least one policy selects a pod for a given direction (ingress or egress), that direction becomes deny-by-default for the pod, and only what the policies explicitly allow gets through.

Service Types: When to Use What

# ClusterIP — internal only, the default
# Use for: service-to-service within the cluster
apiVersion: v1
kind: Service
metadata:
  name: payments-api
  namespace: payments
spec:
  selector:
    app: payments-api
  ports:
    - port: 80
      targetPort: 8080
  type: ClusterIP  # Accessible only within cluster
 
---
# NodePort — exposes on every node's IP at a static port
# Use for: dev/test environments, on-prem without a cloud LB
# Avoid in production — it binds to every node regardless of pod placement
apiVersion: v1
kind: Service
metadata:
  name: debug-service
spec:
  selector:
    app: debug-app
  ports:
    - port: 80
      targetPort: 8080
      nodePort: 30080
  type: NodePort
 
---
# LoadBalancer — provisions a cloud load balancer
# Use for: non-HTTP protocols (TCP/UDP) or when Ingress is not available
# Cost: one cloud LB per Service. With 20 services, this gets expensive.
apiVersion: v1
kind: Service
metadata:
  name: grpc-service
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-scheme: "internal"
spec:
  selector:
    app: grpc-service
  ports:
    - port: 9090
      targetPort: 9090
      protocol: TCP
  type: LoadBalancer

The practical rule: use ClusterIP everywhere and put a single Ingress controller in front of HTTP traffic. Reserve LoadBalancer type for non-HTTP protocols that Ingress cannot route.

Ingress — Pick One Controller and Own It

Every major cloud provider has an Ingress controller. So does Nginx. So does Traefik. The mistake is running more than one in the same cluster without a clear reason.

# Nginx Ingress — straightforward HTTP routing
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: letsencrypt-prod
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /payments
            pathType: Prefix
            backend:
              service:
                name: payments-api
                port:
                  number: 80
          - path: /orders
            pathType: Prefix
            backend:
              service:
                name: orders-api
                port:
                  number: 80

The AWS Load Balancer Controller is the better choice on EKS if you are already paying for ALBs — it provisions target groups directly from pod IPs (bypassing kube-proxy) and integrates with WAF and ACM. The tradeoff: it is AWS-specific, so your Ingress manifests are not portable.
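
For comparison, here is a minimal sketch of equivalent routing on the AWS Load Balancer Controller (assuming the controller is installed and registers the alb IngressClass; target-type: ip is the setting that sends traffic straight to pod IPs, and the certificate ARN is a placeholder):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress-alb
  namespace: production
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip  # register pod IPs directly in the target group
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTPS":443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:...  # ACM certificate instead of cert-manager
spec:
  ingressClassName: alb
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /payments
            pathType: Prefix
            backend:
              service:
                name: payments-api
                port:
                  number: 80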

NetworkPolicy: The Part Everyone Ignores Until They Need It

The default Kubernetes behavior is "every pod can reach every other pod." That is fine for a prototype. It is not fine for a production cluster running payment services alongside internal tooling.

NetworkPolicy is the mechanism, but it only works if your CNI enforces it. Flannel does not. Calico, Cilium, and the AWS VPC CNI (with the network policy add-on enabled) all do.

# Default deny-all ingress for the payments namespace
# Apply this first, then explicitly allow what you need
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: payments
spec:
  podSelector: {}  # Matches all pods in the namespace
  policyTypes:
    - Ingress
 
---
# Allow ingress from orders namespace to payments-api on port 80
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-orders-to-payments
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: orders
          podSelector:
            matchLabels:
              app: orders-api
      ports:
        - protocol: TCP
          port: 80

The from clause with both namespaceSelector and podSelector in the same list item means AND — it must be the orders namespace AND the orders-api pod. Two separate list items would mean OR. That distinction has caused more misconfigured policies than I can count.
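
For contrast, here is a sketch of the OR form with two separate list items: it admits traffic from any pod in the orders namespace, or from any pod labeled app: orders-api in the payments namespace itself (a bare podSelector is scoped to the policy's own namespace). The policy name is hypothetical.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-orders-or-labeled-pods  # hypothetical name for the contrasting variant
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payments-api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:  # item 1: ANY pod in the orders namespace
            matchLabels:
              kubernetes.io/metadata.name: orders
        - podSelector:  # item 2: orders-api pods, but only within the payments namespace
            matchLabels:
              app: orders-api
      ports:
        - protocol: TCP
          port: 80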

CNI Selection: The Decision That Is Painful to Change

graph LR
    A{CNI Choice} --> B[Flannel]
    A --> C[Calico]
    A --> D[Cilium]
    A --> E[AWS VPC CNI]
    B --> F[Simple, no NetworkPolicy enforcement]
    C --> G[NetworkPolicy, BGP routing, mature]
    D --> H[eBPF, NetworkPolicy + L7, observability built-in]
    E --> I[Native VPC IPs, AWS-specific, NP add-on required]

The CNI decision matters because changing it later requires a cluster rebuild or a carefully coordinated rolling replacement. Choose based on your requirements upfront:

  • Cilium if you want eBPF-based networking, Hubble observability, and Kubernetes NetworkPolicy plus L7 policies. Higher operational overhead, but the built-in network visibility (Hubble UI) pays for itself when debugging.
  • Calico if you want mature NetworkPolicy enforcement, BGP routing for bare metal or on-prem, and a large ecosystem. More familiar to network engineers coming from traditional infrastructure.
  • AWS VPC CNI if you are on EKS and want pods to get native VPC IPs (which simplifies security groups and VPC flow logs). Add the network policy add-on for NetworkPolicy enforcement. The pod IP density per node is limited by ENI attachment limits — watch for "Too many pods" scheduling errors even on large node types; the check after this list shows how to read a node's ceiling.
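
Before that bites, you can read a node's pod ceiling and compare it with what is actually scheduled (a sketch; allocatable.pods is whatever the kubelet registered for the node, and on EKS with the VPC CNI it reflects the ENI/IP limit):

# Pod ceiling per node, as registered by the kubelet
kubectl get nodes -o custom-columns=NAME:.metadata.name,MAXPODS:.status.allocatable.pods
 
# How many pods are actually on a given node (replace <node-name>)
kubectl get pods -A --field-selector spec.nodeName=<node-name> --no-headers | wc -l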

Debugging Checklist

When connectivity breaks, work top to bottom:

# 1. Does the Service have endpoints?
kubectl get endpoints payments-api -n payments
 
# 2. Can a pod reach the ClusterIP directly?
kubectl exec -n payments debug-pod -- curl -v http://10.100.0.50:80/health
 
# 3. Can a pod resolve the Service DNS?
kubectl exec -n payments debug-pod -- nslookup payments-api.payments.svc.cluster.local
 
# 4. Is a NetworkPolicy blocking it?
# Check for policies in both source and destination namespaces
kubectl get networkpolicy -n payments
kubectl get networkpolicy -n orders
 
# 5. Is the Ingress controller running and has the Ingress been admitted?
kubectl describe ingress api-ingress -n production
kubectl logs -n ingress-nginx deploy/ingress-nginx-controller --tail=50
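
When step 4 turns up policies and you need to know which one is dropping traffic, an empirical probe from a throwaway pod carrying the expected labels is often faster than reading selectors (a sketch, reusing the labels from the policy examples above and the public curlimages/curl image):

# Probe from the orders namespace with the labels the policy expects to match
kubectl run np-test -n orders --rm -it --restart=Never \
  --image=curlimages/curl --labels=app=orders-api -- \
  curl -sv -m 5 http://payments-api.payments.svc.cluster.local/health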

Key Takeaways

  • The four Kubernetes networking layers — Pod network, Services, Ingress, NetworkPolicy — are independent and stack on each other. Knowing which layer broke is 80% of the debug work.
  • Use ClusterIP for internal service-to-service traffic and a single shared Ingress controller for HTTP/HTTPS. Reserve LoadBalancer type for non-HTTP protocols.
  • An Ingress resource without an installed controller does nothing. This is a common gotcha for teams new to the ecosystem.
  • NetworkPolicy is deny-by-default only for pods that have at least one policy applied to them. A namespace with no policies has no isolation.
  • The from clause with namespaceSelector and podSelector in the same list item is AND logic. Separate list items are OR. Get this wrong and you either over-block or under-block.
  • Your CNI choice determines whether NetworkPolicy is enforced at all. Flannel does not enforce it. Changing CNIs after the fact is a cluster rebuild.