☸️ What is Kubernetes?
Kubernetes (K8s) is an open-source container orchestration platform originally built by Google, open-sourced in 2014, and donated to the CNCF in 2015. It automates deployment, scaling, and management of containerised applications across a cluster of machines.
The Problem Kubernetes Solves
Before Kubernetes, you could run containers on servers but had no standard way to manage them at scale. If a container crashed, someone had to restart it manually. If traffic spiked, someone had to add servers by hand. Deployments required downtime. Kubernetes automates all of this.
| Without Kubernetes | With Kubernetes |
|---|---|
| Manual restarts when containers crash | Automatic self-healing |
| Manual scaling when traffic increases | Auto-scaling (HPA) |
| Downtime during deployments | Zero-downtime rolling updates |
| Hard to move workloads between servers | Workloads run anywhere |
| No standard way to manage config/secrets | ConfigMaps and Secrets built-in |
Key Concepts
- Declarative model — you describe WHAT you want, K8s figures out HOW
- Desired state — K8s continuously compares actual state vs desired state and reconciles
- Self-healing — crashed pods restart, failed nodes get workloads moved away
- Portability — same YAML works on AWS EKS, Azure AKS, GCP GKE, on-premise
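The desired-state idea above can be illustrated with a toy reconcile loop. This is a minimal sketch under stated assumptions, not how any real controller is implemented: the `reconcile` function and the `web-N` pod names are hypothetical, and a real controller creates and deletes objects through the API server instead of editing a local list.

```python
def reconcile(desired_replicas: int, running: list[str]) -> list[str]:
    """Return a pod list that matches desired_replicas (toy model)."""
    pods = list(running)
    next_id = 0
    # Too few pods: create replacements until the count matches.
    while len(pods) < desired_replicas:
        name = f"web-{next_id}"  # hypothetical naming scheme
        if name not in pods:
            pods.append(name)
        next_id += 1
    # Too many pods: drop the surplus from the end.
    return pods[:desired_replicas]


# One pod crashed: actual state has 2 pods, desired state is 3.
actual = reconcile(3, ["web-0", "web-2"])
print(sorted(actual))
```

Running the same function again with an unchanged desired state returns the list unchanged, which is the property that makes the declarative model safe to re-apply.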
🏗️ Architecture Deep Dive
Kubernetes has two planes: the Control Plane (the brain, which makes decisions) and the Data Plane (the muscle, which executes them). Components communicate through the API Server rather than talking to each other directly, and only the API Server reads and writes etcd.
Control Plane Components
| Component | What it does | Where it runs |
|---|---|---|
| kube-apiserver | Single entry point for ALL operations. Authenticates, validates, stores to etcd. | Control plane node(s) |
| etcd | Distributed key-value store. Stores entire cluster state. BACK THIS UP. | Control plane node(s) |
| kube-scheduler | Watches for unscheduled pods, picks the best node based on resources + constraints. | Control plane node(s) |
| kube-controller-manager | Runs the built-in controllers: Deployment, ReplicaSet, Node, Job, etc. Reconciles desired state. | Control plane node(s) |
| cloud-controller-manager | Talks to cloud provider APIs (create LoadBalancer, manage node lifecycle and routes, etc.) | Control plane node(s) |
Worker Node Components
| Component | What it does |
|---|---|
| kubelet | Agent on every node. Watches for pods assigned to its node, starts/stops containers via CRI. |
| kube-proxy | Manages network rules on each node. Implements Services using iptables or IPVS rules. |
| Container Runtime | Actually runs containers. containerd (most common) or CRI-O. Since dockershim was removed in K8s 1.24, Docker Engine works only through the cri-dockerd adapter. |
Request Flow — what happens when you run kubectl apply
- kubectl serialises YAML → sends HTTPS request to kube-apiserver
- apiserver: authenticates (who are you?) → authorises (RBAC check) → admission controllers (mutate, then validate)
- apiserver writes desired state to etcd
- Deployment controller sees new Deployment → creates ReplicaSet
- ReplicaSet controller sees no pods exist → creates Pod objects through the apiserver (persisted in etcd)
- Scheduler sees unscheduled pods → picks best node → binds the pod, which sets nodeName in the pod spec
- Kubelet on that node sees pod assigned to it → pulls image → starts container via the container runtime (e.g. containerd)
- Container starts → readiness probe passes → kubelet reports Ready → endpoints controller adds pod to Service endpoints → kube-proxy updates node network rules
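The scheduling step in the flow above can be sketched as a filter phase followed by a score phase. This is a toy model under stated assumptions: the `Node` class, the `pick_node` function, and the "most CPU left" scoring rule are illustrative only, and the real scheduler evaluates many more constraints (taints, affinity, topology spread) through its plugins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    free_cpu_m: int   # unreserved CPU in millicores
    free_mem_mi: int  # unreserved memory in MiB


def pick_node(nodes: list[Node], cpu_m: int, mem_mi: int) -> Optional[str]:
    # Filter: drop nodes that cannot fit the pod's resource requests.
    feasible = [n for n in nodes if n.free_cpu_m >= cpu_m and n.free_mem_mi >= mem_mi]
    if not feasible:
        return None  # no node fits, so the pod would stay Pending
    # Score: prefer the node with the most CPU left after placement (toy rule).
    best = max(feasible, key=lambda n: n.free_cpu_m - cpu_m)
    return best.name


nodes = [Node("node-a", 500, 1024), Node("node-b", 2000, 4096), Node("node-c", 1500, 256)]
print(pick_node(nodes, cpu_m=1000, mem_mi=512))  # node-b
print(pick_node(nodes, cpu_m=4000, mem_mi=512))  # None
```

The `None` case corresponds to the Pending state that the Cluster Autoscaler reacts to.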
⚙️ Installation & Setup
Local Development
For learning and development, run Kubernetes on your laptop with a local distribution such as minikube, kind, or k3d.
Production — Managed Kubernetes (Recommended)
| Provider | Service | Best for |
|---|---|---|
| AWS | EKS (Elastic Kubernetes Service) | AWS-heavy teams, IAM integration |
| Azure | AKS (Azure Kubernetes Service) | Microsoft/enterprise teams |
| GCP | GKE (Google Kubernetes Engine) | Best managed K8s, Autopilot mode |
| Red Hat | OpenShift OCP | Enterprise, regulated industries |
🖥️ kubectl — Complete Command Reference
kubectl is the command-line tool for interacting with Kubernetes. Fluency with it is expected in any DevOps role.
📦 Workloads — Deployment, StatefulSet, DaemonSet
Deployment — for stateless applications
Use for: web servers, APIs, microservices — anything that does not need stable identity or persistent storage.
StatefulSet — for stateful applications
Use for: databases (PostgreSQL, MySQL, MongoDB), Kafka, Elasticsearch, Redis Cluster. Key differences from Deployment:
- Pods get stable names: postgres-0, postgres-1, postgres-2
- Ordered startup and shutdown (postgres-0 starts before postgres-1)
- Each pod gets its own PVC that survives pod restarts and rescheduling
- Stable DNS: postgres-0.postgres-svc.namespace.svc.cluster.local
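The stable naming pattern above is predictable enough to generate. The following is a small sketch, assuming a headless Service named `postgres-svc`, a hypothetical namespace `prod`, and the common `cluster.local` cluster domain; the helper name `statefulset_dns` is made up for illustration.

```python
def statefulset_dns(name: str, service: str, namespace: str, replicas: int,
                    cluster_domain: str = "cluster.local") -> list[str]:
    """Build the per-pod DNS names for a StatefulSet behind a headless Service."""
    return [
        f"{name}-{i}.{service}.{namespace}.svc.{cluster_domain}"
        for i in range(replicas)  # ordinals start at 0 and are stable
    ]


for host in statefulset_dns("postgres", "postgres-svc", "prod", 3):
    print(host)
```

Because the names never change across restarts, a client can hard-code a peer list like this, which is not possible with Deployment pods.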
DaemonSet — one pod per node
Use for: log collectors (Fluentd, Filebeat), monitoring agents (node-exporter), network plugins (Calico, Cilium).
🔧 ConfigMap & Secrets
ConfigMap — non-sensitive configuration
Store app config, feature flags, config files. Never put passwords or API keys in ConfigMaps.
Secrets — sensitive data
Important: Kubernetes Secrets are base64-encoded, NOT encrypted by default. For production, use at least one of the following:
- Enable etcd encryption at rest (encrypt data in etcd)
- Use Sealed Secrets (bitnami) — encrypt secrets so they are safe to commit to Git
- Use External Secrets Operator — pull secrets from AWS Secrets Manager, Azure Key Vault, HashiCorp Vault
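To see why base64 offers no protection, the short sketch below encodes and then decodes a value with no key involved. The password string is a made-up example.

```python
import base64

# What ends up in the Secret's data field: an encoding, not a ciphertext.
encoded = base64.b64encode(b"s3cr3t-password").decode("ascii")
print(encoded)

# Anyone who can read the Secret object can reverse it without any key.
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)
```

This is why read access to Secrets should be restricted through RBAC even when encryption at rest is enabled.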
🌐 Networking — Services, Ingress, NetworkPolicy
Service Types
| Type | Use Case | Accessible from |
|---|---|---|
| ClusterIP | Internal service-to-service communication | Inside cluster only |
| NodePort | Dev/testing, expose on node IP:port | Outside via node IP + port |
| LoadBalancer | Production external access (creates cloud LB) | Outside via cloud LB IP |
| ExternalName | DNS alias to external service | Inside cluster → external |
NetworkPolicy — pod-level firewall
💾 Storage — PV, PVC, StorageClass
Storage Concepts
Containers are ephemeral: data written inside a container is lost when the pod dies. Kubernetes uses three resources to persist it:
- StorageClass — defines HOW storage is provisioned (AWS EBS, Azure Disk, NFS). The template.
- PersistentVolumeClaim (PVC) — a REQUEST for storage by a pod. Like asking for a disk.
- PersistentVolume (PV) — the actual storage resource. Created manually or dynamically by StorageClass.
🔒 Security — RBAC, ServiceAccounts, Pod Security
RBAC Model
Every request in Kubernetes goes through: Authentication (who are you?) → Authorisation (RBAC: are you allowed?) → Admission Control (is this request valid?).
| Resource | Scope | Use for |
|---|---|---|
| Role | Namespace | Permissions within one namespace |
| ClusterRole | Cluster-wide | Node-level access, cross-namespace |
| RoleBinding | Namespace | Attach Role to user/ServiceAccount |
| ClusterRoleBinding | Cluster-wide | Attach ClusterRole to user/ServiceAccount |
| ServiceAccount | Namespace | Identity for pods (not humans) |
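The way a RoleBinding connects a subject to a Role's rules can be sketched with a toy evaluator. This is a simplified model, not the real authoriser: the `Rule`, `RoleBinding`, and `is_allowed` names are hypothetical, the Role's rules are inlined into the binding for brevity, and API groups, resource names, and ClusterRoles are ignored. It does reflect one real property: RBAC is additive, so a request is allowed only if some bound rule matches.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    resources: tuple[str, ...]
    verbs: tuple[str, ...]


@dataclass(frozen=True)
class RoleBinding:
    namespace: str
    subject: str
    rules: tuple[Rule, ...]  # rules of the bound Role, inlined for brevity


def is_allowed(bindings: list[RoleBinding], subject: str, namespace: str,
               verb: str, resource: str) -> bool:
    """Allow if any rule bound to the subject in this namespace matches."""
    for binding in bindings:
        if binding.subject != subject or binding.namespace != namespace:
            continue
        for rule in binding.rules:
            if resource in rule.resources and verb in rule.verbs:
                return True
    return False  # no matching rule means the request is denied


pod_reader = Rule(resources=("pods",), verbs=("get", "list", "watch"))
sa = "system:serviceaccount:production:my-app"  # hypothetical ServiceAccount
bindings = [RoleBinding("production", sa, (pod_reader,))]

print(is_allowed(bindings, sa, "production", "list", "pods"))    # True
print(is_allowed(bindings, sa, "production", "delete", "pods"))  # False
print(is_allowed(bindings, sa, "staging", "list", "pods"))       # False
```

On a real cluster, `kubectl auth can-i` answers the same question against the live policy.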
Production Security Checklist
- ✅ Never use the default ServiceAccount: it accumulates permissions
- ✅ No cluster-admin for application pods
- ✅ Enable Pod Security Standards: restricted namespace label
- ✅ No root containers in production
- ✅ Read-only root filesystem where possible
- ✅ Encrypt etcd at rest
- ✅ Rotate ServiceAccount tokens (TokenRequest API, not static tokens)
- ✅ Audit logging enabled on API server
- ✅ NetworkPolicy: default-deny, explicit allow
📈 Scaling — HPA, VPA, Cluster Autoscaler
Why Scaling Matters — The Real Problem
Without scaling, you have two bad choices: provision for peak traffic (expensive, wasteful) or provision for average traffic (crashes during spikes). Kubernetes addresses this with autoscalers that work at different levels: pod count, pod resources, and node count.
Autoscaling Options — When to Use Each
| Autoscaler | What it does | When to use | What it changes |
|---|---|---|---|
| HPA — Horizontal Pod Autoscaler | Adds or removes PODS | Stateless apps with variable traffic (web servers, APIs) | replica count |
| VPA — Vertical Pod Autoscaler | Adjusts CPU/memory of existing pods | When you don't know the right resource requests | resource requests/limits |
| Cluster Autoscaler | Adds or removes NODES | When pods cannot schedule due to insufficient node capacity | number of nodes in cluster |
| KEDA — Event-Driven Autoscaler | Scale on any metric or event | Kafka lag, queue depth, custom Prometheus metrics, scale to zero | replica count (extends HPA) |
HPA — Horizontal Pod Autoscaler
What it is: HPA watches a metric (CPU, memory, custom) and changes the number of pod replicas to maintain that metric at a target value.
How it works: Every 15 seconds (default), the HPA controller reads the metric from the metrics-server, calculates the desired replica count using the formula: desiredReplicas = ceil(currentReplicas × (currentMetricValue / targetMetricValue)), and updates the Deployment replica count.
Example: You have 3 pods. Target CPU = 70%. Current average CPU = 140%. Desired = ceil(3 × 140/70) = ceil(6) = 6 pods. HPA scales up to 6.
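The formula and worked example above can be checked with a few lines of code. This is a sketch, not the controller's implementation: the function name and the min/max bounds are illustrative, and the 10% tolerance mirrors the controller's documented default for ignoring small deviations from the target.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int, max_replicas: int, tolerance: float = 0.1) -> int:
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# 3 pods, target 70% CPU, current average 140% CPU
print(desired_replicas(3, 140, 70, min_replicas=2, max_replicas=10))  # 6
```

With `max_replicas=5` the same inputs return 5, which shows why an undersized maximum leaves pods overloaded during a spike.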
What you MUST have for HPA to work:
- metrics-server installed in cluster (kubectl top pods must work)
- Resource requests set on the pod (HPA calculates % relative to requests)
- minReplicas and maxReplicas defined
Common mistake: setting target CPU to 100%. By the time HPA triggers, pods are already saturated. Keep the target at 60-70% so there is headroom while new pods start.
VPA — Vertical Pod Autoscaler
What it is: VPA analyses historical resource usage of pods and recommends (or automatically sets) CPU and memory requests/limits. If your app uses 300m CPU but you set requests to 100m, VPA detects this and adjusts.
Three VPA modes:
- Off — only gives recommendations, no changes. Use this first to understand what values to set.
- Initial — sets requests when the pod first starts, does not change running pods. Safe.
- Auto — can restart pods to apply new resource values. Causes brief downtime. Use with care.
Important limitation: do not let VPA and HPA both act on CPU/memory for the same Deployment. VPA changes requests while HPA computes utilisation relative to requests, so the two fight each other. If you need both, drive HPA from custom or external metrics (for example via KEDA) and let VPA manage CPU/memory requests.
Cluster Autoscaler
What it is: Watches for pods stuck in Pending because no node has enough resources, then adds a node from the cloud provider's node group. It also removes nodes that have been underutilised for 10+ minutes (default), which saves cost.
How scale-up works: Pod stays Pending → CA sees it within about 10 seconds (default scan interval) → requests new node from cloud API → node ready after a few minutes, depending on the provider → pod scheduled.
How scale-down works: Sum of pod requests on a node stays below 50% of its allocatable CPU/memory for 10 minutes (defaults) AND all pods on it can be moved elsewhere → CA drains the node → terminates the VM.
What prevents scale-down: pods with a restrictive PodDisruptionBudget, pods with local storage, pods with anti-affinity or node selectors that fit nowhere else, pods not managed by a controller, and kube-system pods without a PDB. This is why PDBs matter: a badly configured PDB can prevent ALL scale-down.
KEDA — Scale on Any Metric
What it is: KEDA extends HPA to scale on events and metrics that HPA cannot natively handle — Kafka consumer lag, RabbitMQ queue length, AWS SQS message count, custom Prometheus queries, Azure Service Bus, and more. It can also scale to zero (HPA minimum is 1).
Real use case: You have a Kafka consumer processing messages. At peak, 100,000 messages queue up. You want to scale from 2 pods to 50 based on Kafka consumer group lag, not CPU. KEDA makes this possible with a ScaledObject resource.
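The use case above reduces to simple arithmetic once a per-pod lag target is chosen. The sketch below is a simplified model: the function name, the threshold of 2,000 messages per pod, and the replica bounds are assumptions made for illustration, and the real Kafka scaler has additional settings that this ignores.

```python
import math


def replicas_for_lag(total_lag: int, lag_threshold: int,
                     min_replicas: int, max_replicas: int) -> int:
    """Replicas needed so each pod handles at most lag_threshold messages."""
    if total_lag <= 0:
        return min_replicas
    wanted = math.ceil(total_lag / lag_threshold)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, wanted))


# 100,000 queued messages, assumed target of 2,000 messages of lag per pod
print(replicas_for_lag(100_000, 2_000, min_replicas=2, max_replicas=50))  # 50
print(replicas_for_lag(3_000, 2_000, min_replicas=2, max_replicas=50))    # 2
```

Setting `min_replicas=0` models scale-to-zero: with no lag, the function returns 0 replicas.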
Scaling Decision Guide
| Situation | Solution |
|---|---|
| API gets more traffic at peak hours | HPA on CPU or request rate |
| Don't know right resource requests for a new app | VPA in Off mode first, then Initial |
| Pods Pending because nodes are full | Cluster Autoscaler |
| Kafka consumer needs to scale on queue depth | KEDA with Kafka trigger |
| Batch job needs to scale to zero when no work | KEDA with scale-to-zero |
| Node costs too high during off-hours | Cluster Autoscaler + scheduled HPA scale-down |
🚀 Production Checklist
Before you go live, every workload must have these:
| Item | Why | How |
|---|---|---|
| Resource requests + limits | Prevents a noisy-neighbour pod from starving or evicting others | resources.requests + resources.limits |
| Liveness probe | Restart deadlocked pods automatically | livenessProbe.httpGet or exec |
| Readiness probe | Remove pod from LB when not ready | readinessProbe.httpGet or exec |
| PodDisruptionBudget | Survive node drain / K8s upgrades | minAvailable: 2 or maxUnavailable: 1 |
| Pod anti-affinity | Pods spread across nodes / AZs | podAntiAffinity.requiredDuringSchedulingIgnoredDuringExecution |
| Specific image tag | Reproducible deployments | image: myapp:v2.1.0 — NEVER :latest |
| Non-root user | Security — limit blast radius | securityContext.runAsNonRoot: true |
| HPA configured | Handle traffic spikes automatically | HorizontalPodAutoscaler resource |
🔍 Troubleshooting Guide
Common Issues and Fixes
| Symptom | First command | Common cause |
|---|---|---|
| Pod stuck in Pending | kubectl describe pod | Insufficient resources, node selector mismatch, PVC not bound |
| Pod in CrashLoopBackOff | kubectl logs --previous | App crash on startup, missing secret/env var, OOMKilled |
| Pod in ImagePullBackOff | kubectl describe pod | Wrong image name/tag, no imagePullSecret for private registry |
| Pod stuck in Terminating | kubectl describe pod | Finalizer not removed, node offline |
| Service not reaching pods | kubectl describe svc + get endpoints | Label selector mismatch, pod not Ready |
| OOMKilled | kubectl describe pod | Memory limit too low, memory leak in app |
⚙️ Kubelet, Kubectl, Node, Pod, Container — Clearly Explained
The company analogy — remember this for interviews
- API Server = reception desk — all requests go through here
- etcd = the filing cabinet — stores everything the cluster knows
- Scheduler = HR planner — decides which node gets each new pod
- Controller Manager = management — watches state, fixes problems
- kubelet = office manager on each node — receives instructions, runs containers
- kube-proxy = telephone switchboard — routes network traffic to correct pods
- kubectl = your phone to call reception — you send commands, API server receives them
The exact hierarchy
    AZURE (Cloud)
    └── VMSS (Virtual Machine Scale Set, the AKS node pool)
        └── NODE (a VM running Kubernetes)
            ├── kubelet (process managing this node)
            ├── kube-proxy (handles pod networking)
            └── POD (smallest K8s deployable unit, has its own IP)
                ├── Container 1: your-app (the actual application)
                ├── Container 2: sidecar (log collector, Istio proxy)
                └── Shared: network namespace, volumes
kubelet — what it actually does
- Runs on every worker node. The link between control plane and the containers on that node.
- Watches API server: "Do I have new pods assigned to me?"
- Tells the container runtime (containerd, CRI-O) to start containers via CRI
- Reports back: "Pod X is Running, Pod Y is CrashLoopBackOff"
- Runs liveness and readiness probes
- Manages volume mounts — PVCs, Secrets, CSI driver volumes
kubectl — common commands explained
    kubectl get pods -n production      # Asks API server → reads from etcd → returns list
    kubectl apply -f deployment.yaml    # Sends YAML to API server → validates → stores in etcd
                                        # Scheduler assigns to node → kubelet starts container
    kubectl describe pod mypod          # Full details including Events from kubelet and scheduler
    kubectl logs mypod                  # API server proxies to kubelet on the pod's node → returns container stdout/stderr
    kubectl exec -it mypod -- bash      # Tunnel: API server → kubelet → container runtime → shell