LearnwithVishnu
Basics → Production → Architect
Docker
Beginner · Engineer · Production · Architect
Build, ship, and run containers: from basics to production hardening
What you will learn: What Docker is → Dockerfile best practices → Multi-stage builds → All commands → Troubleshooting (OOMKilled, restart loops, exit codes) → Security hardening → Trivy scanning → Docker Compose → CI/CD integration → 10 senior interview Q&As

🐳 What is Docker?

Docker is a platform for building, shipping, and running applications in containers. A container is a lightweight, isolated process (or group of processes) packaged with everything the application needs: code, runtime, libraries, and config. The key insight: because the image carries its own dependencies, what works on your machine runs the same way elsewhere, as long as the host kernel and CPU architecture are compatible. Before Docker, "it works on my machine" was one of the most common phrases in engineering. Docker largely removed that class of problem.

Container vs Virtual Machine

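| | Container | Virtual Machine |
|---|---|---|
| Startup time | Milliseconds to seconds | Seconds to minutes |
| Size | Typically MB | Typically GB |
| Isolation | Process-level (kernel shared) | Full OS isolation (own kernel) |
| Performance | Near-native | Roughly 5-15% overhead |
| Use case | Microservices, CI/CD | Workloads that need full OS isolation |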

Core Docker Concepts

  • Image: read-only template made of layers. Built from a Dockerfile. Stored in a registry.
  • Container: running instance of an image. Ephemeral by default (its writable layer survives a stop but is lost when the container is removed, unless data lives in a volume).
  • Dockerfile: instructions to build an image. Instructions that change the filesystem (RUN, COPY, ADD) each create a layer; the others only add metadata.
  • Registry: stores images. Docker Hub (public), ECR/ACR/Harbor (private).
  • Layer cache: unchanged layers are reused. Order of instructions matters for build speed.
Docker basics

📄 Dockerfile — Best Practices

The Dockerfile is the most important Docker artifact. A bad Dockerfile creates large, slow, insecure images. A good Dockerfile creates small, fast, secure images. Interviewers often ask about Dockerfile best practices, because this is where junior and senior engineers differ.

The Most Common Mistake — Wrong Layer Order

Docker caches layers. If layer N changes, all layers N+1 onwards are rebuilt. Dependencies change rarely. Code changes every commit. Always copy dependencies before code.

BAD vs GOOD Dockerfile
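The example that belonged under this caption is missing from the page, so what follows is only a hedged reconstruction of the layer-order rule described above. It writes two hypothetical Dockerfiles for a Python app; the file names (Dockerfile.bad, Dockerfile.good), app.py, and requirements.txt are assumptions made for illustration.

```shell
# Illustrative sketch: app.py, requirements.txt and the Dockerfile.* names are assumptions.

# BAD: the whole source tree is copied before dependencies are installed,
# so any code change invalidates the cache of the pip install layer.
cat > Dockerfile.bad <<'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install --no-cache-dir -r requirements.txt
CMD ["python", "app.py"]
EOF

# GOOD: only the dependency manifest is copied first, so the pip install
# layer is reused until requirements.txt itself changes.
cat > Dockerfile.good <<'EOF'
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "app.py"]
EOF

# To compare build times, build either variant, for example:
#   docker build -f Dockerfile.good -t myapp:dev .
```

After one code-only change, rebuilding the GOOD variant should reuse the cached dependency layer, while the BAD variant reinstalls everything.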

Multi-Stage Builds — Production Standard

Multi-stage builds use multiple FROM instructions. The final image contains only the last stage: its base image plus whatever that stage copies in from earlier stages. Result: a build image of around 1.2GB can become a production image of around 50MB.

Multi-stage build — production grade

.dockerignore — Always Create This

.dockerignore
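The original listing for this caption did not survive, so this is a minimal sketch of a typical .dockerignore rather than the author's file. The entries are assumptions about a generic project layout; adjust them to what your build actually needs.

```shell
# Illustrative sketch: the entries below are assumptions about a generic repo layout.
cat > .dockerignore <<'EOF'
.git
.gitignore
node_modules
__pycache__/
*.pyc
.env
*.log
tests/
docs/
EOF

# Everything listed is left out of the build context, so it cannot be
# picked up by "COPY . ." and does not slow down context upload.
```

Keeping .env out of the context also lowers the chance of copying local secrets into an image by accident.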

🖥️ Docker Commands — Complete Reference

Images — build, pull, push, inspect
Containers — run, exec, logs, inspect
Networking and volumes

🔍 Troubleshooting — Real Production Issues

Exit Codes — What They Mean

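| Exit Code | Meaning | What to do |
|---|---|---|
| 0 | Clean exit: the main process finished on its own | Check if this was expected |
| 1 | Application error / unhandled exception | Check application logs |
| 137 | SIGKILL (128 + 9): usually OOMKilled after exceeding the memory limit, or a forced kill after the stop timeout | Confirm OOMKilled with docker inspect, then increase the memory limit or fix the memory leak |
| 139 | SIGSEGV (128 + 11): segmentation fault, app crashed | Check for null pointer dereference, buffer overflow |
| 143 | SIGTERM (128 + 15) received: graceful shutdown | Normal: container stopped gracefully |
| 125 | docker run itself failed (daemon error or invalid flag) | Check the docker run arguments and the daemon logs |
| 126 | Command found but not executable (permission) | Check file permissions in container |
| 127 | Command not found | Check CMD/ENTRYPOINT, image contents |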
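Codes above 128 in the table follow the shell convention of 128 plus the signal number. As a small sketch of that arithmetic, the helper below maps an exit code to a short description; explain_exit_code is a hypothetical function name, not a Docker command.

```shell
# explain_exit_code is a hypothetical helper, not part of the Docker CLI.
explain_exit_code() {
  code="$1"
  case "$code" in
    0)   echo "clean exit" ;;
    125) echo "docker run itself failed" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    *)
      # 129..159 means "terminated by signal (code - 128)".
      if [ "$code" -gt 128 ] && [ "$code" -lt 160 ]; then
        sig=$((code - 128))
        echo "killed by signal $sig ($(kill -l "$sig"))"
      else
        echo "application-defined error"
      fi
      ;;
  esac
}

# Usage: explain_exit_code 137
```

On Linux, 137 resolves to signal 9 (KILL) and 143 to signal 15 (TERM), matching the table rows.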
Troubleshooting container issues

Container keeps restarting — systematic approach

Debug restart loop

🔒 Docker Security — Production Hardening

Docker security is one of the most frequently asked topics at senior interviews. Common root causes of container security incidents include: running as root, using untrusted base images, an exposed Docker socket, and secrets leaked into image layers.

Security Checklist

| Risk | Impact | Fix |
|---|---|---|
| Running as root | Container escape → host compromise | USER 1000:1000 in Dockerfile |
| :latest tag | Unpredictable builds, supply chain risk | Always pin exact version: python:3.11.4-slim |
| Secrets in ENV | Visible in docker inspect, image history | Use Docker secrets or mount at runtime |
| Secrets in image layers | Even if deleted, still present in a previous layer | Multi-stage build, never ADD secrets to image |
| Exposed Docker socket | Full host access = root on host | Never mount /var/run/docker.sock in production |
| Unscanned images | Known CVEs in production | Trivy scan in CI, block CRITICAL/HIGH |
| No resource limits | One container kills the host | Always set --memory and --cpus |
Security hardening commands
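The commands for this caption are missing from the page. As a hedged sketch of how several checklist rows translate into docker run flags, the wrapper below combines them; run_hardened is a hypothetical function name, and the UID, limits, and tmpfs path are placeholder values rather than recommendations from the original.

```shell
# run_hardened is a hypothetical wrapper; UID, limits and paths are placeholders.
run_hardened() {
  image="$1"
  # --user: run as a non-root UID:GID
  # --read-only + --tmpfs: immutable root filesystem with a writable scratch dir
  # --cap-drop ALL: drop every Linux capability
  # --security-opt no-new-privileges: block privilege escalation via setuid binaries
  # --memory / --cpus: resource limits so one container cannot starve the host
  docker run --detach \
    --user 1000:1000 \
    --read-only \
    --tmpfs /tmp \
    --cap-drop ALL \
    --security-opt no-new-privileges \
    --memory 512m \
    --cpus 1 \
    "$image"
}

# Usage: run_hardened myapp:1.0.0
```

Some applications need a specific capability or extra writable paths, so treat the flag set as a starting point and relax it only where the app demonstrably requires it.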
Trivy — scan for vulnerabilities

📦 Registry — Push, Pull, Private Registries

Registry Options

| Registry | Type | Best for |
|---|---|---|
| Docker Hub | Public/private | Open source, personal projects |
| AWS ECR | Private | AWS-based workloads |
| Azure ACR | Private | Azure/AKS workloads |
| GCP Artifact Registry | Private | GCP/GKE workloads |
| Harbor | Self-hosted | On-premise, air-gapped, vulnerability scanning |
| JFrog Artifactory | Self-hosted/cloud | Enterprise, all artifact types |
Working with registries

🔧 Docker Compose — Local Development

Docker Compose defines multi-container applications. It is generally not the tool for running production at scale (an orchestrator such as Kubernetes is the usual choice), but it is essential for local development and integration testing. Every senior engineer should be able to write a Compose file from scratch.

docker-compose.yml — production-like local environment
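The Compose file for this caption is missing from the page, so the following is a minimal sketch and not the author's original. It assumes a hypothetical app service built from the local Dockerfile plus a Postgres database; the service names, port, and throwaway dev credentials are all assumptions.

```shell
# Illustrative sketch: service names, port and the dev-only credentials are assumptions.
cat > docker-compose.yml <<'EOF'
services:
  app:
    build: .
    ports:
      - "8080:8080"
    environment:
      DATABASE_URL: postgres://app:app@db:5432/app
    depends_on:
      db:
        condition: service_healthy   # wait until the db healthcheck passes
    restart: unless-stopped
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app         # local development only
      POSTGRES_DB: app
    volumes:
      - db-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 10s
      timeout: 5s
      retries: 5
volumes:
  db-data:
EOF

# Start the stack with: docker compose up --build
```

The app reaches the database by its service name (db) because Compose puts both services on the same user-defined network.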

⚡ Docker in CI/CD Pipelines

In production pipelines Docker builds must be: fast (layer caching), secure (no secrets in image), small (multi-stage), and tagged properly (semantic versioning, never :latest in production).

GitHub Actions — optimised Docker build + push
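The workflow for this caption is also missing. The sketch below shows one plausible shape for it, assuming the image is pushed to GitHub Container Registry under a hypothetical ghcr.io/example-org/myapp name and tagged with the commit SHA; the workflow file name and image name are assumptions.

```shell
# Illustrative sketch: the workflow path and the image name are assumptions.
mkdir -p .github/workflows
cat > .github/workflows/docker.yml <<'EOF'
name: docker-build
on:
  push:
    branches: [main]
permissions:
  contents: read
  packages: write
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          # Immutable tag derived from the commit
          tags: ghcr.io/example-org/myapp:sha-${{ github.sha }}
          # Reuse BuildKit layer cache across pipeline runs
          cache-from: type=gha
          cache-to: type=gha,mode=max
EOF
```

The gha cache backend is what keeps builds fast between runs, and tagging by commit SHA keeps every pushed image traceable to its source.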

🏗️ Multi-Stage Builds — Production-Grade Images

The problem with single-stage builds

A typical single-stage Dockerfile ships everything used during the build: the full SDK (JDK, Maven, Node.js), all build tools, source code, test files, and the final compiled artifact. Result: an image of 600MB or more shipped to production, containing tools that are never used at runtime. Security risk: larger attack surface. Cost: slower pulls, more registry storage.

Multi-stage build — Java application example

# Stage 1: BUILD — contains Maven, source code, all build tools
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /app
COPY pom.xml .
# Download dependencies separately (cached if pom.xml unchanged)
RUN mvn dependency:go-offline -B
COPY src ./src
RUN mvn clean package -DskipTests

# Stage 2: RUNTIME — contains ONLY the JRE and the compiled JAR
FROM eclipse-temurin:17-jre-alpine AS runtime
WORKDIR /app
# Security: create non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
# Copy ONLY the artifact from build stage
COPY --from=builder /app/target/myapp-1.0.jar app.jar
USER appuser
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]

Result: the build stage is roughly 600MB (never pushed). The runtime image is roughly 180MB (what runs in production). The runtime image has no Maven, no source code, no compiler.

Multi-stage for Node.js

FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install only production dependencies (--omit=dev replaces the deprecated --only=production)
RUN npm ci --omit=dev

FROM node:20-alpine AS runtime
WORKDIR /app
RUN addgroup -S app && adduser -S app -G app
COPY --from=builder /app/node_modules ./node_modules
COPY --chown=app:app src ./src
COPY package.json .
USER app
EXPOSE 3000
CMD ["node", "src/index.js"]

Build cache optimisation — order matters

Docker caches each layer. If a layer's input hasn't changed, it uses the cache. Rule: put things that change infrequently (dependencies) BEFORE things that change frequently (source code). Always copy package.json BEFORE copying source code. If you copy source first, every source change invalidates the dependency install layer, so the install is re-run on almost every build.

⚡ BuildKit — Modern Docker Build Features

BuildKit is the modern Docker build engine (default in Docker 23+)

# --- Shell: enabling and invoking BuildKit ---
# Enable BuildKit explicitly (only needed on Docker versions before 23.0)
export DOCKER_BUILDKIT=1

# Build with BuildKit: parallel stages, better caching
docker buildx build -t myapp:v1 .

# --- Dockerfile: cache mount ---
# Cache pip/npm/maven downloads between builds
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
# --mount=type=cache keeps this directory between builds;
# it is never included in an image layer
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# --- Dockerfile fragment (Node.js image): secret mount ---
# Pass secrets without including them in image layers
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN=$(cat /run/secrets/npm_token) npm install

# --- Shell: build with the secret ---
# The secret value is read from the NPM_TOKEN environment variable on the build host
docker buildx build --secret id=npm_token,env=NPM_TOKEN -t myapp .

# --- Shell: multi-platform build for linux/amd64 and linux/arm64 ---
docker buildx build --platform linux/amd64,linux/arm64 -t myapp:v1 --push .

BuildKit advantages

  • Parallel stage execution — independent stages build simultaneously
  • Cache mounts — package manager caches persist between builds (much faster)
  • Secret mounts — secrets never appear in image layers or history
  • Multi-platform — build for ARM and AMD64 in one command
  • Inline cache — embed cache metadata in the image for CI/CD layer caching

🔍 Image Scanning and Security in Production

Every image must be scanned before production

| Tool | How to use | What it finds |
|---|---|---|
| Trivy | trivy image myapp:v1 | OS package CVEs, language dependency CVEs, misconfigurations |
| Snyk | Integrates with GitHub, CI/CD | CVEs + license issues + code secrets |
| Grype | grype myapp:v1 | CVEs in OS and language packages |
| Docker Scout | Built into Docker Desktop | CVEs with remediation advice |

Trivy in CI/CD pipeline — fail on HIGH/CRITICAL

# Jenkins/GitHub Actions step
# --exit-code 1      fail the build if matching vulnerabilities are found
# --severity         only report (and fail on) HIGH or CRITICAL
# --ignore-unfixed   skip CVEs that have no fix available yet
trivy image --exit-code 1 \
  --severity HIGH,CRITICAL \
  --ignore-unfixed \
  myapp:${BUILD_NUMBER}

# Output as SARIF for GitHub Security tab
trivy image --format sarif --output trivy-results.sarif myapp:v1

Security best practices — summary

  • Never run as root: add a non-root user in the Dockerfile and switch to it with USER appuser
  • Use minimal base images: Alpine, distroless, or slim variants
  • No secrets in image layers: use BuildKit secret mounts or inject at runtime via env vars/K8s secrets
  • Pin base image versions: FROM node:20.11.0-alpine3.19, not FROM node:latest
  • Read-only filesystem: docker run --read-only or Kubernetes readOnlyRootFilesystem: true
  • Scan in registry: ACR, ECR, Harbor all support automatic image scanning on push
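The pinning rule in the list above can be checked mechanically. This is a minimal sketch of such a check, assuming a simple policy in which an image reference counts as pinned when it carries an explicit tag other than latest, or a digest; is_pinned is a hypothetical helper name.

```shell
# is_pinned is a hypothetical helper: exit status 0 means "pinned", 1 means "not pinned".
is_pinned() {
  # Look only at the last path component so a registry port
  # (for example localhost:5000/app) is not mistaken for a tag.
  name="${1##*/}"
  case "$name" in
    *@sha256:*) return 0 ;;  # pinned by digest
    *:latest)   return 1 ;;  # floating tag
    *:*)        return 0 ;;  # explicit version tag
    *)          return 1 ;;  # no tag at all, which defaults to latest
  esac
}

# Usage: is_pinned node:20.11.0-alpine3.19 && echo "pinned"
```

A check like this could run in CI over every FROM line; note that a version tag can still be re-pushed upstream, so only a digest is truly immutable.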

🎯 Interview Questions — Senior Level

DOCKER · ENGINEER
What is the difference between CMD and ENTRYPOINT in a Dockerfile?
Both define what runs when the container starts. The key difference: ENTRYPOINT is the executable, CMD provides default arguments to ENTRYPOINT. When you override CMD (docker run myimage custom-arg), ENTRYPOINT stays. When you override ENTRYPOINT (docker run --entrypoint sh myimage), the image's default CMD is ignored. Production rule: use ENTRYPOINT for the application, CMD for default flags. Example: ENTRYPOINT ["python", "-m", "gunicorn"] CMD ["--workers", "4", "app:app"]. In production you must also use exec form (square brackets), not shell form: shell form makes /bin/sh PID 1, and because sh does not forward signals, your app never receives SIGTERM. Every docker stop then waits out the default 10-second timeout before the container is killed.
DOCKER · ENGINEER
Explain Docker layer caching. How do you optimise it?
Docker builds images layer by layer. If a layer has not changed since the last build, Docker reuses the cached version — making subsequent builds fast. The critical rule: once a layer cache is invalidated, ALL layers after it rebuild. The mistake: COPY . . before pip install — every code change invalidates the pip install cache and rebuilds all dependencies. The fix: COPY requirements.txt . first, RUN pip install, then COPY . . — dependencies only reinstall when requirements.txt changes. At HPE this changed our CI build time from 8 minutes to 90 seconds. In CI pipelines use BuildKit with --cache-from to share cache between pipeline runs. Use docker image history to see which layers are largest and where cache is being broken.
DOCKER · PRODUCTION
Your container is OOMKilled every few hours. Walk me through how you find the cause and fix it.
Exit code 137 means the process received SIGKILL (128 + 9); with a memory limit set, the usual cause is the kernel OOM killer. Systematic approach: Step 1: confirm it is OOM: docker inspect container --format '{{.State.OOMKilled}}' returns true. Step 2: check what the memory limit is and what the container was actually using before it died; in our case docker stats showed 490MB out of a 512MB limit, reached over several hours. Step 3: decide whether it is a memory leak or just an insufficient limit. Memory leak: usage grows monotonically and never decreases. Insufficient limit: usage plateaus at a steady working set that is simply larger than the limit allows. At HPE we had a Python service with a Kafka consumer that was caching all messages in memory without eviction. Fix: profiled with memory_profiler, found an unbounded cache dictionary, and replaced it with an LRU cache with maxsize=1000. Short term: increased the limit from 512m to 1g to stop the pages. Permanent fix: added a Prometheus memory alert at 80% of the limit to catch future leaks before they cause kills.
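Steps 1 and 2 of this answer can be gathered in one go. The helper below is a sketch under the assumption that the container still exists (stopped, not removed); oom_report is a hypothetical function name, while the inspected fields are standard docker inspect state and host-config fields.

```shell
# oom_report is a hypothetical helper; the container must not have been removed yet.
oom_report() {
  container="$1"
  # Was the last exit caused by the kernel OOM killer?
  oom=$(docker inspect --format '{{.State.OOMKilled}}' "$container")
  # Exit code of the last run (137 is expected for an OOM kill).
  code=$(docker inspect --format '{{.State.ExitCode}}' "$container")
  # Configured memory limit in bytes (0 means no limit was set).
  limit=$(docker inspect --format '{{.HostConfig.Memory}}' "$container")
  echo "oom_killed=$oom exit_code=$code memory_limit_bytes=$limit"
}

# Usage: oom_report my-container
```

If oom_killed is false while the exit code is 137, something else sent SIGKILL, for example a stop timeout or an operator running docker kill.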
DOCKER · ARCHITECT
What is a multi-stage Docker build and why is it critical for production?
Multi-stage builds use multiple FROM instructions in one Dockerfile. Each stage starts fresh and can copy artifacts from previous stages. The final image only contains what you explicitly copy into the last stage. Why it matters: a Go application image that includes the compiler toolchain can be around 1.2GB. The same app compiled and copied to scratch (empty base image) can be around 8MB. Smaller image = faster pull, faster startup, smaller attack surface, lower registry storage cost, better security (fewer installed packages = fewer CVEs). At HPE we reduced our Java microservice images from 680MB (JDK) to 180MB (JRE only). The build stage has all the tools. The runtime stage has only the compiled binary and its runtime dependencies. I also use multi-stage for separating test execution: run tests in stage 1, and if they fail the build fails before producing any image, so you can never push an image whose tests failed.
DOCKER · ENGINEER
How do Docker networking modes differ? When do you use each?
Bridge (default): containers get their own network namespace, communicate via Docker-managed bridge network. Containers see each other by IP or container name on custom networks. Use for most applications. Host: container shares the host network stack directly — no port mapping needed, performance is slightly better, but zero network isolation. Use for network monitoring tools or when you need absolute maximum throughput. None: container has no network interface. Use for batch jobs that should never make network calls. Overlay: cross-host networking for Docker Swarm. In Kubernetes, Docker networking is replaced entirely by CNI plugins (Calico, Cilium, Flannel) — kube-proxy handles Service networking at the node level, not Docker. Important: create a custom bridge network instead of using the default bridge — custom networks provide automatic DNS resolution by container name. Default bridge uses IP addresses only.
DOCKER · PRODUCTION
A new Docker image is 2GB. Your CTO asks you to reduce it. How do you approach this?
Systematic reduction approach: Step 1: analyse with docker image history to see which layers are biggest. Step 2: switch to a smaller base image: ubuntu (77MB) → debian-slim (30MB) → alpine (5MB) → distroless (2MB) → scratch (0); these sizes are approximate and vary by tag and release. Caveat: alpine uses musl libc and some Python packages need glibc, so test before committing. Step 3: multi-stage build: separate build tools from runtime. A Maven/JDK build image is around 500MB, but only the JAR needs to go to production, on a JRE base rather than a JDK. Step 4: clean up in the same RUN layer: RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/*. The cleanup must be in the same RUN; otherwise the package cache is already committed to an earlier layer. Step 5: add a .dockerignore to exclude tests, docs, and .git. At HPE, a Node.js service went from 1.4GB (with node_modules) to 120MB (multi-stage + alpine + production dependencies only).
DOCKER · ARCHITECT
How do you handle secrets in Docker? What are the risks of using ENV for secrets?
ENV variables for secrets have three problems: First, docker inspect my-container shows all environment variables in plaintext to anyone with Docker access. Second, docker history myimage shows all build-time ENV instructions. Third, child processes inherit all environment variables — if the app spawns subprocesses, secrets leak there too. Production approaches in order of security: Runtime injection via orchestrator (Kubernetes Secrets, Docker Swarm secrets) — secret never in image, injected at runtime as file mount. External secrets manager (HashiCorp Vault, AWS Secrets Manager) — app fetches secret at startup using IAM role. BuildKit secret mount for build-time secrets (pip private index tokens etc.) — RUN --mount=type=secret,id=pip_token pip install... — secret is never committed to any layer. Never do: ENV DB_PASSWORD=secret in Dockerfile (baked in, visible in docker history forever even if later changed).
DOCKER · PRODUCTION
Production container is running slowly. How do you diagnose it?
Performance diagnosis framework: Step 1 — is it CPU, memory, IO, or network? docker stats shows live CPU%, memory usage, network IO, block IO. Step 2 — if CPU high: docker exec -it container top or docker exec container ps aux. Look for a process consuming 100% CPU — could be a tight loop, or the app is CPU-bound and needs more replicas. Step 3 — if memory high: close to limit = risk of OOMKill. Check for memory leak: watch docker stats over 30 minutes. If memory grows continuously = leak. Step 4 — if disk IO high: use iostat inside container or host-level iotop. If writing too many logs = use log rotation. Step 5 — add profiling: for Python use py-spy or memory_profiler, for Java use async-profiler, for Go use pprof. In production I always have resource limits set AND Prometheus container metrics (container_cpu_usage_seconds_total, container_memory_usage_bytes) to correlate performance issues with recent deployments or traffic patterns.
DOCKER · ENGINEER
What is the difference between COPY and ADD in Dockerfile?
Use COPY for everything. ADD does everything COPY does plus two extra features: it auto-extracts local tar archives and it can fetch remote URLs. These extra features are the reason NOT to use ADD: they make Dockerfiles unpredictable. ADD https://example.com/file.tar.gz /app downloads the file at build time (it does not extract it; only local archives are auto-extracted), which is a security risk because the build pulls unverified content from an external URL. COPY is explicit and predictable. The only valid use case for ADD is when you specifically need tar extraction in a single layer, but even then I prefer COPY + RUN tar xzf. Interviewers ask this because ADD appears in many old tutorials, and its use signals that someone learned Docker from outdated sources.
DOCKER · ARCHITECT
How do you manage Docker images in a large organisation with 50+ microservices?
Image management strategy: Naming convention: registry.company.com/team/service:version, never just service:latest. Tagging strategy: semantic versioning (v1.2.3) + git SHA (sha-abc1234) + latest for the newest main branch build. Never deploy :latest to production. Base image governance: define approved base images (python:3.11-slim, eclipse-temurin:21-jre, node:20-alpine). Run weekly Trivy scans on all approved bases. When a base image CVE is found, trigger a rebuild of all services using it via a dependency graph in CI. Registry: private registry (ECR or Harbor) with image scanning on push and a vulnerability policy that blocks CRITICAL CVEs. Retention policy: keep the last 10 tags per service, delete everything older. At HPE we had Harbor with automated Trivy scanning: any image with a CRITICAL CVE was automatically quarantined and developers were notified within 5 minutes of push.
DOCKER · ENGINEER
What is a multi-stage Docker build and why is it important for production?
Multi-stage builds use multiple FROM instructions in one Dockerfile, where each FROM starts a new stage. The key feature: you copy only the output from one stage into the next, leaving everything else behind. A Java application without multi-stage: one image with JDK + Maven + source code + tests + compiled JAR = 600MB. With multi-stage: Stage 1 uses maven:3.9 to compile. Stage 2 uses eclipse-temurin:17-jre-alpine and only copies the compiled JAR with COPY --from=builder. Final image: 180MB with no Maven, no source code, no test files. Why it matters for production: security (smaller attack surface, with no compiler or build tools to exploit), performance (faster image pulls, faster pod startup, less registry storage), compliance (many security policies require minimal runtime images). Important optimisation inside each stage: copy dependency files (pom.xml, package.json) before source code. Docker caches each layer, so if pom.xml hasn't changed, it skips the dependency download step on the next build. Source code changes every commit but dependencies change rarely. This cache trick can cut a typical build from, for example, 5 minutes to 30 seconds whenever only source code has changed.
DOCKER · ENGINEER
Explain Docker networking — bridge, host, overlay, and when to use each.
Bridge network (default): each container gets a private IP in a virtual network (172.17.0.0/16 range). Containers on the same bridge can communicate by container name (if using user-defined bridge). Host network: container shares the host's network stack — no isolation, uses host IP and ports directly. Use for performance-critical applications where network overhead matters, or when the app needs to bind specific host ports. Custom user-defined bridge: docker network create mynetwork. Containers on the same user-defined bridge can resolve each other by name — myapp can reach database:5432. Default bridge doesn't have DNS resolution between containers. Always use custom networks, not the default bridge. Overlay network: multi-host networking for Docker Swarm or when containers on different hosts need to communicate. Uses VXLAN encapsulation. For Kubernetes: containers in the same Pod share a network namespace (same localhost). Between Pods: handled by the CNI plugin (Flannel, Calico, Azure CNI) which ensures every pod has a routable IP. Docker Compose automatically creates a user-defined bridge network for all services in the same compose file — services can reach each other by service name: redis://redis:6379 works because Compose creates DNS entries per service name.
DOCKER · PRODUCTION
A container is running but the application inside is not responding. How do you debug?
Systematic debugging steps. Step 1: check container status and recent logs. docker ps shows the container is running (Up). docker logs container-name --tail 100 shows application output — check for startup errors, exceptions, missing config. docker logs --follow to watch in real time. Step 2: check resource usage. docker stats container-name shows CPU, memory, network, disk I/O. Is the container at memory limit (potential OOM kill incoming)? High CPU from runaway loop? Step 3: get a shell inside. docker exec -it container-name sh (or bash if available). Inside: check if the process is running: ps aux. Check if it is listening on the expected port: netstat -tlnp or ss -tlnp. Try curling the endpoint from inside: curl localhost:8080/health. Step 4: check container events. docker inspect container-name shows restart count, exit code of previous runs. docker events --filter container=container-name shows all lifecycle events. Step 5: network connectivity. From another container or host: docker exec -it other-container curl http://target-container:8080. Check if the service port is actually exposed: docker port container-name. Step 6: if the container exits immediately on startup: docker run --entrypoint sh myimage -c "sleep 3600" to override ENTRYPOINT and keep it running for investigation.
DOCKER · ARCHITECT
How does Docker Compose work for a multi-service application? What are the key features?
Docker Compose defines and runs multi-container applications from a single YAML file. The compose file defines: services (each becomes a container), networks (how services communicate), volumes (persistent data). Key features for production-like local environments: depends_on with condition: service_healthy ensures the database is healthy before the app starts, which prevents the classic "app starts before DB is ready" problem. Health checks define how Docker determines if a service is healthy: healthcheck: test: ["CMD", "pg_isready"] interval: 10s retries: 5. Profiles: mark services with profiles: [tools] and they only start when docker compose --profile tools up is run. Useful for optional debug tools. Environment variable files: env_file: .env.development separates config from the compose file. Override files: docker-compose.yml (base) + docker-compose.override.yml (local dev overrides like volume mounts for live code reload). Production-like settings: add resource limits (cpus: "0.5", memory: 512m), restart policies (restart: unless-stopped), and use secrets instead of environment variables for sensitive values. Docker Compose is ideal for local development and integration testing. For production Kubernetes: convert the Compose file to Kubernetes manifests with Kompose (kompose convert, or kompose convert --chart to emit a Helm chart) as a starting point.

🗺️ Learning Roadmap

Week 1: Foundations
  • Install Docker Desktop
  • Run first container: docker run nginx
  • Understand images vs containers
  • Write first Dockerfile

Week 2: Build & Optimise
  • Layer caching and order
  • Multi-stage builds
  • .dockerignore
  • Reduce image size below 100MB

Week 3: Networking & Storage
  • Bridge, host, overlay networks
  • Volumes and bind mounts
  • Docker Compose multi-container
  • Container-to-container communication

Week 4: Security & Production
  • Non-root user in Dockerfile
  • Trivy image scanning
  • Docker in CI/CD pipeline
  • Private registry setup

Month 2: K8s Ready
  • Understand why K8s replaces Docker in production
  • containerd vs Docker
  • OCI image spec
  • BuildKit advanced features