Docker Complete Guide — LearnwithVishnu

Docker

Build, ship, run containers: from basics to production hardening

What you will learn: What Docker is → Dockerfile best practices → Multi-stage builds → All commands → Troubleshooting (OOMKilled, restart loops, exit codes) → Security hardening → Trivy scanning → Docker Compose → CI/CD integration → Senior interview Q&As

🐳 What is Docker?

Docker is a platform for building, shipping, and running applications in containers. A container is a lightweight, isolated process that packages the application with everything it needs — code, runtime, libraries, config. The key insight: works on my machine = works everywhere. Before Docker, "it works on my machine" was the most common phrase in engineering. Docker killed that problem.

Container vs Virtual Machine

Aspect	Container	Virtual Machine
Startup time	Typically under a second	Tens of seconds to minutes
Size	MB	GB
Isolation	Process-level (kernel shared)	Full OS isolation
Performance	Near-native	5-15% overhead
Use case	Microservices, CI/CD	Full OS isolation needed

Core Docker Concepts

Image: read-only template made of layers. Built from a Dockerfile. Stored in a registry.
Container: running instance of an image. Ephemeral by default: anything written to the container's writable layer is lost when the container is removed, unless it is stored in a volume.
Dockerfile: instructions to build an image. RUN, COPY, and ADD each create a filesystem layer; other instructions only add metadata.
Registry: stores images. Docker Hub (public), ECR/ACR/Harbor (private).
Layer cache: unchanged layers are reused. Order of instructions matters for build speed.

Docker basics

📄 Dockerfile — Best Practices

The Dockerfile is the most important Docker artifact. A bad Dockerfile creates large, slow, insecure images. A good Dockerfile creates small, fast, secure images. Interviewers always ask about Dockerfile best practices — this is where junior and senior engineers differ.

The Most Common Mistake — Wrong Layer Order

Docker caches layers. If layer N changes, all layers N+1 onwards are rebuilt. Dependencies change rarely. Code changes every commit. Always copy dependencies before code.

BAD vs GOOD Dockerfile
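
A minimal sketch of the ordering rule above, assuming a Python service with a `requirements.txt` and an `app.py` entry point (both names are hypothetical). The cache-unfriendly ordering is kept as comments for contrast.

```dockerfile
FROM python:3.11-slim
WORKDIR /app

# BAD ordering (commented out): copying all source first means any code
# change invalidates the cache for the dependency install that follows.
#   COPY . .
#   RUN pip install --no-cache-dir -r requirements.txt

# GOOD ordering: copy only the dependency manifest, install, then copy code.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Source changes only invalidate the layers from this point onward.
COPY . .

CMD ["python", "app.py"]
```

With the good ordering, a commit that touches only source files should reuse the cached `pip install` layer.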

Multi-Stage Builds — Production Standard

Multi-stage builds use multiple FROM instructions. The final image only contains what the last stage copies. Result: a 1.2GB build image becomes a 50MB production image.

Multi-stage build — production grade
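
One possible shape for a multi-stage build, sketched here for a Go service. The module layout (a `go.mod` and `go.sum` at the repository root, `main` package in the root directory) and the output path `/out/app` are assumptions for illustration.

```dockerfile
# Stage 1: build stage with the full Go toolchain
FROM golang:1.22 AS builder
WORKDIR /src
# Dependency manifests first, so the module download layer is cached
COPY go.mod go.sum ./
RUN go mod download
COPY . .
# CGO disabled so the binary is statically linked and can run on scratch
RUN CGO_ENABLED=0 go build -o /out/app .

# Stage 2: runtime stage containing only the compiled binary
FROM scratch
COPY --from=builder /out/app /app
# Numeric UID:GID works on scratch, which has no /etc/passwd
USER 65534:65534
ENTRYPOINT ["/app"]
```

A `scratch` image has no shell and no CA certificates, so an app that makes outbound TLS calls would also need the certificate bundle copied from the build stage.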

.dockerignore — Always Create This

.dockerignore
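
A starting-point `.dockerignore`, assuming a typical repository layout. The entries are illustrative and should be trimmed to match the project; for example, keep `tests` if a build stage runs the test suite.

```text
# Version control and editor metadata
.git
.gitignore
.vscode
.idea

# Dependencies and build output that get rebuilt inside the image
node_modules
target
dist
__pycache__
*.pyc

# Local configuration and key material that must not enter the build context
.env
*.pem

# Docs and tests that the runtime image does not need
docs
tests
```

Excluding these paths keeps the build context small and avoids accidental cache invalidation from files the image never uses.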

🖥️ Docker Commands — Complete Reference

Images — build, pull, push, inspect

Containers — run, exec, logs, inspect

Networking and volumes

🔍 Troubleshooting — Real Production Issues

Exit Codes — What They Mean

Exit Code	Meaning	What to do
`0`	Clean exit — app stopped itself	Check if this was expected
`1`	Application error / unhandled exception	Check application logs
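`137`	SIGKILL (128 + 9): usually OOMKilled when the memory limit is exceeded, but also `docker kill` or a stop timeout	Confirm with `docker inspect` (`State.OOMKilled`), then increase the memory limit or fix the memory leak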
`139`	Segmentation fault — app crashed	Check for null pointer, buffer overflow
`143`	SIGTERM received — graceful shutdown	Normal — container stopped gracefully
`125`	`docker run` itself failed (for example an invalid flag or a daemon error)	Check the command line and the Docker daemon logs
`126`	Command not executable (permission)	Check file permissions in container
`127`	Command not found	Check CMD/ENTRYPOINT, image contents

Troubleshooting container issues
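
The exit-code table above follows one pattern worth internalising: codes above 128 are 128 plus the signal number. A small sketch of that decoding, where `explain_exit_code` is a hypothetical helper name and not a Docker command:

```shell
# Hypothetical helper: explain a container exit code.
# Codes above 128 follow the convention 128 + signal number.
explain_exit_code() {
  local code="$1"
  case "$code" in
    0)   echo "clean exit" ;;
    125) echo "docker run itself failed" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    *)
      if [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
        local sig=$((code - 128))
        # kill -l <n> prints the name of signal number n
        echo "killed by signal $sig ($(kill -l "$sig" 2>/dev/null || echo unknown))"
      else
        echo "application error"
      fi
      ;;
  esac
}
```

For example, `explain_exit_code 137` decodes to signal 9 (SIGKILL) and `explain_exit_code 143` to signal 15 (SIGTERM).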

Container keeps restarting — systematic approach

Debug restart loop
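
A first pass at a restart loop usually needs three facts: how often the container has restarted, the last exit code, and whether the kernel OOM-killed it. The sketch below gathers them with `docker inspect` and `docker logs`; the function name `restart_loop_report` is hypothetical.

```shell
# Hypothetical helper: summarise why a container keeps restarting.
# Usage: restart_loop_report <container-name>
restart_loop_report() {
  local name="$1"
  # Restart count, last exit code, and the OOMKilled flag from docker inspect
  docker inspect \
    --format 'restarts={{.RestartCount}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}}' \
    "$name"
  # The last lines of output (stdout and stderr) usually show the crash reason
  docker logs --tail 50 "$name" 2>&1
}
```

If the logs are empty because the process dies immediately, overriding the entrypoint with a shell is a common next step.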

🔒 Docker Security — Production Hardening

Docker security is one of the most asked topics at senior interviews. Common causes of container breaches: running as root, untrusted base images, an exposed Docker socket, and secrets leaked into image layers.

Security Checklist

Risk	Impact	Fix
Running as root	Container escape → host compromise	`USER 1000:1000` in Dockerfile
:latest tag	Unpredictable builds, supply chain risk	Always pin exact version: `python:3.11.4-slim`
Secrets in ENV	Visible in `docker inspect`, image history	Use Docker secrets or mount at runtime
Secrets in image layers	Even if deleted, in previous layer	Multi-stage build, never ADD secrets to image
Exposed Docker socket	Full host access = root on host	Never mount /var/run/docker.sock in production
Unscanned images	Known CVEs in production	Trivy scan in CI, block CRITICAL/HIGH
No resource limits	One container kills the host	Always set `--memory` and `--cpus`

Security hardening commands
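
Several checklist items map directly onto `docker run` flags. The wrapper below is a sketch; the function name `run_hardened` and the specific limits (512m, 1 CPU, 200 PIDs) are assumptions to adjust per workload.

```shell
# Hypothetical helper: start a container with hardening flags applied.
# Usage: run_hardened <image> [command...]
#   --user              run as a non-root UID:GID
#   --read-only         root filesystem is read-only; /tmp is a tmpfs for scratch files
#   --cap-drop ALL      drop all Linux capabilities
#   --security-opt      block privilege escalation via setuid binaries
#   --memory/--cpus     resource limits so one container cannot starve the host
#   --pids-limit        cap the number of processes (fork-bomb protection)
run_hardened() {
  local image="$1"
  shift
  docker run --detach \
    --user 1000:1000 \
    --read-only \
    --tmpfs /tmp \
    --cap-drop ALL \
    --security-opt no-new-privileges \
    --memory 512m \
    --cpus 1 \
    --pids-limit 200 \
    "$image" "$@"
}
```

An application that needs a specific capability (for example binding to a port below 1024) would add it back explicitly with `--cap-add`.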

Trivy — scan for vulnerabilities

📦 Registry — Push, Pull, Private Registries

Registry Options

Registry	Type	Best for
Docker Hub	Public/private	Open source, personal projects
AWS ECR	Private	AWS-based workloads
Azure ACR	Private	Azure/AKS workloads
GCP Artifact Registry	Private	GCP/GKE workloads
Harbor	Self-hosted	On-premise, air-gapped, vulnerability scanning
JFrog Artifactory	Self-hosted/cloud	Enterprise, all artifact types

Working with registries
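
Pushing to any of the registries above follows the same two steps: tag the local image with the full registry path, then push that reference. A sketch, with `push_image` as a hypothetical helper and `registry.example.com` as a placeholder host:

```shell
# Hypothetical helper: tag a local image for a registry and push it.
# Usage: push_image <local-image:tag> <registry/repo:tag>
# Assumes `docker login <registry>` has already been run for private registries.
push_image() {
  local local_ref="$1"
  local remote_ref="$2"
  # The registry host is part of the image name; docker push uses it as the destination
  docker tag "$local_ref" "$remote_ref" && docker push "$remote_ref"
}

# Example call (not executed here):
#   push_image myapp:1.2.3 registry.example.com/team/myapp:1.2.3
```

Cloud registries differ mainly in how the login credential is obtained; the tag and push steps stay the same.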

🔧 Docker Compose — Local Development

Docker Compose defines multi-container applications. It is NOT for production (use Kubernetes for that) but essential for local development and integration testing. Every senior engineer should be able to write a Compose file from scratch.

docker-compose.yml — production-like local environment
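
A minimal Compose file along these lines, assuming an application built from the local Dockerfile that listens on port 8080 and talks to PostgreSQL. Service names, credentials, and the `DATABASE_URL` variable are illustrative and meant for local development only.

```yaml
services:
  app:
    build: .
    ports:
      - "8080:8080"
    environment:
      # "db" resolves through the network Compose creates for this project
      DATABASE_URL: postgres://app:app@db:5432/app
    depends_on:
      db:
        condition: service_healthy   # wait for the health check, not just container start
    restart: unless-stopped

  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: app
      POSTGRES_PASSWORD: app         # local development only
      POSTGRES_DB: app
    volumes:
      - db-data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U app -d app"]
      interval: 10s
      timeout: 5s
      retries: 5

volumes:
  db-data:
```

`docker compose up --build` would start both services, with the app held back until the database reports healthy.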

⚡ Docker in CI/CD Pipelines

In production pipelines Docker builds must be: fast (layer caching), secure (no secrets in image), small (multi-stage), and tagged properly (semantic versioning, never :latest in production).

GitHub Actions — optimised Docker build + push
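
A sketch of such a workflow using the official Docker actions, assuming the image is pushed to GitHub Container Registry and that the repository name is lowercase (image references reject uppercase). The workflow name and tag scheme are illustrative.

```yaml
name: docker-build
on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write          # needed to push to ghcr.io with GITHUB_TOKEN
    steps:
      - uses: actions/checkout@v4
      - uses: docker/setup-buildx-action@v3
      - uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          # Immutable tag derived from the commit, never :latest
          tags: ghcr.io/${{ github.repository }}:sha-${{ github.sha }}
          # Reuse layers between pipeline runs via the GitHub Actions cache
          cache-from: type=gha
          cache-to: type=gha,mode=max
```

A vulnerability scan step would typically sit between build and push, which means building with `load: true` first and pushing in a later step.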

🏗️ Multi-Stage Builds — Production-Grade Images

The problem with single-stage builds

A standard Dockerfile includes: the full SDK (JDK, Maven, Node.js), all build tools, source code, test files, and the final compiled artifact. Result: a 600MB image shipped to production containing tools that are never used at runtime. Security risk: larger attack surface. Cost: slower pulls, more registry storage.

Multi-stage build — Java application example

# Stage 1: BUILD — contains Maven, source code, all build tools
FROM maven:3.9-eclipse-temurin-17 AS builder
WORKDIR /app
COPY pom.xml .
# Download dependencies separately (cached if pom.xml unchanged)
RUN mvn dependency:go-offline -B
COPY src ./src
RUN mvn clean package -DskipTests

# Stage 2: RUNTIME — contains ONLY the JRE and the compiled JAR
FROM eclipse-temurin:17-jre-alpine AS runtime
WORKDIR /app
# Security: create non-root user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
# Copy ONLY the artifact from build stage
COPY --from=builder /app/target/myapp-1.0.jar app.jar
USER appuser
EXPOSE 8080
ENTRYPOINT ["java", "-jar", "app.jar"]

Result: build image is 600MB (never pushed). Runtime image is 180MB (what runs in production). The runtime image has no Maven, no source code, no compiler.

Multi-stage for Node.js

FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
# Install only production dependencies (--omit=dev replaces the deprecated --only=production)
RUN npm ci --omit=dev

FROM node:20-alpine AS runtime
WORKDIR /app
RUN addgroup -S app && adduser -S app -G app
COPY --from=builder /app/node_modules ./node_modules
COPY --chown=app:app src ./src
COPY package.json .
USER app
EXPOSE 3000
CMD ["node", "src/index.js"]

Build cache optimisation — order matters

Docker caches each layer. If a layer's inputs have not changed, Docker reuses the cached layer. Rule: put things that change infrequently (dependencies) BEFORE things that change frequently (source code). Always copy package.json BEFORE copying source code. If you copy source first, every code change invalidates the cache for the dependency install, so dependencies are reinstalled on almost every build.

⚡ BuildKit — Modern Docker Build Features

BuildKit is the modern Docker build engine (default in Docker 23+)

# Shell: enable BuildKit (only needed on Docker versions older than 23)
export DOCKER_BUILDKIT=1

# Shell: build with BuildKit (parallel stages, better caching)
docker buildx build -t myapp:v1 .

# Dockerfile: cache mount keeps pip/npm/maven downloads between builds.
# The cache directory is never included in an image layer.
FROM python:3.11-slim
COPY requirements.txt .
RUN --mount=type=cache,target=/root/.cache/pip \
    pip install -r requirements.txt

# Dockerfile: secret mount passes a secret without writing it to any image layer
RUN --mount=type=secret,id=npm_token \
    NPM_TOKEN=$(cat /run/secrets/npm_token) npm install

# Shell: supply the secret at build time from the NPM_TOKEN environment variable
docker buildx build --secret id=npm_token,env=NPM_TOKEN -t myapp .

# Shell: multi-platform build for linux/amd64 and linux/arm64
docker buildx build --platform linux/amd64,linux/arm64 -t myapp:v1 --push .

BuildKit advantages

Parallel stage execution — independent stages build simultaneously
Cache mounts — package manager caches persist between builds (much faster)
Secret mounts — secrets never appear in image layers or history
Multi-platform — build for ARM and AMD64 in one command
Inline cache — embed cache metadata in the image for CI/CD layer caching

🔍 Image Scanning and Security in Production

Every image must be scanned before production

Tool	How to use	What it finds
Trivy	`trivy image myapp:v1`	OS package CVEs, language dependency CVEs, misconfigurations
Snyk	Integrates with GitHub, CI/CD	CVEs + license issues + code secrets
Grype	`grype myapp:v1`	CVEs in OS and language packages
Docker Scout	Built into Docker Desktop	CVEs with remediation advice

Trivy in CI/CD pipeline — fail on HIGH/CRITICAL

# Jenkins/GitHub Actions step
# --exit-code 1: fail the build if matching vulnerabilities are found
# --severity HIGH,CRITICAL: only HIGH or CRITICAL findings count
# --ignore-unfixed: skip CVEs that have no fix available yet
trivy image \
  --exit-code 1 \
  --severity HIGH,CRITICAL \
  --ignore-unfixed \
  "myapp:${BUILD_NUMBER}"

# Output as SARIF for GitHub Security tab
trivy image --format sarif --output trivy-results.sarif myapp:v1

Security best practices — summary

Never run as root — always add a non-root user in Dockerfile: USER appuser
Use minimal base images — Alpine, distroless, or slim variants
No secrets in image layers — use BuildKit secret mounts or inject at runtime via env vars/K8s secrets
Pin base image versions — FROM node:20.11.0-alpine3.19 not FROM node:latest
Read-only filesystem — docker run --read-only or Kubernetes readOnlyRootFilesystem: true
Scan in registry — ACR, ECR, Harbor all support automatic image scanning on push

🎯 Interview Questions — Senior Level

DOCKER · ENGINEER

What is the difference between CMD and ENTRYPOINT in a Dockerfile?

Both define what runs when the container starts. The key difference: ENTRYPOINT is the executable, CMD provides default arguments to ENTRYPOINT. When you override CMD (docker run myimage custom-arg), ENTRYPOINT stays. When you override ENTRYPOINT (docker run --entrypoint sh myimage), CMD is ignored. Production rule: use ENTRYPOINT for the application, CMD for default flags. Example: ENTRYPOINT ["python", "-m", "gunicorn"] CMD ["--workers", "4", "app:app"]. In production you also must use exec form (square brackets) not shell form — shell form makes /bin/sh PID 1 and your app never receives SIGTERM, causing 10-second kill delays on every container stop.
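
The ENTRYPOINT/CMD split from this answer, sketched as a Dockerfile. It assumes gunicorn is listed in `requirements.txt` and that the WSGI app is importable as `app:app`.

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .

# Exec form (JSON array): gunicorn runs as PID 1 and receives SIGTERM directly
ENTRYPOINT ["python", "-m", "gunicorn"]
# Default arguments, replaced by anything passed after the image name
CMD ["--workers", "4", "app:app"]

# Shell form, for contrast (commented out): /bin/sh becomes PID 1 and the
# signal may never reach the application.
#   ENTRYPOINT python -m gunicorn --workers 4 app:app
```

With this image, `docker run myimage --workers 2 app:app` keeps the ENTRYPOINT and swaps only the arguments.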

DOCKER · ENGINEER

Explain Docker layer caching. How do you optimise it?

Docker builds images layer by layer. If a layer has not changed since the last build, Docker reuses the cached version — making subsequent builds fast. The critical rule: once a layer cache is invalidated, ALL layers after it rebuild. The mistake: COPY . . before pip install — every code change invalidates the pip install cache and rebuilds all dependencies. The fix: COPY requirements.txt . first, RUN pip install, then COPY . . — dependencies only reinstall when requirements.txt changes. At HPE this changed our CI build time from 8 minutes to 90 seconds. In CI pipelines use BuildKit with --cache-from to share cache between pipeline runs. Use docker image history to see which layers are largest and where cache is being broken.

DOCKER · PRODUCTION

Your container is OOMKilled every few hours. Walk me through how you find the cause and fix it.

Exit code 137 means the process received SIGKILL; when docker inspect also reports OOMKilled, the kernel killed it for exceeding the memory limit. Systematic approach: Step 1: confirm it is OOM. docker inspect container --format='{{.State.OOMKilled}}' returns true. Step 2: check what the memory limit is and what the container was actually using before it died. In our case docker stats showed usage climbing to 490MB against a 512MB limit over several hours. Step 3: decide whether it is a memory leak or just an insufficient limit. Memory leak: usage grows steadily and never comes back down. Insufficient limit: usage plateaus at a stable working set that is simply larger than the limit allows. At HPE we had a Python service with a Kafka consumer that cached every message in memory without eviction. Fix: profiled with memory_profiler, found an unbounded cache dictionary, and replaced it with an LRU cache with maxsize=1000. Short term: increased the limit from 512m to 1g to stop the pager alerts. Permanent fix: added a Prometheus alert on container memory at 80% of the limit to catch future leaks before they cause kills.

DOCKER · ARCHITECT

What is a multi-stage Docker build and why is it critical for production?

Multi-stage builds use multiple FROM instructions in one Dockerfile. Each stage starts fresh and can copy artifacts from previous stages. The final image only contains what you explicitly copy into the last stage. Why it matters: a Go application with its compiler toolchain is 1.2GB. The same app compiled and copied to scratch (empty base image) is 8MB. Smaller image = faster pull, faster startup, smaller attack surface, lower registry storage cost, better security (fewer installed packages = fewer CVEs). At HPE we reduced our Java microservice images from 680MB (JDK) to 180MB (JRE only). The build stage has all the tools. The runtime stage has only the compiled binary and its runtime dependencies. I also use multi-stage for separating test execution — run tests in stage 1, if they fail the build fails before producing any image, so you can never push a tested-failed image.

DOCKER · ENGINEER

How do Docker networking modes differ? When do you use each?

Bridge (default): containers get their own network namespace, communicate via Docker-managed bridge network. Containers see each other by IP or container name on custom networks. Use for most applications. Host: container shares the host network stack directly — no port mapping needed, performance is slightly better, but zero network isolation. Use for network monitoring tools or when you need absolute maximum throughput. None: container has no network interface. Use for batch jobs that should never make network calls. Overlay: cross-host networking for Docker Swarm. In Kubernetes, Docker networking is replaced entirely by CNI plugins (Calico, Cilium, Flannel) — kube-proxy handles Service networking at the node level, not Docker. Important: create a custom bridge network instead of using the default bridge — custom networks provide automatic DNS resolution by container name. Default bridge uses IP addresses only.

DOCKER · PRODUCTION

A new Docker image is 2GB. Your CTO asks you to reduce it. How do you approach this?

Systematic reduction approach: Step 1 — analyse with docker image history to see which layers are biggest. Step 2 — switch to smaller base image: ubuntu (77MB) → debian-slim (30MB) → alpine (5MB) → distroless (2MB) → scratch (0). Caveat: alpine uses musl libc, some Python packages need glibc, test before committing. Step 3 — multi-stage build: separate build tools from runtime. A Maven/JDK build image is 500MB, but only the JAR needs to go to production, use JRE not JDK. Step 4 — clean up in same RUN layer: RUN apt-get update && apt-get install -y curl && rm -rf /var/lib/apt/lists/* — the cleanup must be in the same RUN, otherwise the package cache is already committed to a layer. Step 5 — .dockerignore to exclude tests, docs, .git. At HPE: a Node.js service went from 1.4GB (with node_modules) to 120MB (multi-stage + alpine + only production dependencies).
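
Step 4 is the one most often written incorrectly, so here is a minimal sketch of it. The base image tag and the `curl` package are only examples.

```dockerfile
FROM debian:bookworm-slim

# Install and clean up in ONE RUN instruction, so the apt package lists
# are removed before the layer is committed.
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl \
 && rm -rf /var/lib/apt/lists/*

# Anti-pattern, for contrast (commented out): the lists are already stored
# in the first layer, so deleting them later does not shrink the image.
#   RUN apt-get update && apt-get install -y curl
#   RUN rm -rf /var/lib/apt/lists/*
```

`docker image history` on both variants should show the difference in layer sizes.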

DOCKER · ARCHITECT

How do you handle secrets in Docker? What are the risks of using ENV for secrets?

ENV variables for secrets have three problems: First, docker inspect my-container shows all environment variables in plaintext to anyone with Docker access. Second, docker history myimage shows all build-time ENV instructions. Third, child processes inherit all environment variables — if the app spawns subprocesses, secrets leak there too. Production approaches in order of security: Runtime injection via orchestrator (Kubernetes Secrets, Docker Swarm secrets) — secret never in image, injected at runtime as file mount. External secrets manager (HashiCorp Vault, AWS Secrets Manager) — app fetches secret at startup using IAM role. BuildKit secret mount for build-time secrets (pip private index tokens etc.) — RUN --mount=type=secret,id=pip_token pip install... — secret is never committed to any layer. Never do: ENV DB_PASSWORD=secret in Dockerfile (baked in, visible in docker history forever even if later changed).

DOCKER · PRODUCTION

Production container is running slowly. How do you diagnose it?

Performance diagnosis framework: Step 1 — is it CPU, memory, IO, or network? docker stats shows live CPU%, memory usage, network IO, block IO. Step 2 — if CPU high: docker exec -it container top or docker exec container ps aux. Look for a process consuming 100% CPU — could be a tight loop, or the app is CPU-bound and needs more replicas. Step 3 — if memory high: close to limit = risk of OOMKill. Check for memory leak: watch docker stats over 30 minutes. If memory grows continuously = leak. Step 4 — if disk IO high: use iostat inside container or host-level iotop. If writing too many logs = use log rotation. Step 5 — add profiling: for Python use py-spy or memory_profiler, for Java use async-profiler, for Go use pprof. In production I always have resource limits set AND Prometheus container metrics (container_cpu_usage_seconds_total, container_memory_usage_bytes) to correlate performance issues with recent deployments or traffic patterns.

DOCKER · ENGINEER

What is the difference between COPY and ADD in Dockerfile?

Use COPY for everything. ADD does everything COPY does plus two extra features: it auto-extracts local tar archives and it can fetch remote URLs. These extra features are the reason NOT to use ADD: they make Dockerfiles unpredictable. ADD https://example.com/file.tar.gz /app downloads the file at build time (a remote download is not extracted; only local archives are), which is a security risk because the build depends on unverified external content. COPY is explicit and predictable. The only valid use case for ADD is when you specifically need tar extraction in a single layer, and even then I prefer COPY + RUN tar xzf. Interviewers ask this because ADD appears in many old tutorials, and using it signals that someone learned Docker from outdated sources.

DOCKER · ARCHITECT

How do you manage Docker images in a large organisation with 50+ microservices?

Image management strategy: Naming convention: registry.company.com/team/service:version — never just service:latest. Tagging strategy: semantic versioning (v1.2.3) + git SHA (sha-abc1234) + latest for latest main branch. Never deploy :latest to production. Base image governance: define approved base images (python:3.11-slim, openjdk:21-slim, node:20-alpine). Run weekly Trivy scans on all approved bases. When a base image CVE is found, trigger rebuild of all services using it via a dependency graph in CI. Registry: private registry (ECR or Harbor) with image scanning on push, vulnerability policy that blocks CRITICAL CVEs. Retention policy: keep last 10 tags per service, delete everything older. At HPE we had Harbor with automated Trivy scanning — any image with a CRITICAL CVE was automatically quarantined and developers notified within 5 minutes of push.
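
The naming and tagging convention in this answer is easy to enforce with a small CI helper. The sketch below is one way to generate the references; `image_tags` is a hypothetical function name and the argument order is an assumption.

```shell
# Hypothetical helper: print the image references for one build, following
# the convention registry/team/service:tag.
# Usage: image_tags <registry> <team> <service> <version> <git-sha>
image_tags() {
  local repo="$1/$2/$3"
  local version="$4"
  local sha="$5"
  # Semantic version tag, normalised to exactly one leading "v"
  echo "${repo}:v${version#v}"
  # Short git SHA tag (first 7 characters) for exact traceability
  echo "${repo}:sha-$(printf '%.7s' "$sha")"
}
```

Each printed reference can be passed to `docker tag` and `docker push`; a moving `latest` tag for the main branch would be added by the pipeline rather than by this helper.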

DOCKER · ENGINEER

What is a multi-stage Docker build and why is it important for production?

Multi-stage builds use multiple FROM instructions in one Dockerfile, where each FROM starts a new stage. The key feature: you copy only the output from one stage into the next, leaving everything else behind. A Java application without multi-stage: one image with JDK + Maven + source code + tests + compiled JAR = 600MB. With multi-stage: Stage 1 uses maven:3.9 to compile. Stage 2 uses eclipse-temurin:17-jre-alpine and only COPY --from=builder the compiled JAR. Final image: 180MB with no Maven, no source code, no test files. Why it matters for production: security (smaller attack surface — no compiler or build tools to exploit), performance (faster image pulls, faster pod startup, less registry storage), compliance (many security policies require minimal runtime images). Important optimisation inside each stage: copy dependency files (pom.xml, package.json) before source code. Docker caches each layer — if pom.xml hasn't changed, it skips the dependency download step on the next build. Source code changes every commit but dependencies change rarely. This cache trick reduces build time from 5 minutes to 30 seconds for most builds.

DOCKER · ENGINEER

Explain Docker networking — bridge, host, overlay, and when to use each.

Bridge network (default): each container gets a private IP in a virtual network (172.17.0.0/16 range). Containers on the same bridge can communicate by container name (if using user-defined bridge). Host network: container shares the host's network stack — no isolation, uses host IP and ports directly. Use for performance-critical applications where network overhead matters, or when the app needs to bind specific host ports. Custom user-defined bridge: docker network create mynetwork. Containers on the same user-defined bridge can resolve each other by name — myapp can reach database:5432. Default bridge doesn't have DNS resolution between containers. Always use custom networks, not the default bridge. Overlay network: multi-host networking for Docker Swarm or when containers on different hosts need to communicate. Uses VXLAN encapsulation. For Kubernetes: containers in the same Pod share a network namespace (same localhost). Between Pods: handled by the CNI plugin (Flannel, Calico, Azure CNI) which ensures every pod has a routable IP. Docker Compose automatically creates a user-defined bridge network for all services in the same compose file — services can reach each other by service name: redis://redis:6379 works because Compose creates DNS entries per service name.
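
Name resolution on a user-defined bridge can be checked in a few commands. The demo below is a sketch wrapped in a function so nothing runs until it is called; the network name `mynetwork` comes from the answer above, while the container name and the `alpine:3.19` image are illustrative.

```shell
# Hypothetical demo: two containers on a user-defined bridge resolve each
# other by name through Docker's embedded DNS.
demo_user_defined_bridge() {
  docker network create mynetwork
  # A long-running container that acts as the target
  docker run --detach --name database --network mynetwork alpine:3.19 sleep 300
  # A one-shot container that reaches the target by container name
  docker run --rm --network mynetwork alpine:3.19 ping -c 1 database
  # Clean up the demo resources
  docker rm --force database
  docker network rm mynetwork
}
```

Running the same two containers without `--network` places them on the default bridge, where the name `database` is not expected to resolve.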

DOCKER · PRODUCTION

A container is running but the application inside is not responding. How do you debug?

Systematic debugging steps. Step 1: check container status and recent logs. docker ps shows the container is running (Up). docker logs container-name --tail 100 shows application output — check for startup errors, exceptions, missing config. docker logs --follow to watch in real time. Step 2: check resource usage. docker stats container-name shows CPU, memory, network, disk I/O. Is the container at memory limit (potential OOM kill incoming)? High CPU from runaway loop? Step 3: get a shell inside. docker exec -it container-name sh (or bash if available). Inside: check if the process is running: ps aux. Check if it is listening on the expected port: netstat -tlnp or ss -tlnp. Try curling the endpoint from inside: curl localhost:8080/health. Step 4: check container events. docker inspect container-name shows restart count, exit code of previous runs. docker events --filter container=container-name shows all lifecycle events. Step 5: network connectivity. From another container or host: docker exec -it other-container curl http://target-container:8080. Check if the service port is actually exposed: docker port container-name. Step 6: if the container exits immediately on startup: docker run --entrypoint sh myimage -c "sleep 3600" to override ENTRYPOINT and keep it running for investigation.

DOCKER · ARCHITECT

How does Docker Compose work for a multi-service application? What are the key features?

Docker Compose defines and runs multi-container applications from a single YAML file. The compose file defines services (each becomes one or more containers), networks (how services communicate), and volumes (persistent data). Key features for production-like local environments: depends_on with condition: service_healthy waits until the database reports healthy before the app starts, which prevents the classic "app starts before DB is ready" problem. Health checks define how Docker decides whether a service is healthy: healthcheck: test: ["CMD", "pg_isready"] interval: 10s retries: 5. Profiles: mark services with profiles: [tools] and they only start when docker compose --profile tools up is run, which is useful for optional debug tools. Environment variable files: env_file: .env.development separates config from the compose file. Override files: docker-compose.yml (base) + docker-compose.override.yml (local dev overrides such as volume mounts for live code reload). For production-like runs: add resource limits (cpus: "0.5", memory: 512m), restart policies (restart: unless-stopped), and use secrets instead of environment variables for sensitive values. Docker Compose is ideal for local development and integration testing. For production Kubernetes: use Kompose (kompose convert) to generate Kubernetes manifests from the compose file as a starting point, then package them as a Helm chart if needed.

🗺️ Learning Roadmap

Week 1

Foundations

Install Docker Desktop

Run first container: docker run nginx

Understand images vs containers

Write first Dockerfile

Week 2

Build & Optimise

Layer caching and order

Multi-stage builds

.dockerignore

Reduce image size below 100MB

Week 3

Networking & Storage

Bridge, host, overlay networks

Volumes and bind mounts

Docker Compose multi-container

Container-to-container communication

Week 4

Security & Production

Non-root user in Dockerfile

Trivy image scanning

Docker in CI/CD pipeline

Private registry setup

Month 2

K8s Ready

Understand why K8s replaces Docker in production

containerd vs Docker

OCI image spec

BuildKit advanced features
