LearnwithVishnu
Basics → Production → Architect
✦ AI-Powered Knowledge Platform

From Basics to Architect-Level Mastery

A structured universal learning ecosystem — DevOps, Cloud, Medical Coding, MIS, Telco, Human Essentials and more. Built for real-world production thinking.

500+ Topics Covered
10+ Learning Domains
31 Tools Covered
4 Mastery Levels
Learning Paths
☸️
DevOps & Platform Eng.
CI/CD, Kubernetes, OpenShift, Terraform, Ansible and monitoring at production scale.
Jenkins · ArgoCD · K8s · Terraform
☁️
Cloud Architecture
AWS, Azure, GCP — from fundamentals to multi-cloud architect design patterns.
AWS · Azure · GCP
📊
Monitoring & Observability
Prometheus, Grafana, ELK, Datadog, Loki, Splunk and SLO engineering.
Prometheus · Datadog · ELK
🏥
Medical Coding
ICD-10, OASIS, PDGM — Home Health specialization with real scenarios.
ICD-10 · OASIS · PDGM
📈
MIS & Data Analytics
Excel, Power BI, SQL, Python/Pandas — from reporting to data engineering.
Excel · Power BI · Python
🔒
DevSecOps
Vault, Trivy, SonarQube, Kyverno, Falco — security at every pipeline stage.
Vault · Trivy · Kyverno
🌾
Natural & Organic Farming
Chemical-free farming from terrace garden to large farm — vegetables, rice, fruits, spices, dal, cotton.
Compost · Neem · Jeevamrut · SRI
Universal Learning Framework — every topic follows this
01 What is it? - Definition and core concept
02 Why it exists - Origin and problem it solves
03 Architecture - System design and components
04 Core Modules - Key parts and their roles
05 Setup & Config - Install and first run
06 Real Examples - Working code and configs
07 Production Usage - Patterns and best practices
08 Troubleshooting - Failures and how to fix
09 Monitoring - Metrics and observability
10 Security - Hardening and compliance
11 Roadmap - Beginner to architect path
12 Interview Prep - Questions and PSR answers
Quick Access
Kubernetes
OpenShift
Terraform
Ansible
Jenkins
ArgoCD
Docker
Helm
AWS
Azure
GCP
Prometheus
ELK Stack
Datadog
Loki+OTel
Splunk
Linux
Networking
DevSecOps
Kafka
GitHub Actions
GitLab CI
SLO/SLI
Python for MIS
HA/DR
Incidents
Azure DevOps
Istio
Multi-Cloud
FluxCD
Tekton
Argo Rollouts
Harbor
KEDA
eBPF/Cilium
Platform Eng
Organic Farming
Complete Learning Roadmap
Beginner → Architect
All domains: DevOps, Cloud, MIS, Medical, Agriculture
⚙️ DevOps & Platform Engineering Roadmap — 6 phases, 6-12 months
Phase 1
Foundations
Linux & Bash
Networking
Python for DevOps
Docker
Kubernetes
Phase 2
CI/CD & GitOps
Jenkins
GitHub Actions
GitLab CI
ArgoCD
Helm
Tekton
FluxCD
Phase 3
IaC & Security
Terraform
Ansible
OpenShift OCP
DevSecOps
Istio
Harbor/Registry
Phase 4
Cloud
AWS Core
Azure / AKS
GCP / GKE
Azure DevOps
Multi-Cloud
Phase 5
Observability & SRE
Prometheus
ELK Stack
Datadog
Loki+OTel
SLO/SLI
HA/DR
Incidents
Phase 6
Architect Level
KEDA
eBPF/Cilium
Argo Rollouts
Platform Engineering
Mock Interviews
☁️ Cloud Architect Roadmap — 3-6 months
Month 1
Fundamentals
Cloud Fundamentals
Networking
Linux Basics
Month 2-3
One Cloud Deep
AWS (recommended first)
OR Azure
OR GCP
Terraform IaC
Month 4-6
Multi-Cloud
Multi-Cloud Strategy
Azure DevOps
HA/DR Design
📊 MIS & Data Analytics Roadmap — 6 months to Data Engineer
Month 1-2
Excel & SQL
Excel (XLOOKUP, Power Query)
SQL Basics
Power BI Reports
Month 3-4
Python Basics
Python for MIS
NumPy Foundations
Pandas DataFrames
Month 5-6
Automation
Automated Reports
Plotly Dashboards
Power BI + DAX
Month 7+
Data Engineer
SQL Advanced
Apache Airflow
Azure Synapse / Databricks
🏥 Medical Coding Roadmap — 3-6 months to certification
Month 1
Foundations
ICD-10-CM Basics
Anatomy Overview
OASIS Start of Care
Month 2-3
Home Health
PDGM Groupings
LUPA Thresholds
HIPPS Codes
Month 4-6
Certification
Practice Cases
Audit Scenarios
CPC / HCS-D Exam
🌱 Organic Farming Roadmap — Start growing within 1 month
Week 1
Start Composting
Kitchen Waste Compost
Buy Vermicompost
Month 1
First Crops
Methi, Coriander
Chilli, Tomato
Neem Spray Prep
Month 2-3
Soil Building
Jeevamrut Method
Drip Setup
All-Season Planting
Year 1+
Field Transition
SRI Rice Method
Turmeric + Dal Crops
PGS Certification
Interview Prep — Category-wise Questions
PSR Formula: Problem → Solution → Result. Aim for answers of 45–90 seconds. Questions are organized by tool and domain.
KUBERNETES · ARCHITECT
How do you design a zero-downtime deployment strategy in Kubernetes?
Blue-Green: run two Deployments and switch the Service selector for an instant cutover and instant rollback. Rolling Update: set maxUnavailable=0 and maxSurge=1 for a gradual replacement that never drops below the desired replica count. For DB migrations: use the expand-and-contract pattern so both application versions can run against the same schema simultaneously. Use Argo Rollouts for automated canary releases with Prometheus analysis. Result: zero-downtime deployments with automatic rollback when the error rate breaches its threshold.
Kubernetes · Blue-Green
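To see why maxUnavailable=0 with maxSurge=1 keeps capacity intact, here is a minimal Python simulation of the pod counts during a rolling update. It is a simplified model, not the Deployment controller's real logic: the function name is hypothetical and new pods are assumed to become ready instantly.

```python
def rolling_update(replicas: int, max_surge: int = 1, max_unavailable: int = 0):
    """Return the (old_pods, new_pods) counts after each step of a simulated rollout."""
    if max_surge == 0 and max_unavailable == 0:
        # Kubernetes rejects this combination because the rollout could never progress.
        raise ValueError("max_surge and max_unavailable cannot both be 0")
    old, new = replicas, 0
    history = [(old, new)]
    while old > 0 or new < replicas:
        # Surge: add new pods without exceeding replicas + max_surge in total.
        can_add = min(replicas + max_surge - (old + new), replicas - new)
        new += can_add
        history.append((old, new))
        # Scale down old pods while keeping at least replicas - max_unavailable running.
        can_remove = min(old, (old + new) - (replicas - max_unavailable))
        old -= can_remove
        history.append((old, new))
    return history


if __name__ == "__main__":
    for old, new in rolling_update(3):
        print(f"old={old} new={new} total={old + new}")
```

With three replicas the total stays between 3 and 4 pods at every step, which is the property the answer relies on.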
KUBERNETES · ENGINEER
What is the difference between Liveness and Readiness probes?
Liveness: is the container alive? If it fails, the kubelet restarts the container (the pod is not rescheduled). Use it to detect deadlocks. Readiness: is the container ready to serve traffic? If it fails, the pod is removed from Service endpoints but NOT restarted. Use it for temporary unavailability such as warm-up or an overloaded dependency. Startup probe: holds off liveness and readiness checks until the app has started, which prevents premature restarts of slow-starting containers. Rule: always configure both liveness and readiness for production workloads.
Kubernetes · Probes
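The probe rules can be summarised as a small decision function. This is an illustrative Python sketch with hypothetical names, covering only the case where the startup probe is still pending or has already succeeded.

```python
def probe_outcome(startup_done: bool, liveness_ok: bool, readiness_ok: bool) -> dict:
    """Model what happens to a container for a given combination of probe results."""
    if not startup_done:
        # While the startup probe is pending, liveness and readiness are not evaluated:
        # the container is left alone and receives no Service traffic.
        return {"restart_container": False, "receives_traffic": False}
    return {
        # A failed liveness probe leads to a container restart.
        "restart_container": not liveness_ok,
        # A failed readiness probe only removes the pod from Service endpoints.
        "receives_traffic": readiness_ok,
    }


if __name__ == "__main__":
    print(probe_outcome(startup_done=True, liveness_ok=True, readiness_ok=False))
```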
KUBERNETES · ARCHITECT
How does the Kubernetes scheduler work? What happens when you deploy a pod?
1. API Server validates the pod spec and stores it in etcd. 2. Scheduler watches for unscheduled pods, picks a node, and binds the pod to it. 3. Kubelet on the selected node starts the pod via the container runtime. Scheduling decisions: the filter phase removes ineligible nodes (insufficient resources, untolerated taints, unmet affinity rules), then the score phase ranks the remaining nodes. Separately, a PodDisruptionBudget ensures minimum availability during voluntary disruptions such as node drains.
Kubernetes · Scheduler
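The filter-then-score idea can be shown with a toy scheduler in Python. This is a sketch under stated assumptions: the Node and Pod classes are hypothetical, and the score (most free resources left after placement) is a stand-in for the real scoring plugins.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    name: str
    free_cpu: float
    free_mem_gib: float
    taints: set = field(default_factory=set)


@dataclass
class Pod:
    cpu: float
    mem_gib: float
    tolerations: set = field(default_factory=set)


def schedule(pod: Pod, nodes: list) -> Optional[str]:
    """Pick a node name for the pod, or None if no node is feasible."""
    # Filter phase: drop nodes without enough resources or with untolerated taints.
    feasible = [
        n for n in nodes
        if n.free_cpu >= pod.cpu
        and n.free_mem_gib >= pod.mem_gib
        and n.taints <= pod.tolerations
    ]
    if not feasible:
        return None  # the pod would stay Pending

    # Score phase: toy score that prefers the node with the most headroom left.
    def score(n: Node) -> float:
        return (n.free_cpu - pod.cpu) + (n.free_mem_gib - pod.mem_gib)

    return max(feasible, key=score).name


if __name__ == "__main__":
    nodes = [Node("a", 2, 4), Node("b", 8, 16, {"gpu"}), Node("c", 4, 8)]
    print(schedule(Pod(cpu=1, mem_gib=2), nodes))
```

Node b has the most capacity but is filtered out by its taint, so the pod lands on node c unless it carries a matching toleration.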
KUBERNETES · PRODUCTION
A pod is in CrashLoopBackOff. What is your systematic troubleshooting approach?
1. kubectl describe pod — check Events section for scheduling/pull errors. 2. kubectl logs --previous — see what crashed. 3. Check if app exits on missing env vars or secrets. 4. Check resource limits — OOMKilled means memory limit too low. 5. Check liveness probe — too aggressive probe restarts healthy pods. 6. kubectl exec -it — shell in if pod starts briefly. At HPE: 80% of CrashLoopBackOff are missing secrets or wrong image tag.
Kubernetes · Debug
CI/CD · ARCHITECT
How do you design a Jenkins pipeline supporting multiple environments without duplicating code?
Jenkins Shared Libraries. Create vars/ folder in a central Git repo with reusable Groovy functions. Each Jenkinsfile calls shared functions with environment-specific parameters. Branch mapping: feature→dev, release→staging, main→prod. Parameterize: app name, image tag, namespace. Result: 60% reduction in pipeline maintenance across 20+ microservices at HPE.
Jenkins · Shared Libraries
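The branch-to-environment mapping is the part of the shared function that every pipeline reuses. A real shared library would be written in Groovy; the Python sketch below only illustrates the logic, and the branch prefixes, function names, and namespace format are assumptions.

```python
from typing import Optional


def target_environment(branch: str) -> Optional[str]:
    """Map a Git branch to a deployment environment: feature→dev, release→staging, main→prod."""
    if branch == "main":
        return "prod"
    if branch.startswith("release/"):  # assumed naming convention
        return "staging"
    if branch.startswith("feature/"):  # assumed naming convention
        return "dev"
    return None  # unknown branches are built but not deployed


def deploy_params(app: str, image_tag: str, branch: str) -> Optional[dict]:
    """Build the parameters a single shared deploy step would receive."""
    env = target_environment(branch)
    if env is None:
        return None
    # Hypothetical namespace convention: <app>-<environment>.
    return {"app": app, "image_tag": image_tag, "namespace": f"{app}-{env}", "environment": env}


if __name__ == "__main__":
    print(deploy_params("payment-service", "1.4.2", "release/1.4"))
```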
GITOPS · ARCHITECT
What is the difference between push-based and pull-based deployment?
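Push: the CI pipeline holds cluster credentials and pushes changes to the cluster, so credentials live outside the cluster and there is no drift detection. Pull: an agent inside the cluster (ArgoCD/FluxCD) watches Git and pulls the desired state, so credentials stay inside the cluster and drift can be auto-corrected when self-heal is enabled. GitOps = pull-based. The security posture is significantly better: the blast radius of a compromised CI system is limited because it never holds cluster credentials. ArgoCD polls Git about every 3 minutes by default, so new commits are typically picked up within that window.
GitOps · ArgoCD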
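The pull model boils down to a reconcile loop that compares desired state from Git with live state in the cluster. The following Python sketch uses plain dictionaries as a stand-in for manifests; the function names are hypothetical and this is not how ArgoCD or FluxCD are implemented internally.

```python
def detect_drift(desired: dict, live: dict) -> dict:
    """Return every field whose live value differs from the desired value."""
    drift = {}
    for key in desired.keys() | live.keys():
        if desired.get(key) != live.get(key):
            drift[key] = {"desired": desired.get(key), "live": live.get(key)}
    return drift


def reconcile(desired: dict, live: dict, self_heal: bool = True):
    """Return (new_live_state, drift). With self_heal, live state is reset to desired."""
    drift = detect_drift(desired, live)
    if drift and self_heal:
        return dict(desired), drift
    return live, drift


if __name__ == "__main__":
    desired = {"replicas": 3, "image": "app:1.4.2"}
    live = {"replicas": 5, "image": "app:1.4.2"}  # someone scaled manually
    print(reconcile(desired, live))
```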
CI/CD · ENGINEER
How do you speed up a 45-minute CI pipeline?
1. Parallelise independent jobs: test + security scan + lint run simultaneously. 2. Cache dependencies: npm/pip cache saves 5-10 min. 3. Docker BuildKit layer caching: only rebuild changed layers. 4. Self-hosted agents: skip 3-5 min setup time. 5. Skip unchanged services in monorepo using git diff. Result: 45 min → 12 min at HPE by parallelising and adding BuildKit caching.
CI/CD · Performance
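Step 5 (skipping unchanged services in a monorepo) can be sketched as a path-matching helper in Python. The list of changed files is assumed to come from `git diff --name-only`; the function name and directory layout are hypothetical.

```python
def affected_services(changed_files: list, service_dirs: list) -> list:
    """Return the service directories that contain at least one changed file."""
    affected = set()
    for path in changed_files:
        for svc in service_dirs:
            prefix = svc.rstrip("/") + "/"
            # Match on a full directory prefix so "services/pay" does not match
            # files that live under "services/payments".
            if path.startswith(prefix):
                affected.add(svc)
    return sorted(affected)


if __name__ == "__main__":
    changed = ["services/payments/app.py", "README.md"]
    services = ["services/payments", "services/pay", "services/orders"]
    print(affected_services(changed, services))  # only these services need a build
```

Changes to shared code outside any service directory are not handled here; a real pipeline would usually rebuild everything in that case.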
ARGO ROLLOUTS · ARCHITECT
How do you implement progressive delivery with automated rollback?
Argo Rollouts with AnalysisTemplate. Canary steps: 10% traffic → wait 5min → query Prometheus error rate → if below 1% proceed to 50% → wait → 100%. AnalysisTemplate queries: sum(rate(errors[2m]))/sum(rate(requests[2m])). If 3 consecutive failures above threshold → automatic rollback to stable version. Result: data-driven deployments, eliminated human decision at each canary step.
Argo Rollouts · Canary
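The analysis logic described in this answer (error ratio below 1% to proceed, three consecutive failures to roll back) can be sketched in Python. This mirrors the card's description only; the function names are hypothetical and this is not the Argo Rollouts implementation.

```python
def error_ratio(errors_per_s: float, requests_per_s: float) -> float:
    """Equivalent of sum(rate(errors)) / sum(rate(requests)) for one measurement."""
    return errors_per_s / requests_per_s if requests_per_s > 0 else 0.0


def canary_decision(samples, threshold: float = 0.01, failure_limit: int = 3) -> str:
    """Return 'rollback' after failure_limit consecutive bad samples, else 'promote'."""
    consecutive = 0
    for errors, requests in samples:
        if error_ratio(errors, requests) >= threshold:
            consecutive += 1
            if consecutive >= failure_limit:
                return "rollback"
        else:
            consecutive = 0  # a healthy measurement resets the streak
    return "promote"


if __name__ == "__main__":
    # Each tuple is (errors per second, requests per second) for one analysis run.
    print(canary_decision([(0.5, 100), (2, 100), (2, 100), (2, 100)]))
```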
TERRAFORM · ARCHITECT
How do you manage Terraform state for a team of 10+ engineers?
Remote state in S3 with DynamoDB locking, or in Azure Blob Storage with its native blob-lease locking. Separate state per environment per module: payment-service/staging and payment-service/prod are separate state files. Terraform workspaces can add env separation within a backend. CI/CD runs Terraform; never apply locally. Encrypt state at rest. Apply backend access controls per environment. Back up state regularly. Never run terraform destroy in CI without a manual approval gate.
Terraform · State
TERRAFORM · ENGINEER
What is the difference between count and for_each in Terraform?
count: creates indexed list of resources — if you remove item from middle, all subsequent resources are recreated (index shifts). for_each: creates map of resources keyed by string — removing one item doesn't affect others. Rule: always use for_each for real resources, count only for enable/disable patterns. Example: for_each on a map of environments means removing staging doesn't touch prod.
Terraform · HCL
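The index-shift problem is easy to reproduce outside Terraform. This Python sketch models resource addresses as dictionary keys; the `res[...]` addresses and function names are hypothetical and only illustrate how the two addressing schemes react to removing an item from the middle.

```python
def count_addresses(items: list) -> dict:
    """count-style addressing: resources are keyed by their list index."""
    return {f"res[{i}]": value for i, value in enumerate(items)}


def for_each_addresses(items: list) -> dict:
    """for_each-style addressing: resources are keyed by a stable string."""
    return {f'res["{value}"]': value for value in items}


def changed(before: dict, after: dict) -> list:
    """Addresses whose content differs between two plans (changed or destroyed)."""
    return sorted(
        addr for addr in before.keys() | after.keys()
        if before.get(addr) != after.get(addr)
    )


if __name__ == "__main__":
    before = ["dev", "staging", "prod"]
    after = ["dev", "prod"]  # staging removed from the middle
    print(changed(count_addresses(before), count_addresses(after)))
    print(changed(for_each_addresses(before), for_each_addresses(after)))
```

With index keys, removing staging touches both `res[1]` and `res[2]`, so prod is affected. With string keys, only the staging entry changes.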
PROMETHEUS · ARCHITECT
How do you design an alerting strategy that avoids alert fatigue?
Alert on symptoms not causes. Only 3 alert severities: P1 (page immediately), P2 (page business hours), P3 (ticket). Every P1 must be actionable — if on-call can't do anything about it, it shouldn't page. Multi-window burn rate alerts: fast burn (1h window) for emergency, slow burn (6h window) for warning. Silence noisy alerts and fix root cause. At HPE: reduced weekly alerts from 200 to 15 meaningful ones.
Prometheus · Alerting
DATADOG · ARCHITECT
How would you migrate from Prometheus+Grafana to Datadog?
Phased migration. Phase 1: deploy the Datadog Agent alongside Prometheus for dual collection. Phase 2: recreate critical dashboards in Datadog. Phase 3: recreate Alertmanager rules as Datadog Monitors. Phase 4: cut alerting over to Datadog. Phase 5: decommission Prometheus. The concepts map closely rather than one-to-one: PromQL maps to Datadog metric queries, Alertmanager rules map to Monitors, and Grafana panels map to Datadog widgets, so queries have to be rewritten, not copied. Timeline: 4-6 weeks for a 50-service environment.
Datadog · Migration
SRE · ARCHITECT
How do you set SLOs for a critical payment service?
SLI: % of requests returning non-5xx AND completing in under 2s. SLO: 99.9% availability (8.76h budget/year). SLA: 99.5% (customer contract). Error budget: 43.2 min per 30-day month. Policy: below 10% budget remaining → freeze non-critical deployments and focus on reliability. Multi-window burn rate alerts: a 14.4x burn over a 1h window pages immediately; a 6x burn over a 6h window pages for review. Result: MTTR improved from 45min to 12min.
SLO · Error Budget
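The numbers in this answer follow from simple arithmetic, shown here as a short Python sketch. The function names are hypothetical, and a 30-day SLO window is assumed for the monthly budget and the burn-rate fractions.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed minutes of unavailability for an SLO over a window."""
    return (1 - slo) * window_days * 24 * 60


def budget_consumed(burn_rate: float, alert_window_hours: float, slo_window_days: int = 30) -> float:
    """Fraction of the error budget spent if burn_rate is sustained for the alert window."""
    return burn_rate * alert_window_hours / (slo_window_days * 24)


if __name__ == "__main__":
    print(f"monthly budget: {error_budget_minutes(0.999):.1f} min")
    print(f"yearly budget: {error_budget_minutes(0.999, 365) / 60:.2f} h")
    print(f"14.4x over 1h spends {budget_consumed(14.4, 1):.0%} of the budget")
    print(f"6x over 6h spends {budget_consumed(6, 6):.0%} of the budget")
```

A 14.4x burn sustained for one hour uses 2% of the monthly budget and a 6x burn sustained for six hours uses 5%, which is why those two thresholds are paired with those windows.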
INCIDENT · P1
Production is down. Walk through your complete incident response.
0-5 min: Acknowledge → open #incident channel → post "investigating, first update in 10min". Assess: blast radius, check recent deployments. If deployment correlates: rollback immediately — don't investigate while users are down. Communicate every 10-15 min even if no fix yet. Mitigate first, root cause second. Post-mortem within 48h: blameless, 5-whys, action items with owners. Target: P1 MTTR under 30 minutes.
Incident · P1
AWS · ARCHITECT
How do you design a highly available 3-tier application on AWS?
Web tier: ALB across 3 AZs, EC2 ASG or ECS tasks. App tier: EKS cluster across 3 AZs, pod anti-affinity spreads pods. DB tier: RDS Multi-AZ (synchronous standby), Read Replicas for read scale. Cache: ElastiCache Redis cluster mode. CDN: CloudFront for static assets. DNS: Route53 health checks for failover. Key: assume AZ failure is normal — design for 2-AZ operation at all times.
AWS · HA
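Designing for 2-AZ operation has a direct capacity consequence: each AZ must be sized so that the surviving AZs can carry peak load. A minimal Python sketch of that arithmetic, with a hypothetical function name and instance counts chosen for illustration:

```python
import math


def instances_per_az(peak_instances: int, az_count: int = 3, tolerated_az_failures: int = 1) -> int:
    """Instances to run in each AZ so that peak load survives the tolerated AZ failures."""
    surviving = az_count - tolerated_az_failures
    if surviving < 1:
        raise ValueError("at least one AZ must survive")
    return math.ceil(peak_instances / surviving)


if __name__ == "__main__":
    per_az = instances_per_az(peak_instances=12)
    print(f"{per_az} per AZ, {per_az * 3} in total")
```

If peak load needs 12 instances, running 4 per AZ is not enough: losing one AZ leaves 8. Sizing for the two surviving AZs means 6 per AZ, or 18 in total.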
MULTI-CLOUD · ARCHITECT
When should you NOT do multi-cloud?
When there is no clear business driver. Multi-cloud adds 40-60% operational complexity — team needs expertise in 2+ platforms, egress costs between clouds are significant, monitoring requires unified tooling. Valid reasons for multi-cloud: regulatory requirement, acquisition, best-of-breed (GCP BigQuery + Azure AKS). Invalid: "just in case" or "vendor lock-in fear" without a concrete plan. Start: master one cloud deeply before adding a second.
Multi-Cloud · Architecture
DEVOPS · ARCHITECT
What is the difference between DevOps, SRE, and Platform Engineering?
DevOps: culture — break silos, developers own deployments, automate everything. SRE: Google's implementation of DevOps using software engineering — SLOs, error budgets, toil reduction, on-call. Platform Engineering: builds the Internal Developer Platform — golden paths, self-service, Backstage portal. In practice: I do all three — maintain SLOs (SRE), build shared Terraform modules and CI templates (Platform), work closely with developers to remove friction (DevOps).
DevOps · SRE · Platform
OPENSHIFT · ARCHITECT
How does OpenShift differ from vanilla Kubernetes in a production enterprise?
OpenShift adds: SCCs (stricter than PSP/PSA), built-in Tekton pipelines, built-in registry, Routes (not just Ingress), MachineConfig for node configuration, OperatorHub for add-ons, integrated monitoring (Prometheus+Grafana pre-configured). Security default: no root containers, no privileged access — stricter than K8s defaults. Enterprise support from Red Hat. When to choose: regulated industries, teams that want everything integrated and supported.
OpenShift · Enterprise
DEVSECOPS · ARCHITECT
How do you implement shift-left security in a CI/CD pipeline?
4 layers: 1. Dev time: pre-commit hooks with Trivy/Checkov, IDE plugins. 2. CI: SAST (SonarQube), dependency scan (Snyk), IaC scan (Checkov). 3. Registry: Trivy scan on push, block images with CRITICAL CVEs. 4. Runtime: Falco for runtime anomalies, Kyverno for policy enforcement. Rule of thumb for the cost of fixing a vulnerability: $1 at dev time, $100 in CI, $1000 in staging, $10000 in production. Shift-left = find cheaply.
DevSecOps · Shift-Left
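The registry-layer rule (block images with CRITICAL CVEs) can be enforced by a small gate that reads a scan report. In this Python sketch the report shape, with "Results", "Vulnerabilities", "VulnerabilityID" and "Severity" keys, is an assumption modeled on Trivy's JSON output and should be checked against the scanner version in use; the function names are hypothetical.

```python
def critical_findings(report: dict) -> list:
    """Collect the IDs of all CRITICAL vulnerabilities in a scan report."""
    ids = []
    # "or []" covers targets where the key is missing or null (no findings).
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") == "CRITICAL":
                ids.append(vuln.get("VulnerabilityID"))
    return ids


def image_allowed(report: dict) -> bool:
    """Gate decision: the image may be pushed only if there are no CRITICAL findings."""
    return not critical_findings(report)


if __name__ == "__main__":
    # Made-up report for illustration; the IDs are placeholders, not real CVEs.
    report = {
        "Results": [
            {"Vulnerabilities": [
                {"VulnerabilityID": "EXAMPLE-0001", "Severity": "HIGH"},
                {"VulnerabilityID": "EXAMPLE-0002", "Severity": "CRITICAL"},
            ]},
            {"Vulnerabilities": None},
        ]
    }
    print(image_allowed(report), critical_findings(report))
```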
ISTIO · ARCHITECT
How does mTLS in Istio improve security without code changes?
Istio CA issues SPIFFE certificates to every pod automatically. All service-to-service traffic is mTLS, encrypted AND authenticated, without any application code changes. STRICT mode rejects non-mTLS traffic. AuthorizationPolicy controls which service can call which, based on service account identity. Result: zero-trust networking where even a compromised pod inside the cluster cannot call arbitrary services. Workload certificates are rotated automatically; the default lifetime is 24h.
Istio · mTLS
More interview questions are embedded in each topic page under the "Interview Prep" module.