LearnwithVishnu
Basics → Production → Architect
🐍Python for DevOps
Beginner · Intermediate · Advanced
Automation scripts: subprocess, APIs, boto3, Kubernetes SDK, production patterns

🐍 Why Python for DevOps?

Python is the glue that holds the DevOps world together

Most major DevOps tools (AWS, Kubernetes, Jenkins, GitHub, Terraform) expose a Python SDK, a REST API, or machine-readable output such as JSON. Shell scripts are great for simple tasks but break down fast when you need to parse JSON, handle errors gracefully, retry on failure, or make HTTP calls. Python handles all of this cleanly.

What Python actually does in DevOps

Task | Without Python | With Python
Check deployment health | Manual kubectl + grep in Bash | Python polls K8s API, retries, alerts Slack on fail
Upload build artifact | AWS CLI in shell script | boto3 with retry, versioning, metadata
Trigger downstream jobs | Jenkins UI click | Python calls Jenkins REST API from pipeline
Parse Terraform output | jq in shell (complex) | json.loads() in 2 lines
Post-deploy smoke test | curl in loop | Python with retry, timeout, proper error handling
Scale down dev at night | Manual | Cron calls Python → K8s SDK → scales to 0
💡 Key Insight: You don't need to be a Python developer. You need to write clean automation scripts confidently. Knowing subprocess, requests, boto3, and the kubernetes SDK covers the large majority of DevOps Python use cases.
Python for DevOps — overview and essential libraries
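As a small illustration of the "Parse Terraform output" row, here is a minimal sketch that flattens the JSON printed by `terraform output -json` into a plain dict. The output names and values in SAMPLE are made up for the example, and terraform_outputs is a hypothetical helper name.

```python
import json

# Shape of what `terraform output -json` prints. The output names and
# values below are invented for illustration.
SAMPLE = """
{
  "cluster_endpoint": {"sensitive": false, "type": "string", "value": "https://example.invalid"},
  "node_count": {"sensitive": false, "type": "number", "value": 3}
}
"""


def terraform_outputs(raw: str) -> dict:
    """Flatten Terraform's JSON output into a plain {name: value} dict."""
    return {name: entry["value"] for name, entry in json.loads(raw).items()}


outputs = terraform_outputs(SAMPLE)
print(outputs["node_count"])
```

In a real script the raw string would come from running the Terraform CLI rather than from a constant.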

⚙️ subprocess — Run Shell Commands

The most used Python module in DevOps

subprocess.run() lets you run any shell command and capture the output. This is how Python scripts wrap kubectl, helm, terraform, docker — any CLI tool you use.

🧠 Memory Tip (subprocess vs os.system): os.system() runs the command but returns only the exit status; the output goes straight to the terminal, so your script cannot read it. subprocess.run() with capture_output=True gives you stdout, stderr, and the return code. Always use subprocess.run().

The pattern you use every time

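import subprocess
import sys

# Pass the command as a list so no shell quoting is needed; "cmd" is a placeholder.
result = subprocess.run(["cmd", "arg1", "arg2"], capture_output=True, text=True, timeout=30)
if result.returncode != 0:
    print(result.stderr, file=sys.stderr)  # Show what went wrong
    sys.exit(1)
print(result.stdout)      # Use the output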
subprocess — kubectl, terraform, docker, helm
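One way to avoid repeating that pattern is to wrap it in a helper. The sketch below is an assumption about how such a wrapper might look, not a standard function: run_cmd is a hypothetical name, and it raises RuntimeError instead of exiting so the caller decides what to do.

```python
import subprocess
import sys


def run_cmd(args: list[str], timeout: int = 30) -> str:
    """Run a command, return its stdout, raise RuntimeError on any failure."""
    try:
        result = subprocess.run(args, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"{args[0]} timed out after {timeout}s") from exc
    except FileNotFoundError as exc:
        raise RuntimeError(f"{args[0]} is not installed or not on PATH") from exc
    if result.returncode != 0:
        raise RuntimeError(
            f"{' '.join(args)} failed ({result.returncode}): {result.stderr.strip()}"
        )
    return result.stdout


# Demo using the Python interpreter itself, so it runs without kubectl installed.
print(run_cmd([sys.executable, "-c", "print('ok')"]).strip())
```

With kubectl available, the same helper could be combined with JSON output, for example `json.loads(run_cmd(["kubectl", "get", "pods", "-o", "json"]))`.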

🌐 HTTP APIs — Talking to DevOps Tools

Every DevOps tool has a REST API

Jenkins, GitHub, Kubernetes, PagerDuty, Slack — they all expose HTTP endpoints. Python's requests library lets you call them in a few lines. This is how you build integrations between tools that have no native connection.

Real Use Case (HPE): After a Helm deployment to OpenShift, a Python script called the Jenkins API to retrieve the build number, called the GitHub API to create a release tag, and posted to Slack with the deployment summary. One 50-line script replaced what previously required three manual steps by three different people.
Jenkins, GitHub, Kubernetes, Slack, PagerDuty APIs
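The Slack step of such an integration might look like the sketch below. It assumes a Slack incoming webhook, which accepts a JSON body with a "text" field. The function names, the SLACK_WEBHOOK_URL variable, and the service and version values are all hypothetical.

```python
import os


def build_summary(service: str, version: str, build_number: int) -> dict:
    """Build the JSON payload for a Slack incoming webhook."""
    return {"text": f"Deployed {service} {version} (build #{build_number})"}


def post_to_slack(payload: dict, webhook_url: str) -> None:
    """POST the payload; raise if Slack does not answer with a 2xx status."""
    # Imported here so build_summary stays usable without requests installed.
    import requests  # third-party: pip install requests

    response = requests.post(webhook_url, json=payload, timeout=10)
    response.raise_for_status()


def main() -> None:
    # SLACK_WEBHOOK_URL is an assumed variable name; treat the URL as a secret.
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]
    post_to_slack(build_summary("checkout", "v1.4.2", 128), webhook_url)
```

Keeping the payload builder separate from the network call makes the message format easy to check without contacting Slack. Always pass a timeout, because requests waits indefinitely by default.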

📄 YAML & JSON — The Infrastructure Language

All infrastructure is defined in YAML or JSON

Kubernetes manifests, Helm values, Ansible playbooks, GitHub Actions workflows — everything is YAML. Python reads and writes YAML in a few lines. This is how you build dynamic config generation, environment promotion scripts, and manifest patching tools.

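⚠️ Watch out: Always use yaml.safe_load(), never yaml.load() with an unsafe loader. An unsafe loader can construct arbitrary Python objects, which lets malicious input execute code. safe_load() only builds plain data structures such as dicts, lists, and strings. In PyYAML 6 and later, load() requires an explicit Loader argument, so safe_load() is also the shortest correct call.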
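A manifest patching tool can be sketched as follows, assuming PyYAML and a single-document Deployment manifest. set_image and patch_file are hypothetical helper names. Note that a load and dump round trip drops YAML comments.

```python
def set_image(manifest: dict, container: str, image: str) -> dict:
    """Set the image of one container in a Deployment manifest (in place)."""
    for spec in manifest["spec"]["template"]["spec"]["containers"]:
        if spec["name"] == container:
            spec["image"] = image
            return manifest
    raise KeyError(f"container {container!r} not found")


def patch_file(path: str, container: str, image: str) -> None:
    """Load a manifest with safe_load, patch it, and write it back."""
    # Imported here so set_image stays usable without PyYAML installed.
    import yaml  # third-party: pip install pyyaml

    with open(path) as fh:
        manifest = yaml.safe_load(fh)
    set_image(manifest, container, image)
    with open(path, "w") as fh:
        # sort_keys=False keeps the original key order readable in diffs.
        yaml.safe_dump(manifest, fh, sort_keys=False)
```

An environment promotion script could call patch_file once per manifest with the image tag that passed testing in the previous environment.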

☁️ Cloud SDKs — boto3 & Azure

AWS boto3 — the most important DevOps library

boto3 lets Python control every AWS service. EC2, S3, EKS, CloudWatch, SSM — all available as Python objects. Authentication uses the IAM Role attached to the EC2 instance or K8s pod automatically. Never hardcode AWS credentials in Python code.

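🧠 Memory Tip (boto3 authentication): On EC2 or EKS: boto3 picks up the IAM Role with no configuration. Locally: it uses ~/.aws/credentials. In CI/CD: set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables. Lookup order: env vars > web identity (IRSA) > credentials file > container or instance role. Because the role comes last, a stray access key in the environment overrides it.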
boto3 — EC2, S3, EKS, CloudWatch + Azure SDK
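The "upload build artifact" case could be sketched like this with boto3. The key layout, the metadata fields, and the function names are assumptions for illustration. The retry settings use botocore's standard retry mode.

```python
import os


def artifact_key(app: str, version: str, filename: str) -> str:
    """Build a versioned S3 key such as 'builds/api/1.4.2/api.tar.gz'."""
    return f"builds/{app}/{version}/{os.path.basename(filename)}"


def upload_artifact(path: str, bucket: str, app: str, version: str, commit: str) -> str:
    """Upload a build artifact with metadata; returns the S3 key used."""
    # Imported here so artifact_key stays usable without boto3 installed.
    import boto3  # third-party: pip install boto3
    from botocore.config import Config

    # No keys in code: boto3 resolves credentials from its default chain.
    retry_config = Config(retries={"max_attempts": 5, "mode": "standard"})
    s3 = boto3.client("s3", config=retry_config)
    key = artifact_key(app, version, path)
    s3.upload_file(
        path, bucket, key,
        ExtraArgs={"Metadata": {"version": version, "commit": commit}},
    )
    return key
```

Putting the version in the key means a new build never overwrites an old one, which makes rollbacks a matter of pointing at an earlier key.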

☸️ Kubernetes SDK — Manage K8s from Python

The kubernetes Python library is a full K8s API client

Most of what you do with kubectl can also be done through the Python SDK, with results returned as Python objects rather than text to parse. Read pod status, wait for deployments, scale replicas, read ConfigMaps, find crashing pods. This is used in CI/CD post-deploy verification scripts and cost-saving automation.

K8s SDK — list pods, wait for deploy, scale, smoke test
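Finding crashing pods might look like the sketch below, assuming the official kubernetes Python client. crashing_containers and list_all_pods are hypothetical names. The detection logic only reads attributes, so it can be exercised with stand-in objects.

```python
def crashing_containers(pods) -> list[tuple[str, str, str]]:
    """Return (namespace, pod, container) for containers in CrashLoopBackOff."""
    found = []
    for pod in pods:
        # container_statuses is None until the kubelet reports on the pod.
        for status in pod.status.container_statuses or []:
            waiting = status.state.waiting if status.state else None
            if waiting is not None and waiting.reason == "CrashLoopBackOff":
                found.append((pod.metadata.namespace, pod.metadata.name, status.name))
    return found


def list_all_pods():
    """Fetch every pod the current identity is allowed to list."""
    # Imported here so crashing_containers works without the client installed.
    from kubernetes import client, config  # third-party: pip install kubernetes

    try:
        config.load_incluster_config()   # running inside a pod
    except config.ConfigException:
        config.load_kube_config()        # running locally with a kubeconfig
    return client.CoreV1Api().list_pod_for_all_namespaces().items
```

A verification script would call crashing_containers(list_all_pods()) and fail the pipeline if the list is not empty.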

🏗️ Production Script Patterns

Templates every DevOps engineer should memorise

Every production automation script needs the same structure: proper logging (not print), read config from environment variables (not hardcoded), retry logic for flaky operations, and notification on failure. This template is your starting point for every script you write.

Pattern Use at HPE: All automation scripts at HPE followed this pattern. When a Terraform apply failed overnight, the retry decorator caught it three times, then the Slack notification woke the on-call engineer with the exact error. Without the pattern: silent failure discovered next morning.
Production script template — logging, retry, env vars, Slack
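A minimal sketch of such a template follows. It is one possible shape, not the exact script described above. The retry decorator, the main wrapper, and the LOG_LEVEL variable name are assumptions, and notify stands in for whatever sends the alert, such as a Slack webhook call.

```python
import functools
import logging
import os
import time

# Log level comes from the environment instead of being hardcoded.
logging.basicConfig(
    level=os.environ.get("LOG_LEVEL", "INFO"),
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("deploy")


def retry(attempts: int = 3, delay: float = 2.0, sleep=time.sleep):
    """Retry the wrapped function, doubling the delay after each failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            wait = delay
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as exc:
                    log.warning("%s failed (attempt %d/%d): %s",
                                func.__name__, attempt, attempts, exc)
                    if attempt == attempts:
                        raise  # out of attempts: let the caller see the error
                    sleep(wait)
                    wait *= 2
        return wrapper
    return decorator


def main(task, notify) -> int:
    """Run task; on final failure call notify(message) and return exit code 1."""
    try:
        task()
    except Exception as exc:
        log.error("giving up: %s", exc)
        notify(f"{task.__name__} failed: {exc}")
        return 1
    return 0
```

A script would decorate its flaky step with @retry() and end with sys.exit(main(step, notify)), so the scheduler or CI system sees a non-zero exit code when every attempt has failed.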

🎯 Interview Questions

PYTHON DEVOPS · ENGINEER
What is subprocess.run() and why do you use it instead of os.system()?
subprocess.run() executes a shell command and returns a CompletedProcess object with stdout, stderr, and returncode. os.system() runs the command but gives you no output — you only know the exit code. In DevOps scripts, capturing output is essential: you need to parse kubectl get pods output, read terraform output JSON, check helm status. Always use subprocess.run() with capture_output=True, text=True. Set a timeout so scripts never hang indefinitely. Check returncode != 0 and exit or raise an exception — never silently continue after a command failure.
PYTHON DEVOPS · ENGINEER
How does boto3 authenticate to AWS? Why should you never hardcode credentials?
boto3 follows a credential chain and stops at the first source that yields credentials. Explicit parameters passed in code come first, then the environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, then web identity (which is how IRSA works on EKS), then the ~/.aws/credentials and ~/.aws/config files, and finally container credentials and the EC2 instance profile. On Lambda, the execution role's temporary credentials are delivered through environment variables set by the runtime. In production on EC2 or EKS the IAM Role supplies short-lived credentials automatically, so no long-lived keys are stored anywhere, provided no static keys are set in the environment to override it. Never hardcode credentials because: they stay in Git history even after deletion, anyone with repo access can see them, and rotating them requires code changes. Use IAM Roles for production, environment variables for CI/CD, and the credentials file only for local development.
PYTHON DEVOPS · ADVANCED
How do you write a Python script that monitors Kubernetes pod health and alerts Slack when pods are crashing?
The pattern: kubernetes SDK to poll pod status, retry decorator for network reliability, Slack webhook for notifications. Start with config.load_incluster_config() in a pod or config.load_kube_config() locally. List pods with v1.list_pod_for_all_namespaces(). For each entry in a pod's status.container_statuses (which can be None), check whether state.waiting is set and state.waiting.reason == "CrashLoopBackOff". If found, build a message and post it to the Slack webhook using requests.post(). Run this as a Kubernetes CronJob every 5 minutes. Add a cooldown mechanism: track which pods you already alerted on to avoid spam. At HPE: this exact script ran as a CronJob and caught three CrashLoopBackOff pods during a deployment, alerting the team before users noticed.
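The cooldown mechanism mentioned in the last answer could be sketched as below. AlertCooldown is a hypothetical class. It stores the last alert time per pod in a JSON file, on the assumption that the CronJob mounts a persistent volume, since each CronJob run is a fresh process with no memory of the previous one.

```python
import json
import time
from pathlib import Path


class AlertCooldown:
    """Suppress repeat alerts for the same key within a cooldown window."""

    def __init__(self, path, seconds: float = 1800, clock=time.time):
        self.path = Path(path)
        self.seconds = seconds
        self.clock = clock
        # Load earlier alert times so the state survives between runs.
        self.last = json.loads(self.path.read_text()) if self.path.exists() else {}

    def should_alert(self, key: str) -> bool:
        """Return True and record the time if key is outside its cooldown."""
        now = self.clock()
        if now - self.last.get(key, float("-inf")) < self.seconds:
            return False
        self.last[key] = now
        self.path.write_text(json.dumps(self.last))
        return True
```

The monitoring loop would call should_alert("namespace/pod-name") before posting to Slack and skip the post when it returns False.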

🗺️ Learning Roadmap

Week 1: Foundation
- Python basics: variables, loops, functions, dicts
- subprocess.run(): wrap any CLI command
- Read/write files, parse JSON
- Write a script that runs kubectl and parses output

Week 2: APIs & Cloud
- requests library: call Jenkins API, post to Slack
- boto3 basics: list EC2, upload S3
- YAML read/write: modify K8s manifests
- Error handling: try/except, proper logging

Week 3: K8s Automation
- kubernetes SDK: list pods, check status
- Post-deploy verification script
- Health check with retry and Slack alert
- Scale down dev namespace at night (cost saving)

Month 2: Production Scripts
- Production script template: logging, retry, env vars
- Full deploy script: helm + smoke test + slack notify
- Certificate expiry checker across all namespaces
- Cost optimiser: find idle resources, scale/delete