LearnwithVishnu
🏗️ Platform Engineering
Internal Developer Platforms, Backstage catalog, DORA metrics, golden paths

🏗️ What is Platform Engineering?

Platform Engineering is the discipline of building and operating Internal Developer Platforms (IDPs) — self-service products that enable development teams to deploy, manage, and operate their services without needing deep expertise in Kubernetes, Terraform, or CI/CD internals.

Instead of every team learning Kubernetes, writing its own Jenkinsfiles, setting up monitoring, and managing compliance, the platform team builds a product that gives every developer all of this through a simple interface. A developer fills in a form, clicks Create, and gets a fully working service with CI/CD, a Kubernetes deployment, monitoring, and a runbook, without touching a single YAML file.

Platform Engineering vs traditional DevOps

| Traditional DevOps | Platform Engineering |
|---|---|
| Each team manages its own CI/CD, K8s config, and monitoring | The platform team owns the infrastructure; developers consume it via self-service |
| Dev teams must be experts in K8s and Terraform | Developers use templates and get K8s without needing to know it |
| The ops team is a bottleneck for provisioning new environments | Self-service: a new environment is ready in minutes, with no ticket |
| Different tools per team, so security is inconsistent | Paved road: a standardised toolchain with built-in compliance |
| Impact is hard to measure | DORA metrics: deployment frequency, lead time for changes, change failure rate, and MTTR |

Why Platform Engineering is growing

DevOps culture broke the dev/ops wall — developers now own their services end-to-end. But this created a new problem: every developer must now be expert in Kubernetes, Terraform, CI/CD, security policies, and compliance. That is too much cognitive load for application developers. Platform Engineering solves this by treating the platform as a product — the platform team handles the complexity, developers get a simple interface to consume it.

🛤️ Golden Paths & Self-Service

Crossplane self-service + Tekton standard pipelines

🎭 Backstage — Building Your IDP

What is Backstage?

Backstage is an open-source framework for building internal developer portals. It was created by Spotify, donated to the CNCF in 2020, and is now one of the most widely adopted foundations for an IDP. Companies including Spotify, Expedia, American Airlines, LinkedIn, and HPE use Backstage. It is not an IDP out of the box; it is a framework for building the portal layer of your IDP.

Four core capabilities

| Capability | What developers get |
|---|---|
| Software Catalog | A registry of all services, APIs, libraries, and infrastructure. Search for any service and see its owner, documentation, dependencies, health status, and recent incidents. |
| Software Templates | Click-to-create new services. Fill in a form (service name, team, language) and get a Git repo, a configured CI/CD pipeline, generated K8s manifests, a pre-built monitoring dashboard, and a runbook template. |
| TechDocs | Documentation as code. Markdown docs live next to the code in Git and are rendered and searchable in Backstage. Because docs change in the same pull requests as the code, they are far easier to keep current. |
| Plugins | Integrate everything: Kubernetes (pod status), Argo CD (sync status), GitHub (PRs, CI runs), PagerDuty (incidents), Grafana (dashboards), SonarQube (code quality). |
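To make the Software Templates idea concrete, here is a minimal sketch of what a golden-path template does: it turns a few form inputs into a set of pre-configured files. Real Backstage Software Templates are declared in YAML (`kind: Template`) and run scaffolder actions such as `fetch:template`, `publish:github`, and `catalog:register`; this Python version only mimics the rendering step. `ServiceRequest`, `render_service`, the file list, and the base images are hypothetical choices for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRequest:
    """Hypothetical form inputs collected by a 'new service' template."""
    name: str
    team: str
    language: str  # e.g. "java", "node", "python"


# Assumed per-language base images; a real platform team would own these choices.
BASE_IMAGES = {
    "java": "eclipse-temurin:21-jre",
    "node": "node:20-slim",
    "python": "python:3.12-slim",
}


def render_service(req: ServiceRequest) -> dict[str, str]:
    """Return a mapping of file path -> content for the new repository."""
    if req.language not in BASE_IMAGES:
        # Only languages with a supported golden path can be scaffolded.
        raise ValueError(f"no golden path for language {req.language!r}")
    catalog = (
        "apiVersion: backstage.io/v1alpha1\n"
        "kind: Component\n"
        "metadata:\n"
        f"  name: {req.name}\n"
        "  annotations:\n"
        "    backstage.io/techdocs-ref: dir:.\n"
        "spec:\n"
        "  type: service\n"
        "  lifecycle: experimental\n"  # new services start as experimental
        f"  owner: {req.team}\n"
    )
    dockerfile = f"FROM {BASE_IMAGES[req.language]}\n"
    runbook = f"# {req.name} runbook\n\nOwner: {req.team}\n"
    return {
        "catalog-info.yaml": catalog,
        "Dockerfile": dockerfile,
        "docs/runbook.md": runbook,
    }
```

CI pipeline definitions and Kubernetes manifests would be rendered the same way; the point of the design is that standards live in the template, so every generated service starts compliant.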
# catalog-info.yaml — register any service in Backstage
# Commit this file to the root of any Git repo
apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: payment-service
  description: Handles payment processing for all transactions
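  annotations:
    github.com/project-slug: company/payment-service   # org/repo used by the GitHub plugins
    backstage.io/techdocs-ref: dir:.                    # TechDocs source lives in this repo
    prometheus.io/alert: "payment-service-slo"          # read by a Prometheus plugin, if installed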
spec:
  type: service
  lifecycle: production
  owner: payments-team
  system: payment-platform
  dependsOn:
    - component:order-service
    - resource:payment-database
  providesApis:
    - payment-api-v2
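The `dependsOn` and `providesApis` entries are entity references of the shape `[<kind>:][<namespace>/]<name>`, where the namespace defaults to `default` and some fields (such as `providesApis`) imply a default kind. Below is a minimal sketch of a parser that follows that shape, which a platform team could use to lint `catalog-info.yaml` files before they are committed. It is not Backstage's own implementation; `parse_entity_ref` and `EntityRef` are hypothetical names.

```python
from typing import NamedTuple, Optional


class EntityRef(NamedTuple):
    kind: str
    namespace: str
    name: str

    def __str__(self) -> str:
        # Fully qualified form, e.g. "component:default/order-service".
        return f"{self.kind}:{self.namespace}/{self.name}"


def parse_entity_ref(ref: str, default_kind: Optional[str] = None,
                     default_namespace: str = "default") -> EntityRef:
    """Parse '[<kind>:][<namespace>/]<name>' into its three parts."""
    kind = default_kind
    rest = ref
    if ":" in rest:
        kind, rest = rest.split(":", 1)
    namespace = default_namespace
    if "/" in rest:
        namespace, rest = rest.split("/", 1)
    if not kind:
        raise ValueError(f"{ref!r} has no kind and no default kind was given")
    if not rest:
        raise ValueError(f"{ref!r} has no name")
    # Kinds are compared case-insensitively, so normalise to lower case.
    return EntityRef(kind.lower(), namespace, rest)
```

For example, `parse_entity_ref("payment-api-v2", default_kind="API")` resolves the `providesApis` entry above to `api:default/payment-api-v2`, while `parse_entity_ref("order-service")` raises because `dependsOn` entries should carry an explicit kind.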

📊 DORA Metrics

DORA metrics — measuring DevOps performance

DORA (DevOps Research and Assessment) identified four key metrics that distinguish elite DevOps teams from low performers. They are the most common way platform teams track delivery performance and report it to leadership.

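| Metric | What it measures | Elite | High | Medium | Low |
|---|---|---|---|---|---|
| Deployment Frequency | How often you deploy to production | Multiple per day | Daily | Weekly | Monthly |
| Lead Time for Changes | Time from commit to production | < 1 hour | 1 day | 1 week | 1 month |
| Change Failure Rate | % of deployments causing incidents | 0-5% | 5-10% | 10-15% | 15%+ |
| MTTR | Time to restore service after an incident | < 1 hour | < 1 day | < 1 week | > 1 week |

These bands are simplified: the exact thresholds differ between editions of the State of DevOps report, so treat them as rough guides rather than official cut-offs.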
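All four metrics can be derived from a list of production deployments. The sketch below assumes a hypothetical `Deployment` record holding the commit time, deploy time, whether the deploy caused an incident, and when service was restored. It simplifies MTTR by measuring from the deploy that caused the incident; a real setup would usually take incident start and end times from the incident tool.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean, median
from typing import Optional


@dataclass
class Deployment:
    commit_time: datetime                   # first commit of the change
    deploy_time: datetime                   # when it reached production
    caused_incident: bool
    restored_time: Optional[datetime] = None  # set only for failed deploys


def hours(start: datetime, end: datetime) -> float:
    return (end - start).total_seconds() / 3600


def dora_metrics(deploys: list[Deployment], period_days: int) -> dict[str, Optional[float]]:
    """Compute the four DORA metrics over a reporting period."""
    if not deploys:
        raise ValueError("need at least one deployment")
    failures = [d for d in deploys if d.caused_incident]
    restore_hours = [hours(d.deploy_time, d.restored_time)
                     for d in failures if d.restored_time is not None]
    return {
        "deploys_per_day": len(deploys) / period_days,
        # Median lead time is less sensitive to one stuck change than the mean.
        "median_lead_time_hours": median(hours(d.commit_time, d.deploy_time) for d in deploys),
        "change_failure_rate_pct": 100 * len(failures) / len(deploys),
        # Mean, to match "Mean Time to Recovery"; None if nothing failed.
        "mttr_hours": mean(restore_hours) if restore_hours else None,
    }
```

The resulting numbers can then be compared against the bands in the table, keeping in mind that those bands are approximate.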

Platform team goals — improving developer experience

| Goal | What it means | How to measure |
|---|---|---|
| Reduce cognitive load | Developers should not need to know K8s, Terraform, or CI/CD internals | Time to deploy a new service from scratch |
| Self-service provisioning | Any resource is available in minutes, without tickets | Time from request to environment ready |
| Golden path adoption | Percentage of teams using standard templates | % of services using platform templates |
| Paved road reliability | The platform itself must be more reliable than what teams would build alone | Platform SLO (99.9%+ availability) |
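Two of the measures in the last column reduce to simple arithmetic. The sketch below computes golden-path adoption as a percentage and turns an availability SLO into an error budget: at 99.9% over a 30-day window, the platform may be unavailable for about 43.2 minutes. The function names and the 30-day default window are assumptions for illustration.

```python
def adoption_pct(services_on_templates: int, total_services: int) -> float:
    """Share of services created from (or migrated to) platform templates."""
    if total_services <= 0:
        raise ValueError("total_services must be positive")
    return 100 * services_on_templates / total_services


def error_budget_minutes(slo_pct: float, window_days: int = 30) -> float:
    """Minutes of allowed unavailability in the window for an availability SLO."""
    if not 0 < slo_pct < 100:
        raise ValueError("SLO must be strictly between 0 and 100 percent")
    window_minutes = window_days * 24 * 60
    return window_minutes * (100 - slo_pct) / 100


def budget_remaining(slo_pct: float, downtime_minutes: float, window_days: int = 30) -> float:
    """Budget left after observed downtime; negative means the SLO was missed."""
    return error_budget_minutes(slo_pct, window_days) - downtime_minutes
```

For example, `budget_remaining(99.9, 13.2)` leaves roughly 30 minutes of budget for the rest of the window.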

Backstage score — measure adoption

# Backstage plugin: tech-insights
# Define scorecards to measure platform adoption:
# - Does the service have a catalog-info.yaml? (catalog registered)
# - Does it have a runbook linked? (documentation)
# - Is it using the standard CI/CD template? (golden path)
# - Does it have SLOs defined? (reliability)
# - Is it owned by a team? (accountability)
# Score per service: 0-100. Platform team tracks org-wide average.
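The scorecard described in these comments can be sketched as a list of pass/fail checks per service, scored 0-100 and averaged across the organisation. This is not the Tech Insights plugin API; `ServiceFacts`, the check names, and the equal weighting are hypothetical choices that mirror the five checks listed above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ServiceFacts:
    """Hypothetical facts collected about one service."""
    has_catalog_entry: bool
    runbook_url: Optional[str]
    uses_standard_template: bool
    slo_count: int
    owner: Optional[str]


# Each check is weighted equally in this sketch.
CHECKS: dict[str, Callable[[ServiceFacts], bool]] = {
    "catalog registered": lambda s: s.has_catalog_entry,
    "runbook linked": lambda s: bool(s.runbook_url),
    "golden-path CI/CD": lambda s: s.uses_standard_template,
    "SLOs defined": lambda s: s.slo_count > 0,
    "team owner": lambda s: bool(s.owner),
}


def score(service: ServiceFacts) -> tuple[int, list[str]]:
    """Return (score 0-100, names of failed checks) for one service."""
    failed = [name for name, check in CHECKS.items() if not check(service)]
    passed = len(CHECKS) - len(failed)
    return round(100 * passed / len(CHECKS)), failed


def org_average(services: list[ServiceFacts]) -> float:
    """Org-wide average score that the platform team tracks over time."""
    return sum(score(s)[0] for s in services) / len(services)
```

Returning the failed checks alongside the number matters: it tells each team exactly what to fix, rather than just ranking them.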

🎯 Interview Questions

PLATFORM ENG · ARCHITECT
What is Platform Engineering and how is it different from DevOps and SRE?
DevOps is a culture: break down the wall between development and operations, share responsibility, and automate. Everyone deploys their own code and everyone participates in on-call. SRE is a discipline: Google's approach to applying software engineering to operations problems, focused on reliability, error budgets, and eliminating toil. Platform Engineering is a product discipline: the platform team treats internal developers as customers and builds products (internal developer platforms, golden paths, self-service tools) that make those developers productive.

The difference: DevOps says developers should do their own ops. Platform Engineering says that if you want developers to do their own ops, you need to give them good tools so it is not painful. Without a platform team, at scale every team reinvents CI/CD, every team struggles with K8s complexity, and consistency disappears. Platform Engineering centralises that expertise and exposes it as a product. Its success is measured mainly by developer satisfaction and DORA metrics rather than by the uptime of product services, which is primarily the SRE team's concern; the platform itself still needs its own SLO.
PLATFORM ENG · ENGINEER
What is Backstage and what problem does it solve?
Backstage is an open-source Internal Developer Portal originally built by Spotify and now a CNCF project. The problem it solves: at scale, developers do not know what services exist, who owns them, how to deploy them, or where to find their runbooks. An engineer who wants to understand payment-service has to know to ask the payments team, then find its Confluence page, its Grafana dashboard, and its PagerDuty policy, all in different places.

Backstage centralises all of this in a service catalog. Each service keeps a catalog-info.yaml file in its Git repo; once these files are registered (manually or through automatic discovery), Backstage ingests them and builds a searchable directory of all services with links to their dashboards, owners, documentation, APIs, and dependencies. Software Templates let developers create new services from golden-path templates with company standards pre-configured. TechDocs renders Markdown documentation alongside the service. Plugins integrate GitHub, PagerDuty, Grafana, SonarQube, and cost tools. For example, at HPE a single Backstage page could show that telecom-sro is owned by the SRO team, has 3 active alerts, was last deployed 2 days ago, and has API documentation.
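The "searchable directory" above is essentially a set of indexes built from the ingested entities. The sketch below assumes the catalog files have already been parsed into plain dictionaries with hypothetical `name`, `owner`, `description`, and `dependsOn` keys, and builds three lookups: services by owner, reverse dependencies (who depends on X), and a case-insensitive text search. Backstage's real catalog backend stores entities and relations in a database; this only illustrates the idea.

```python
from collections import defaultdict


def build_indexes(entities: list[dict]) -> tuple[dict[str, list[str]], dict[str, list[str]]]:
    """Return (owner -> component names, entity ref -> dependent component refs)."""
    by_owner: dict[str, list[str]] = defaultdict(list)
    dependents: dict[str, list[str]] = defaultdict(list)
    for e in entities:
        by_owner[e["owner"]].append(e["name"])
        for ref in e.get("dependsOn", []):
            # Every entity in this sketch is a Component, so its own ref is component:<name>.
            dependents[ref].append(f"component:{e['name']}")
    return dict(by_owner), dict(dependents)


def search(entities: list[dict], text: str) -> list[str]:
    """Names of entities whose name or description contains text (case-insensitive)."""
    needle = text.lower()
    return [e["name"] for e in entities
            if needle in e["name"].lower() or needle in e.get("description", "").lower()]
```

The reverse-dependency index is what answers "what breaks if payment-service is down?", which is often more useful during an incident than the forward `dependsOn` list.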
PLATFORM ENG · ENGINEER
What is Platform Engineering and how does it differ from DevOps?
Platform Engineering is a specialisation within DevOps focused on building internal developer platforms (IDPs): products that let developers self-serve infrastructure, deployments, and tooling without needing ops expertise.

DevOps (the practice): developers and operations collaborate, break down silos, and developers take end-to-end ownership of their services. This solved the old wall between dev and ops, but it created a new problem: each development team must now be expert in Kubernetes, Terraform, CI/CD, monitoring, security, and compliance. That is cognitive overload.

Platform Engineering: the platform team is a product team whose customers are internal developers. It builds a paved road: standard, supported, secure ways to deploy services. Developers use the platform without needing to understand what is underneath. To create a new microservice, a developer fills in a form in Backstage, clicks Create, and gets a Git repo, Kubernetes manifests, a CI/CD pipeline, a monitoring dashboard, and a runbook, all pre-configured. They never write Kubernetes YAML or Terraform config.

DevOps is a culture and set of practices; Platform Engineering is a team and a product. They work together: the platform team practises DevOps while building the platform that makes DevOps easier for everyone else.
PLATFORM ENG · ENGINEER
What are DORA metrics and why do platform teams track them?
DORA (DevOps Research and Assessment) metrics are four measurements that predict software delivery performance and organisational success.

Deployment Frequency: how often the team deploys to production. Elite: multiple times per day. Low: once per month or less. Higher frequency means smaller changes, less risk per deployment, and faster feature delivery.

Lead Time for Changes: time from code commit to running in production. Elite: under 1 hour. It includes PR review time, CI build time, deployment time, and environment provisioning time. A platform team reduces it by optimising CI/CD pipelines, pre-provisioning environments, and automating approvals for low-risk changes.

Change Failure Rate: percentage of deployments that cause a production incident. Elite: 0-5%. Common causes of a high rate: too many manual steps, insufficient testing, poor observability.

Mean Time to Recovery (MTTR): how long it takes to restore service after an incident. Elite: under 1 hour. It is driven by quality runbooks, automated rollback, and good observability.

Platform teams track these metrics for two reasons. First, they show the ROI of platform investments: if lead time drops from 3 days before Backstage to 2 hours after, that justifies the platform team's budget. Second, they expose bottlenecks. If lead time is high while deployment frequency is good, the delay is likely in a stage between commit and deploy, such as environment provisioning, review, or approvals, rather than in the deploy step itself.
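Finding the bottleneck inside lead time means splitting it into stages. The sketch below assumes each change records a timestamp when it passes five hypothetical stages in a fixed order (commit, PR approved, CI passed, environment ready, deployed); real pipelines may run some of these in parallel, so the stage names and ordering are assumptions. It reports the hours spent in each stage and names the slowest one.

```python
from datetime import datetime

# Hypothetical pipeline stages, in the order a change passes through them.
STAGES = ["commit", "pr_approved", "ci_passed", "env_ready", "deployed"]


def stage_durations(ts: dict[str, datetime]) -> dict[str, float]:
    """Hours spent between each consecutive pair of stages."""
    return {
        f"{start}->{end}": (ts[end] - ts[start]).total_seconds() / 3600
        for start, end in zip(STAGES, STAGES[1:])
    }


def bottleneck(ts: dict[str, datetime]) -> str:
    """Name of the slowest stage transition for one change."""
    durations = stage_durations(ts)
    return max(durations, key=durations.get)
```

In the example below, a change is reviewed in 2 hours and built in 30 minutes but then waits most of a day for an environment, so `bottleneck` points at provisioning rather than CI/CD. Aggregating this over many changes gives the team-level picture.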
PLATFORM ENG · ARCHITECT
What is an Internal Developer Platform (IDP) and what should it include?
An IDP is a self-service layer that abstracts infrastructure complexity for developers. It is the product the platform team builds and operates. Core capabilities:

Service catalogue: a searchable registry of all services, APIs, libraries, and documentation in the organisation, so developers find existing services before building new ones. Backstage is the most popular open-source option.

Self-service provisioning: developers create new services, databases, message queues, and environments without raising tickets. Templates enforce standards automatically.

Golden path CI/CD: standard pipeline templates for different service types (Java microservice, Node.js API, Python ML service). One click gives a fully working pipeline with build, test, scan, deploy, and rollback.

Environment management: developers can create, clone, and delete environments on demand, for example provisioning a staging environment in 5 minutes to test a feature.

Secrets management: a self-service way to request and rotate secrets. A developer requests a database credential; it is provisioned automatically in HashiCorp Vault and injected into their service via a CSI driver.

Observability: per-service dashboards, pre-built for every service on the platform. When a developer deploys a service, monitoring is set up automatically.

The measure of a good IDP is the time it takes to get a net-new service from zero to production. Strong platform teams aim for under 1 hour, including repository creation, CI/CD, Kubernetes deployment, monitoring, and documentation.