LearnwithVishnu
📦 Harbor
Enterprise container registry: RBAC, vulnerability scanning, replication, image signing

📦 What is Harbor?

Why self-hosted registry?

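| Feature | Harbor (self-hosted) | ECR/ACR/GCR | Docker Hub |
|---|---|---|---|
| Cost | Storage and hosting infrastructure only, no licence fee | Storage + transfer fees | Free/paid tier |
| Air-gapped | Yes | No | No |
| RBAC | Project-based, LDAP/AD | IAM policies | Organisation teams |
| Vulnerability scan | Trivy built-in | Amazon Inspector (ECR), Microsoft Defender for Containers (ACR) | Docker Scout (paid) |
| Multi-cloud | Yes, one registry for all | Cloud-specific | Universal but rate-limited |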
Real Scenario: Air-gapped Telco

The TeMIP network management platform at a major telco runs in a data centre with no internet access. The development pipeline pushes images to an internet-facing Harbor. A replication rule syncs production-tagged images to the air-gapped Harbor inside the customer network every 4 hours. Production pods pull from the local registry, so production has zero internet dependency and vulnerability scanning runs in both environments.
Install Harbor + core features

⚙️ Key Features

Projects, scanning, replication, proxy cache, retention

🔏 Image Signing

Supply chain security — prove your image is authentic

Image signing with Cosign answers: was this image actually built by our CI/CD? An attacker could push a malicious image with the same tag. Without signing, Kubernetes cannot tell the difference. With Cosign + Kyverno policy, any unsigned image is rejected at the cluster level.

Cosign signing + Kyverno enforcement
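
A minimal sketch of that flow, assuming the cosign 2.x CLI in CI and a Kyverno installation in the cluster. The image reference, key file names, and policy name are placeholders, and the policy follows the `verifyImages` rule of the `kyverno.io/v1` schema as assumed here; check the field names against the Kyverno version you run.

```shell
# Placeholder image reference; substitute your own project and repository.
IMAGE="${IMAGE:-registry.company.com/production/myapp}"
POLICY_FILE="${POLICY_FILE:-require-signed-images.yaml}"

# Sign an image by digest, then verify the signature (run in CI after the push).
# Assumes cosign 2.x and a key pair created with `cosign generate-key-pair`.
sign_and_verify() {
  digest="$1"   # the sha256:... digest reported by `docker push`
  cosign sign --yes --key cosign.key "${IMAGE}@${digest}"
  cosign verify --key cosign.pub "${IMAGE}@${digest}"
}

# Write a Kyverno ClusterPolicy that rejects pods whose image from the
# same Harbor project has no signature matching the given public key.
write_policy() {
  pubkey_file="$1"
  {
    cat <<EOF
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-signed-images
spec:
  validationFailureAction: Enforce
  rules:
    - name: verify-cosign-signature
      match:
        any:
          - resources:
              kinds:
                - Pod
      verifyImages:
        - imageReferences:
            - "${IMAGE%/*}/*"
          attestors:
            - entries:
                - keys:
                    publicKeys: |-
EOF
    # Indent the PEM key so it sits inside the YAML block scalar.
    sed 's/^/                      /' "$pubkey_file"
  } > "$POLICY_FILE"
}
```

Typical use: call `sign_and_verify sha256:<digest>` in the pipeline (with `COSIGN_PASSWORD` set for the private key), then run `write_policy cosign.pub` and `kubectl apply -f require-signed-images.yaml` once per cluster. Signing by digest rather than by tag matters because tags can be re-pointed. For a private Harbor project, Kyverno also needs registry credentials to fetch the signatures.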

🏗️ Harbor Architecture and Components

Harbor is an enterprise container registry with security built in

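| Component | What it does |
|---|---|
| Registry | Core Docker registry: stores image layers and manifests |
| Core | API server: handles all Harbor API calls, authentication, RBAC |
| Portal | Web UI: project management, vulnerability reports, replication |
| Database (PostgreSQL) | Stores metadata, users, policies, scan results |
| Redis | Cache and job queue for async operations (replication, scanning) |
| Trivy / Clair | Vulnerability scanner: scans images on push or on a schedule. Trivy is the bundled default; in recent releases Clair has to be connected as an external scanner |
| Notary | Content trust: signs images so that unsigned images can be blocked. Notary was deprecated and then removed in Harbor v2.9; newer releases rely on Cosign signatures instead |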

Install Harbor on Kubernetes with Helm

helm repo add harbor https://helm.goharbor.io
helm repo update

helm install harbor harbor/harbor \
  --namespace harbor --create-namespace \
  --set expose.type=ingress \
  --set expose.ingress.hosts.core=registry.company.com \
  --set externalURL=https://registry.company.com \
  --set harborAdminPassword=AdminSecurePass123 \
  --set persistence.enabled=true \
  --set persistence.persistentVolumeClaim.registry.size=50Gi

🔒 Projects, RBAC, Vulnerability Scanning

Projects — namespace isolation for images

Every image in Harbor lives in a Project. Projects can be Public (anyone can pull) or Private (requires authentication). RBAC is per-project: a developer can push to their team's project but not production. Projects also contain policies for vulnerability scanning, content trust, and tag retention.
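
A project and its policies can also be created through the API rather than the UI. The sketch below assumes the Harbor v2.0 `projects` endpoint and the metadata keys `public`, `auto_scan`, `prevent_vul`, and `severity` (string values, as the API is understood to expect). `HARBOR_URL`, `HARBOR_USER`, and `HARBOR_PASSWORD` are hypothetical environment variables.

```shell
HARBOR_URL="${HARBOR_URL:-https://registry.company.com}"

# Emit the JSON body for a private project that scans on push and
# blocks pulls at or above the given severity (default: high).
project_payload() {
  name="$1"; severity="${2:-high}"
  cat <<EOF
{
  "project_name": "${name}",
  "metadata": {
    "public": "false",
    "auto_scan": "true",
    "prevent_vul": "true",
    "severity": "${severity}"
  }
}
EOF
}

# Send the payload to Harbor. Requires curl and admin-level credentials.
create_project() {
  project_payload "$@" | curl -fsS -u "${HARBOR_USER}:${HARBOR_PASSWORD}" \
    -X POST "${HARBOR_URL}/api/v2.0/projects" \
    -H "Content-Type: application/json" --data @-
}
```

For example, `create_project production critical` would create a private `production` project that only blocks images with Critical findings. Keeping the payload in its own function makes it easy to review or diff before anything is sent.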

# Push image to Harbor project
docker login registry.company.com
docker tag myapp:v1 registry.company.com/production/myapp:v1
docker push registry.company.com/production/myapp:v1

# AKS pull from Harbor — create imagePullSecret
kubectl create secret docker-registry harbor-creds \
  --docker-server=registry.company.com \
  --docker-username=robot-account \
  --docker-password=robottoken123 \
  --namespace production
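
The secret only takes effect once a workload references it. As an illustration, the following writes a Deployment manifest that pulls `myapp:v1` from the Harbor project using `harbor-creds`. The Deployment name, labels, and output file name are placeholders chosen for this sketch.

```shell
MANIFEST="${MANIFEST:-myapp-deployment.yaml}"

# Write a Deployment that pulls its image from Harbor with the harbor-creds secret.
cat > "$MANIFEST" <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      imagePullSecrets:
        - name: harbor-creds   # must exist in the same namespace as the pod
      containers:
        - name: myapp
          image: registry.company.com/production/myapp:v1
EOF

echo "wrote $MANIFEST; apply it with: kubectl apply -f $MANIFEST"
```

An alternative is to attach the secret to the namespace's service account so that every pod inherits it, which avoids repeating `imagePullSecrets` in each manifest.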

Vulnerability scanning — scan on push

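Configure Harbor to automatically scan every image pushed to a project, then enable the project policy "Prevent vulnerable images from running" with a severity threshold such as High. Harbor enforces this at pull time: when an AKS node tries to pull an image whose scan reports vulnerabilities at or above the threshold, the registry refuses the pull, so the container never starts and the pod sits in ImagePullBackOff. Harbor does not ship a Kubernetes admission webhook; to reject the pod at admission time, pair Harbor with a policy engine such as Kyverno.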

| Scan trigger | When it runs |
|---|---|
| Scan on push | Automatic scan when an image is pushed; catches new images immediately |
| Scheduled scan | Re-scans all images on a schedule; catches newly discovered CVEs in old images |
| Manual scan | Triggered from the UI or API for specific images |
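
Scan results can also gate a CI pipeline. The sketch below assumes the Harbor v2.0 artifact endpoint with `with_scan_overview=true`, and that each entry under `scan_overview` carries `scan_status` and `severity` fields; verify the response shape against your instance's API explorer. It needs `curl` and `jq`, and `HARBOR_URL`, `HARBOR_USER`, and `HARBOR_PASSWORD` are hypothetical environment variables.

```shell
HARBOR_URL="${HARBOR_URL:-https://registry.company.com}"

# Return 0 (blocked) when the highest reported severity is High or Critical.
severity_blocks() {
  case "$1" in
    Critical|High) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the highest severity of a finished scan, or "Unscanned" if none exists.
# Arguments: project, repository, reference (tag or digest).
fetch_severity() {
  project="$1"; repo="$2"; ref="$3"
  curl -fsS -u "${HARBOR_USER}:${HARBOR_PASSWORD}" \
    "${HARBOR_URL}/api/v2.0/projects/${project}/repositories/${repo}/artifacts/${ref}?with_scan_overview=true" |
    jq -r '[.scan_overview[]? | select(.scan_status == "Success") | .severity][0] // "Unscanned"'
}

# Fail the pipeline step when the image is unscanned or above the threshold.
gate_image() {
  sev="$(fetch_severity "$@")"
  echo "highest severity: ${sev}"
  if [ "$sev" = "Unscanned" ] || severity_blocks "$sev"; then
    echo "blocking deployment of $1/$2:$3" >&2
    return 1
  fi
}
```

A pipeline step would call `gate_image production myapp v1` right after the push. Treating an unscanned image as a failure is a deliberate choice here: a scan that has not finished yet should not count as a pass, so a real pipeline would poll until the scan completes.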

Replication — sync images across registries

Harbor replicates images between registries in two modes: push-based (the local Harbor pushes to a remote target) or pull-based (the local Harbor pulls from a remote source). Either mode can be triggered manually or on a schedule; push-based rules can also be event-based, firing whenever an image is pushed. Use cases: replicate from a dev registry to a production registry, replicate from ACR to on-premise Harbor, or keep a copy in a second region for disaster recovery.

# Create a push-based replication policy through the Harbor API.
# Harbor expects exactly one remote side: set only dest_registry for push
# (or only src_registry for pull); the omitted side is the local Harbor.
# Credentials come from HARBOR_USER / HARBOR_PASSWORD instead of being hard-coded.
curl -X POST https://registry.company.com/api/v2.0/replication/policies \
  -u "${HARBOR_USER}:${HARBOR_PASSWORD}" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "sync-to-production",
    "dest_registry": {"id": 2},
    "filters": [{"type": "name", "value": "production/**"}],
    "trigger": {"type": "event_based"},
    "deletion": false,
    "enabled": true
  }'
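
For scheduled or pull-based rules, a small helper can build the payload so that only one remote side is ever set. This is a sketch under assumptions: the `scheduled` trigger with `trigger_settings.cron` and the six-field cron format (seconds first) reflect the Harbor v2.0 API as understood here, and the policy naming scheme is invented for illustration.

```shell
# Emit a replication policy body. mode=push sets only dest_registry,
# mode=pull sets only src_registry; the other side is the local Harbor.
# An optional third argument is a six-field cron expression (seconds first).
replication_payload() {
  mode="$1"; registry_id="$2"; cron="${3:-}"
  case "$mode" in
    push) endpoint="\"dest_registry\": {\"id\": ${registry_id}}" ;;
    pull) endpoint="\"src_registry\": {\"id\": ${registry_id}}" ;;
    *) echo "mode must be push or pull" >&2; return 1 ;;
  esac
  if [ -n "$cron" ]; then
    trigger="{\"type\": \"scheduled\", \"trigger_settings\": {\"cron\": \"${cron}\"}}"
  else
    trigger='{"type": "manual"}'
  fi
  cat <<EOF
{
  "name": "sync-${mode}-${registry_id}",
  ${endpoint},
  "filters": [{"type": "name", "value": "production/**"}],
  "trigger": ${trigger},
  "deletion": false,
  "override": true,
  "enabled": true
}
EOF
}
```

For instance, `replication_payload pull 1 '0 0 */4 * * *'` produces a rule that pulls from registry 1 every four hours; the output can be piped to `curl ... --data @-` against the replication policies endpoint.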

🎯 Interview Questions

HARBOR · ARCHITECT
Why would you run your own Harbor registry instead of using ACR or ECR?
There are four compelling reasons.

First: air-gapped environments. Banks, defence, and telcos often run in networks with no internet access. ECR and ACR are only reachable over a network path to the cloud provider; Harbor runs entirely on-prem with no runtime dependency on external services.

Second: multi-cloud and cloud-agnostic strategy. If you run workloads on AWS EKS and Azure AKS simultaneously, a single Harbor registry serves both without cloud-specific authentication complexity.

Third: compliance and data sovereignty. Some regulations require all artifacts to remain within specific geographic boundaries or infrastructure. Self-hosted Harbor gives you full control over where images are stored.

Fourth: cost at scale. ECR charges per GB stored and per GB transferred. At large scale (hundreds of teams, thousands of images), Harbor on your own storage can be significantly cheaper.

Real scenario: at HPE Telecom, the TeMIP/SRO platform runs in a customer data centre with restricted internet. The development team pushes to an internet-facing Harbor, which replicates to the air-gapped Harbor in the customer environment. Application pods pull from the local registry, so production has no internet dependency. Vulnerability scanning runs in both registries, so security teams see results in both environments.
HARBOR · ENGINEER
What is Harbor and how does it compare to Azure Container Registry (ACR)?
Harbor is an open-source enterprise container registry. ACR is Azure's managed container registry service.

Harbor: self-hosted (runs on your Kubernetes cluster or VMs); supports Docker images, Helm charts, and other OCI artifacts. It provides built-in vulnerability scanning (Trivy by default), image signing (Cosign, or Notary in older releases), project-based RBAC, image replication across registries, and a web UI for all operations. It works in any cloud or on-premise.

ACR: fully managed by Azure with zero infrastructure to maintain. It integrates natively with AKS (the AcrPull role means no image pull secrets are needed), Defender for Containers provides scanning, geo-replication is available in the Premium tier, and RBAC is tied to Microsoft Entra ID (formerly Azure AD).

When to choose Harbor: multi-cloud or hybrid environments where you need one registry across AWS, Azure, and on-premise, or compliance rules that require on-premise data storage.

When to choose ACR: a pure Azure environment where you want zero registry infrastructure to manage.

Both support image scanning, RBAC, replication, and OCI artifacts. Harbor requires more ops effort but gives full control; ACR is simpler for Azure-first teams. At HPE, Harbor was used because the platform spanned multiple environments and customers, and a single self-hosted registry worked across all of them.
HARBOR · ENGINEER
How does Harbor vulnerability scanning work and how do you enforce it in AKS?
Harbor uses Trivy (default) or Clair as the scanning engine. When an image is pushed to a project with scan-on-push enabled, Harbor triggers a scan job, Trivy analyses the image layers (checking OS packages and language dependencies against its vulnerability database), and the results are stored in the Harbor database with severity levels (Critical, High, Medium, Low). You can view the full vulnerability report in the Harbor UI per image tag.

Enforcement in AKS: Harbor enforces at pull time, not through a Kubernetes admission webhook. On the project, enable the policy that prevents vulnerable images from running and set the threshold to High. When a pod spec references an image with High or Critical CVEs, the pod object is still created, but Harbor refuses the kubelet's pull, so the container never starts and the pod reports ImagePullBackOff.

The workflow for AKS: CI builds the image → pushes to Harbor → Harbor scans automatically → if the scan is below the threshold, the image can be pulled → AKS runs it. To reject the pod at admission time instead, add a policy engine such as Kyverno or OPA Gatekeeper to the cluster.

Alternative in CI: after the push, call the Harbor API to get the scan result and fail the pipeline if blocking vulnerabilities are found. This is the GitOps-friendly approach: the pipeline rejects vulnerable images before they ever reach Git or the cluster.
HARBOR · PRODUCTION
Harbor disk is filling up. How do you manage image storage?
Tag retention policies: configure them per project under Settings → Tag Retention. Define rules such as: keep only the last 10 tags, keep tags matching v*.*.* (semver), keep tags pushed in the last 30 days. Tags that match no rule are deleted automatically on the configured schedule.

Garbage collection: deleting tags in Harbor removes the metadata but not the underlying image layers from storage. Run garbage collection to reclaim the space: Administration → Garbage Collection → Run GC Now. Run GC during low-traffic periods. Harbor releases before v2.1 switch the registry to read-only while GC runs; from v2.1 onward GC is non-blocking, but it still adds load. Schedule regular GC (weekly or daily).

Monitor storage: the Harbor dashboard shows total storage used. Set up an alert when usage exceeds 80%.

Identify large images: in the project, sort by size. Images over 2 GB are usually candidates for optimisation; check whether multi-stage builds are being used.

Image cleaning strategy: for development projects, use aggressive retention (keep the 5 latest tags). For production projects, be conservative (keep all semver releases for 1 year, delete latest/dev tags after 7 days). Never auto-delete from production without human review.
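
Regular GC can be scheduled through the API as well as the UI. The sketch below assumes the Harbor v2.0 endpoint `system/gc/schedule` and a body with `schedule.type`, `schedule.cron` (six fields, seconds first), and `parameters.delete_untagged`; confirm the schema in your instance's API explorer. `HARBOR_URL`, `HARBOR_USER`, and `HARBOR_PASSWORD` are hypothetical environment variables.

```shell
HARBOR_URL="${HARBOR_URL:-https://registry.company.com}"

# Emit a GC schedule body for a custom cron expression.
# Second argument: whether untagged artifacts are removed too (default: true).
gc_schedule_payload() {
  cron="$1"; delete_untagged="${2:-true}"
  cat <<EOF
{
  "schedule": {"type": "Custom", "cron": "${cron}"},
  "parameters": {"delete_untagged": ${delete_untagged}, "dry_run": false}
}
EOF
}

# Create the schedule. Requires curl and system-admin credentials.
# Changing an existing schedule is assumed to use PUT on the same path.
schedule_gc() {
  gc_schedule_payload "$@" | curl -fsS -u "${HARBOR_USER}:${HARBOR_PASSWORD}" \
    -X POST "${HARBOR_URL}/api/v2.0/system/gc/schedule" \
    -H "Content-Type: application/json" --data @-
}
```

As a usage example, `schedule_gc '0 0 2 * * 0'` would request a run at 02:00 every Sunday, which fits the advice to keep GC in a low-traffic window.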