LearnwithVishnu
📨Apache Kafka
Event streaming: partitions, consumer groups, Strimzi on Kubernetes, KEDA

📨 What is Kafka?

When to use Kafka vs traditional queues

| | Kafka | RabbitMQ/SQS |
|---|---|---|
| Message persistence | Retained for days/weeks | Deleted after consumption |
| Multiple consumers | Each consumer group gets all messages | One consumer per message |
| Replay | Yes (rewind the offset to re-process) | No |
| Throughput | Millions of messages/second | Thousands of messages/second |
| Best for | Event streaming, multiple services needing the same events, analytics | Task queues, simple pub/sub |
Kafka concepts and comparison

🧩 Partitions & Consumer Groups

Partitions, consumer groups, ordering, replication

🖥️ CLI Commands

Topic management, consumer lag, offset reset

☸️ Kafka on Kubernetes

Strimzi Operator + KafkaTopic + KEDA auto-scaling

🔍 Troubleshooting

Consumer lag, under-replicated partitions, disk full

🏢 Confluent Kafka — Enterprise Edition

What is Confluent and how does it differ from Apache Kafka?

Apache Kafka is the open-source project: you install, manage, monitor, and operate it yourself. Confluent was founded by the original creators of Kafka (Jay Kreps, Neha Narkhede, Jun Rao) and builds a commercial platform on top of Apache Kafka, adding enterprise features, a managed cloud offering, and support.

| | Apache Kafka | Confluent Platform | Confluent Cloud |
|---|---|---|---|
| What it is | Open-source event streaming | Self-hosted Kafka + enterprise tools | Fully managed Kafka as a Service |
| Operations | You manage everything | You manage, better tooling | Confluent manages everything |
| Schema Registry | Not included | Included | Included |
| KSQL / ksqlDB | Not included | Included | Included |
| Control Center | Not included (use Kafdrop/Grafana) | Included (rich monitoring UI) | Cloud console |
| Connectors | Community connectors | 100+ certified connectors | 100+ managed connectors |
| Cost | Free (infra cost only) | Paid licence | Pay per usage (GB/CU) |
| Best for | Teams with Kafka expertise, cost-sensitive | Enterprise, need Schema Registry + KSQL | Teams wanting zero ops overhead |

Confluent Key Components — what you get extra

1. Schema Registry

The most important Confluent feature. In Apache Kafka, producers and consumers agree on message format by convention — if a producer changes the schema, consumers break silently. Schema Registry enforces a contract: producers register schemas, consumers validate against them. Supports Avro, JSON Schema, Protobuf.

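  • Schema evolution: each subject has a compatibility mode (BACKWARD by default). Compatible changes, such as adding an optional field with a default value, are allowed. Incompatible changes, such as adding a required field with no default or changing a field's type incompatibly (for example string to int), are rejected by the registry.
  • Serialisation: instead of opaque bytes, each message carries the ID of the schema it was written with, so the consumer can fetch that schema and knows exactly how to deserialise.
  • Real scenario: Telecom alarm events (TeMIP/SRO) have strict schemas. A producer sends alarms with 15 fields. Without Schema Registry, renaming a field can make 10 downstream consumers fail silently. With Schema Registry, an incompatible change such as renaming a required field is rejected when the producer tries to register the new schema, before any message in the new format reaches the topic.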
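To make the "schema ID embedded" point concrete, here is a minimal Python sketch of the framing used by Confluent serialisers: one magic byte (0), a 4-byte big-endian schema ID, then the encoded payload. The function names `frame` and `unframe` are hypothetical helpers for illustration, not part of any Confluent client library, and the payload is treated as opaque bytes rather than real Avro.

```python
import struct
from typing import Tuple

MAGIC_BYTE = 0   # marks a message framed in the Confluent wire format
HEADER_SIZE = 5  # 1 magic byte + 4-byte schema ID


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix an already-encoded payload with the magic byte and schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes) -> Tuple[int, bytes]:
    """Split a framed message into (schema_id, payload)."""
    if len(message) < HEADER_SIZE:
        raise ValueError("message too short to contain a schema ID header")
    magic, schema_id = struct.unpack(">bI", message[:HEADER_SIZE])
    if magic != MAGIC_BYTE:
        raise ValueError(f"unexpected magic byte: {magic}")
    return schema_id, message[HEADER_SIZE:]
```

A consumer would use the returned schema ID to look up the writer schema in the registry (and cache it) before decoding the payload.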
2. ksqlDB — SQL on Kafka Streams

Write SQL-like queries on live Kafka streams without writing Java or Python code. Create streaming aggregations, joins, and filters that run continuously.

-- Count errors per service in real time (tumbling 5-minute window)
CREATE TABLE error_counts AS
  SELECT service_name,
         COUNT(*) as error_count
  FROM application_logs
  WINDOW TUMBLING (SIZE 5 MINUTES)
  WHERE log_level = 'ERROR'
  GROUP BY service_name;

-- Join two streams: payment events + fraud signals
-- (the join key p.user_id is included in the projection; ksqlDB expects it there)
CREATE STREAM payment_risk AS
  SELECT p.user_id, p.payment_id, p.amount, f.risk_score
  FROM payments p
  LEFT JOIN fraud_signals f
  WITHIN 30 SECONDS
  ON p.user_id = f.user_id;
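For intuition about what the tumbling window in the first query does, here is a small Python sketch of the same idea: every event falls into exactly one fixed, non-overlapping 5-minute bucket, and errors are counted per service and bucket. This only illustrates the windowing arithmetic and is not how ksqlDB is implemented. The event field `ts_ms` (epoch milliseconds) is an assumed name.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

WINDOW_MS = 5 * 60 * 1000  # 5-minute tumbling window


def window_start(ts_ms: int, size_ms: int = WINDOW_MS) -> int:
    """Return the start of the tumbling window that contains ts_ms."""
    return ts_ms - (ts_ms % size_ms)


def count_errors(events: Iterable[dict]) -> Dict[Tuple[str, int], int]:
    """Count ERROR events per (service_name, window start)."""
    counts: Counter = Counter()
    for event in events:
        if event["log_level"] == "ERROR":
            key = (event["service_name"], window_start(event["ts_ms"]))
            counts[key] += 1
    return dict(counts)
```

Because windows do not overlap, an event at 299,999 ms lands in the window starting at 0, while an event at 300,000 ms starts a new window.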
3. Confluent Control Center

Enterprise monitoring UI — topic browser, consumer group lag dashboard, schema management, connector management, alert configuration. Replaces the need for custom Grafana dashboards for Kafka monitoring.

4. Kafka Connect — Managed Connectors

Move data between Kafka and external systems without writing code. Confluent provides 100+ certified connectors:

  • Source connectors — pull data INTO Kafka: Debezium (database CDC), S3, Salesforce, PostgreSQL
  • Sink connectors — push data FROM Kafka: Elasticsearch, S3, Snowflake, MongoDB, Azure Blob
# Debezium PostgreSQL CDC connector — capture every DB change as Kafka event
{
  "name": "postgres-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "${file:/kafka/connect/secrets.properties:db.password}",
    "database.dbname": "telecom_sro",
    "table.include.list": "public.alarms,public.service_events",
    "topic.prefix": "dbserver1"
  }
}
# Every INSERT/UPDATE/DELETE on alarms table → Kafka topic event
# Downstream services react in real-time without polling the database
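To see which topics downstream consumers would subscribe to, the sketch below derives the expected topic names from the connector config above. It relies on the Debezium PostgreSQL naming convention of `<topic.prefix>.<schema>.<table>`. The helper `expected_topics` is hypothetical, and it assumes every entry in `table.include.list` is a literal `schema.table` name (Debezium also accepts regular expressions there, which this sketch does not expand).

```python
from typing import List


def expected_topics(config: dict) -> List[str]:
    """Derive Debezium change-event topic names from a connector config."""
    prefix = config["topic.prefix"]
    tables = [t.strip() for t in config["table.include.list"].split(",") if t.strip()]
    # Assumption: each entry is a literal "schema.table", not a regex.
    return [f"{prefix}.{table}" for table in tables]


connector_config = {
    "topic.prefix": "dbserver1",
    "table.include.list": "public.alarms,public.service_events",
}
```

With the config shown, consumers would read from `dbserver1.public.alarms` and `dbserver1.public.service_events`.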

Confluent Cloud — when to use it

Confluent Cloud is Kafka as a fully managed service on AWS, Azure, or GCP. You create topics, set retention, and configure producers and consumers, while Confluent handles brokers, ZooKeeper/KRaft, replication, upgrades, disk management, and scaling. Pricing is usage-based: capacity is billed in Confluent units (for example, CKUs for Kafka clusters), plus per-GB charges for storage and networking.

Use Confluent Cloud when: You want Kafka's power without the operational burden. No broker sizing decisions, no ZooKeeper management, no disk planning, no manual scaling. Your team focuses on building applications, not running Kafka infrastructure. Particularly valuable for smaller teams and startups who cannot justify a full-time Kafka platform engineer.
Use self-hosted Apache Kafka (Strimzi on K8s) when: Cost at scale matters, because Confluent Cloud becomes expensive at high throughput (100 MB/s and above). Data sovereignty requirements mean you cannot send data to a third-party SaaS. You have the team expertise to operate it. You want complete control over configuration.

Confluent in Interviews — what to say

When asked about Kafka in senior interviews, distinguishing Apache Kafka from Confluent shows depth:

"I have used Apache Kafka on Kubernetes with Strimzi Operator for the TeMIP/SRO telecom platform at HPE. I understand Confluent's additional value — particularly Schema Registry for enforcing message contracts across microservices, and ksqlDB for streaming analytics without writing consumer code. For teams wanting managed Kafka, Confluent Cloud removes the operational complexity. For our platform, we chose Strimzi on OpenShift to keep data within the customer's infrastructure due to telco compliance requirements."

🎯 Interview Questions

KAFKA · ENGINEER
Explain Kafka partitions and consumer groups. Why do they matter?
Partitions are the unit of parallelism in Kafka. A topic split into 6 partitions can be processed by up to 6 consumers in the same group simultaneously. If the topic has only 1 partition, only 1 consumer in a group can actively consume, no matter how many instances you start.
Consumer groups let multiple consumers share the work of reading a topic. Each partition is assigned to exactly one consumer within a group. With 6 partitions and 3 consumers in a group, each consumer reads 2 partitions. If one consumer fails, Kafka automatically reassigns its partitions to the surviving consumers.
This is the key to scaling Kafka: when more messages come in, add partitions and scale up the consumer deployment. KEDA can scale the consumers automatically by watching consumer group lag.
Important: different consumer groups are completely independent. The analytics service and the payment service can both read from the same user-events topic, each maintaining its own offsets and processing at its own pace. Neither affects the other.
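The assignment rules in this answer can be sketched in a few lines of Python. This is a simplified round-robin illustration, not Kafka's actual assignor implementation; `assign` and the consumer names are hypothetical.

```python
from typing import Dict, Iterable, List


def assign(partitions: Iterable[int], consumers: Iterable[str]) -> Dict[str, List[int]]:
    """Spread partitions over the consumers of one group, round-robin.

    Each partition goes to exactly one consumer; consumers beyond the
    partition count receive nothing and sit idle.
    """
    members = sorted(consumers)
    assignment: Dict[str, List[int]] = {m: [] for m in members}
    if not members:
        return assignment
    for index, partition in enumerate(sorted(partitions)):
        assignment[members[index % len(members)]].append(partition)
    return assignment
```

Calling `assign(range(6), ["c1", "c2", "c3"])` gives each consumer 2 partitions. Calling it again without `c3` models a rebalance after a failure, and `assign([0], ["c1", "c2", "c3"])` shows why extra consumers on a single-partition topic stay idle.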
KAFKA · ARCHITECT
How do you ensure exactly-once message processing in Kafka?
Kafka offers three delivery semantics.
At-most-once: the producer sends the message and does not retry on failure. The consumer commits the offset before processing. If the consumer crashes after committing but before processing finishes, the message is not reprocessed (it is lost).
At-least-once: the producer retries until the send is acknowledged. The consumer commits the offset only after processing. If the consumer crashes after processing but before committing, the message is reprocessed (a duplicate).
Exactly-once: achieved with Kafka transactions (Kafka 0.11+). The producer sets transactional.id and writes its output messages and the consumed offsets atomically in one transaction. Downstream consumers read only committed messages (isolation.level=read_committed). The guarantee covers consume-process-produce flows inside Kafka; side effects in external systems still need their own idempotency.
Exactly-once is the hardest and most expensive option. In practice, at-least-once with idempotent consumers is more common and more practical. Make your consumer idempotent, so that processing a message twice has the same outcome as processing it once. For payment processing, use the payment event ID as an idempotency key: if you see the same payment ID twice, the second one is a no-op.
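A minimal sketch of the idempotent-consumer idea, using the payment event ID as the idempotency key. The class and field names (`PaymentProcessor`, `payment_id`, `user_id`, `amount`) are assumptions for illustration. A real service would keep the processed IDs in a durable store and write them in the same database transaction as the side effect, rather than in memory.

```python
from typing import Dict, Set


class PaymentProcessor:
    """Applies each payment at most once, even if Kafka redelivers it."""

    def __init__(self) -> None:
        self.processed_ids: Set[str] = set()  # stand-in for a durable dedupe table
        self.balances: Dict[str, int] = {}    # stand-in for the real side effect

    def handle(self, event: dict) -> bool:
        """Process one event; return False if it was a duplicate (no-op)."""
        payment_id = event["payment_id"]
        if payment_id in self.processed_ids:
            return False
        user = event["user_id"]
        self.balances[user] = self.balances.get(user, 0) + event["amount"]
        self.processed_ids.add(payment_id)
        return True
```

With this in place, an at-least-once redelivery after a crash changes nothing: the second call with the same payment ID returns False and the balance is untouched.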
KAFKA · PRODUCTION
Consumer lag is growing and Kafka messages are backing up. How do you respond?
Growing lag means consumers cannot keep up with producer throughput. Investigate first: kafka-consumer-groups.sh --bootstrap-server <broker> --describe --group <group> shows lag per partition. Is lag growing uniformly across all partitions, or is it concentrated on specific partitions?
Growing uniformly: the consumers are too slow overall, so scale up consumer replicas. Check the partition count first, because a group cannot have more active consumers than partitions. With 3 partitions and 3 consumers already running, a 4th consumer sits idle. Increase partitions first (Kafka lets you increase the count but not decrease it, and adding partitions changes which partition a given key maps to).
Growing on specific partitions: those partitions are receiving more messages, or slower ones. Check whether one partition is receiving 10x the messages of the others. If so, the partition key is unevenly distributed (a hot partition), and the partition key strategy needs review.
Short-term: increase consumer replicas up to the useful maximum (equal to the partition count) and enable KEDA to auto-scale based on lag.
Long-term: review consumer processing time. Are there slow database calls inside the consumer that can be optimised or made async? Consider processing messages in batches instead of one at a time to improve throughput.
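The triage in this answer can be sketched as a small Python helper that takes per-partition lag (as read from the consumer-groups output) and reports whether lag is skewed and whether adding consumers can help. The function name `diagnose_lag`, the result keys, and the 10x-of-median threshold are assumptions chosen for illustration, not a standard Kafka tool.

```python
from statistics import median
from typing import Dict


def diagnose_lag(lag_by_partition: Dict[int, int], consumer_count: int,
                 skew_factor: float = 10.0) -> dict:
    """Summarise consumer lag and flag hot partitions.

    A partition counts as hot when its lag exceeds skew_factor times
    the median lag across all partitions.
    """
    if not lag_by_partition:
        raise ValueError("need lag for at least one partition")
    lags = list(lag_by_partition.values())
    baseline = max(median(lags), 1)  # avoid a zero threshold when most lag is 0
    hot = sorted(p for p, lag in lag_by_partition.items() if lag > skew_factor * baseline)
    partition_count = len(lag_by_partition)
    return {
        "total_lag": sum(lags),
        "hot_partitions": hot,
        "max_useful_consumers": partition_count,
        "adding_consumers_helps": consumer_count < partition_count,
    }
```

An empty `hot_partitions` list with high total lag points to the "growing uniformly" branch. A non-empty list points to the partition key.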