Apache Kafka — LearnwithVishnu

📨Apache Kafka

Beginner · Engineer · Production · Architect

Event streaming: partitions, consumer groups, Strimzi on Kubernetes, KEDA


📨 What is Kafka?


When to use Kafka vs traditional queues

Feature	Kafka	RabbitMQ/SQS
Message persistence	Retained for days/weeks	Deleted after consumption
Multiple consumers	Each consumer group gets all messages	One consumer per message
Replay	Yes: rewind the offset to re-process	No
Throughput	Millions of messages/second	Thousands of messages/second
Best for	Event streaming, multiple services need same events, analytics	Task queues, simple pub/sub

Kafka concepts and comparison
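The persistence, multiple-consumer, and replay rows in the comparison can be illustrated with a toy in-memory log. This is only a sketch: `ToyLog`, its methods, and the group names are hypothetical and stand in for a single partition, not for any real Kafka client API.

```python
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for one Kafka partition (illustration only)."""

    def __init__(self):
        self.records = []                # append-only: reading never deletes
        self.offsets = defaultdict(int)  # next offset to read, per consumer group

    def append(self, value):
        self.records.append(value)

    def poll(self, group, max_records=10):
        # Return the next batch for this group and advance only its offset.
        start = self.offsets[group]
        batch = self.records[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch

    def seek(self, group, offset):
        # Rewinding the offset is what makes replay possible.
        self.offsets[group] = offset


log = ToyLog()
for event in ["signup", "login", "purchase"]:
    log.append(event)

analytics_first = log.poll("analytics")
billing_first = log.poll("billing")   # a second group still sees every record
log.seek("analytics", 0)              # rewind to replay from the beginning
analytics_replay = log.poll("analytics")
```

In a traditional queue the first `poll` would have removed the messages; here both groups read the same records and one of them reads them again after a rewind.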

🧩 Partitions & Consumer Groups


Partitions, consumer groups, ordering, replication

🖥️ CLI Commands


Topic management, consumer lag, offset reset

☸️ Kafka on Kubernetes


Strimzi Operator + KafkaTopic + KEDA auto-scaling

🔍 Troubleshooting


Consumer lag, under-replicated partitions, disk full

🏢 Confluent Kafka — Enterprise Edition


What is Confluent and how does it differ from Apache Kafka?

Apache Kafka is the open-source project: you install, manage, monitor, and operate it yourself. Confluent was founded by the original creators of Kafka (Jay Kreps, Neha Narkhede, Jun Rao) and builds a commercial platform on top of Apache Kafka with enterprise features, a managed cloud offering, and support.

Feature	Apache Kafka	Confluent Platform	Confluent Cloud
What it is	Open-source event streaming	Self-hosted Kafka + enterprise tools	Fully managed Kafka as a Service
Operations	You manage everything	You manage, better tooling	Confluent manages everything
Schema Registry	Not included	Included	Included
KSQL / ksqlDB	Not included	Included	Included
Control Center	Not included (use Kafdrop/Grafana)	Included — rich monitoring UI	Cloud console
Connectors	Community connectors	100+ certified connectors	100+ managed connectors
Cost	Free (infra cost only)	Paid licence	Pay per usage (GB/CU)
Best for	Teams with Kafka expertise, cost-sensitive	Enterprise, need Schema Registry + KSQL	Teams wanting zero ops overhead

Confluent Key Components — what you get extra

1. Schema Registry

The most important Confluent feature. In Apache Kafka, producers and consumers agree on message format by convention — if a producer changes the schema, consumers break silently. Schema Registry enforces a contract: producers register schemas, consumers validate against them. Supports Avro, JSON Schema, Protobuf.

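Schema evolution: each subject has a compatibility mode (BACKWARD by default). Compatible changes, such as adding an optional field with a default value, are allowed. Incompatible changes, such as adding a required field with no default under BACKWARD, or removing a required field under FORWARD or FULL, are rejected by the registry.
Serialisation: instead of raw bytes with an implied format, each message is serialised with its schema ID embedded, so the consumer knows exactly which schema to use to deserialise it.
Real scenario: Telecom alarm events (TeMIP/SRO) have strict schemas. A producer sends alarms with 15 fields. Without Schema Registry: if a field is renamed, 10 downstream consumers silently fail. With Schema Registry: the breaking change is rejected when the producer tries to register the new schema, before any incompatible message is written.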
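To make "schema ID embedded" concrete, here is a minimal sketch of the framing that Confluent serialisers put in front of each payload: one zero "magic" byte followed by the schema ID as a 4-byte big-endian integer. The helper names `frame` and `unframe`, the schema ID 42, and the JSON payload are assumptions for illustration; a real producer would use a Confluent serialiser and typically an Avro or Protobuf payload.

```python
import json
import struct

MAGIC_BYTE = 0  # first byte of the Confluent wire format


def frame(schema_id: int, payload: bytes) -> bytes:
    """Prefix the payload with the magic byte and a 4-byte big-endian schema ID."""
    return struct.pack(">bI", MAGIC_BYTE, schema_id) + payload


def unframe(message: bytes):
    """Split a framed message back into (schema_id, payload)."""
    magic, schema_id = struct.unpack(">bI", message[:5])
    if magic != MAGIC_BYTE:
        raise ValueError("message is not in Schema Registry wire format")
    return schema_id, message[5:]


# Hypothetical alarm event; JSON bytes stand in for the real encoded payload.
payload = json.dumps({"alarm_id": "A-1", "severity": "MAJOR"}).encode("utf-8")
framed = frame(42, payload)
schema_id, body = unframe(framed)
```

The consumer reads the ID from the first five bytes, fetches that schema from the registry (usually cached), and only then decodes the rest of the message.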

2. ksqlDB — SQL on Kafka Streams

Write SQL-like queries on live Kafka streams without writing Java or Python code. Create streaming aggregations, joins, and filters that run continuously.

-- Count errors per service in real time (tumbling 5-minute window)
CREATE TABLE error_counts AS
  SELECT service_name,
         COUNT(*) as error_count
  FROM application_logs
  WINDOW TUMBLING (SIZE 5 MINUTES)
  WHERE log_level = 'ERROR'
  GROUP BY service_name;

-- Join two streams: payment events + fraud signals
CREATE STREAM payment_risk AS
  SELECT p.user_id, p.payment_id, p.amount, f.risk_score  -- recent ksqlDB versions require the join key in the projection
  FROM payments p
  LEFT JOIN fraud_signals f
  WITHIN 30 SECONDS
  ON p.user_id = f.user_id;
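For readers who want to see what the `WINDOW TUMBLING (SIZE 5 MINUTES)` aggregation computes, the same grouping can be sketched in plain Python. This is not how ksqlDB runs it (ksqlDB processes the stream continuously and keeps state); the sample log tuples and function names are made up for illustration.

```python
from collections import Counter

WINDOW_MS = 5 * 60 * 1000  # 5-minute tumbling window, in milliseconds


def window_start(ts_ms: int) -> int:
    """Tumbling windows do not overlap: each timestamp falls in exactly one."""
    return ts_ms - (ts_ms % WINDOW_MS)


def count_errors(logs):
    """Count ERROR records per (window start, service)."""
    counts = Counter()
    for ts_ms, service, level in logs:
        if level == "ERROR":
            counts[(window_start(ts_ms), service)] += 1
    return counts


# (timestamp in ms, service_name, log_level): hypothetical sample data
logs = [
    (0, "auth", "ERROR"),
    (120_000, "auth", "ERROR"),
    (299_999, "billing", "INFO"),
    (300_000, "auth", "ERROR"),
    (310_000, "billing", "ERROR"),
]
error_counts = count_errors(logs)
```

The two `auth` errors before the 5-minute mark land in the first window, and the third lands in the next one, which is the behaviour the `error_counts` table exposes per window.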

3. Confluent Control Center

Enterprise monitoring UI — topic browser, consumer group lag dashboard, schema management, connector management, alert configuration. Replaces the need for custom Grafana dashboards for Kafka monitoring.

4. Kafka Connect — Managed Connectors

Move data between Kafka and external systems without writing code. Confluent provides 100+ certified connectors:

Source connectors — pull data INTO Kafka: Debezium (database CDC), S3, Salesforce, PostgreSQL
Sink connectors — push data FROM Kafka: Elasticsearch, S3, Snowflake, MongoDB, Azure Blob

# Debezium PostgreSQL CDC connector — capture every DB change as Kafka event
{
  "name": "postgres-source",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "${file:/kafka/connect/secrets.properties:db.password}",
    "database.dbname": "telecom_sro",
    "table.include.list": "public.alarms,public.service_events",
    "topic.prefix": "dbserver1"
  }
}
# Every INSERT/UPDATE/DELETE on public.alarms becomes an event on the topic dbserver1.public.alarms
# (Debezium names topics <topic.prefix>.<schema>.<table>)
# Downstream services react in real time without polling the database
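A connector definition like the one above is normally submitted to the Kafka Connect REST API with a `POST` to `/connectors`. The sketch below only builds the request; the worker address `http://localhost:8083` is an assumption (8083 is the usual default port), the config is a shortened copy of the example, and `build_request` and `submit` are hypothetical helper names.

```python
import json
import urllib.request


def build_request(connect_url: str, connector: dict) -> urllib.request.Request:
    """Build the POST /connectors request without sending it."""
    body = json.dumps(connector).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url.rstrip('/')}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def submit(request: urllib.request.Request):
    """Send the request; needs a reachable Connect worker, so it is not called here."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status, json.loads(response.read())


connector = {
    "name": "postgres-source",
    "config": {
        "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
        "database.hostname": "postgres",
        "database.port": "5432",
        "database.dbname": "telecom_sro",
        "table.include.list": "public.alarms,public.service_events",
        "topic.prefix": "dbserver1",
    },
}

request = build_request("http://localhost:8083", connector)
# status, info = submit(request)  # uncomment when a Connect worker is running
```

Keeping the build and send steps separate makes it easy to review or log the exact payload before it reaches the cluster.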

Confluent Cloud — when to use it

Confluent Cloud is Kafka as a fully managed service on AWS, Azure, or GCP. You create topics, set retention, configure producers/consumers — Confluent handles brokers, ZooKeeper/KRaft, replication, upgrades, disk management, and scaling. Pricing is based on Confluent Units (CUs) for compute and GB for storage and networking.

Use Confluent Cloud when: You want Kafka's power without the operational burden. No broker sizing decisions, no ZooKeeper management, no disk planning, no manual scaling. Your team focuses on building applications, not running Kafka infrastructure. Particularly valuable for smaller teams and startups who cannot justify a full-time Kafka platform engineer.

Use Self-hosted Apache Kafka (Strimzi on K8s) when: Cost at scale matters, because Confluent Cloud becomes expensive at high throughput (100MB/sec+). Data sovereignty requirements mean you cannot send data to a third-party SaaS. You have the team expertise to operate it. You want complete control over configuration.
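To get a feel for what the 100MB/sec threshold means for disk planning on a self-hosted cluster, a rough back-of-the-envelope calculation helps. The 7-day retention and replication factor of 3 are assumptions chosen for illustration, and the function ignores compression, indexes, and headroom.

```python
SECONDS_PER_DAY = 86_400


def retained_storage_gb(throughput_mb_s: float, retention_days: int,
                        replication_factor: int) -> float:
    """Approximate disk needed across the cluster, in decimal GB."""
    total_mb = throughput_mb_s * retention_days * SECONDS_PER_DAY * replication_factor
    return total_mb / 1000


# Assumed values: 100 MB/s ingest, 7 days retention, replication factor 3
storage_gb = retained_storage_gb(100, 7, 3)
print(f"{storage_gb:,.0f} GB")  # prints "181,440 GB", roughly 181 TB
```

Numbers of this size are what self-hosting teams have to provision and monitor themselves, and what a managed service bills for instead.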

Confluent in Interviews — what to say

When asked about Kafka in senior interviews, distinguishing Apache Kafka from Confluent shows depth:

"I have used Apache Kafka on Kubernetes with Strimzi Operator for the TeMIP/SRO telecom platform at HPE. I understand Confluent's additional value — particularly Schema Registry for enforcing message contracts across microservices, and ksqlDB for streaming analytics without writing consumer code. For teams wanting managed Kafka, Confluent Cloud removes the operational complexity. For our platform, we chose Strimzi on OpenShift to keep data within the customer's infrastructure due to telco compliance requirements."

🎯 Interview Questions


KAFKA · ENGINEER

Explain Kafka partitions and consumer groups. Why do they matter?

Partitions are the unit of parallelism in Kafka. A topic split into 6 partitions can be processed by up to 6 consumers in the same group simultaneously. If a topic has only 1 partition, then no matter how many consumer instances you start in a group, only 1 can actively consume; the rest sit idle. A consumer group is a set of consumers that share the work of reading a topic. Each partition is assigned to exactly one consumer within a group. With 6 partitions and 3 consumers in a group, each consumer reads 2 partitions. If one consumer fails, Kafka rebalances the group and reassigns its partitions to the surviving consumers. This is the key to scaling Kafka: when message volume grows, add partitions and scale out the consumer deployment. KEDA can scale the consumers automatically by watching consumer group lag. Important: different consumer groups are completely independent. The analytics service and the payment service can both read from the same user-events topic, each maintaining its own offset and processing at its own pace. Neither affects the other.
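The assignment rules in this answer can be sketched with a toy round-robin assignor. This is a simplification: Kafka's real assignors (range, round-robin, cooperative sticky) have more rules, and the `assign` function and consumer names here are hypothetical.

```python
def assign(partitions, consumers):
    """Spread partitions round-robin; each partition gets exactly one owner."""
    members = sorted(consumers)
    if not members:
        raise ValueError("a consumer group needs at least one member")
    assignment = {member: [] for member in members}
    for index, partition in enumerate(partitions):
        assignment[members[index % len(members)]].append(partition)
    return assignment


partitions = list(range(6))

before = assign(partitions, ["c1", "c2", "c3"])   # 6 partitions, 3 consumers
after = assign(partitions, ["c1", "c3"])          # c2 failed: group rebalances
single = assign([0], ["c1", "c2", "c3"])          # 1 partition: two consumers idle
```

With three consumers each one owns two partitions; after `c2` drops out the survivors own three each; and with a single partition the extra consumers receive nothing, which is why partition count caps useful parallelism.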

KAFKA · ARCHITECT

How do you ensure exactly-once message processing in Kafka?

There are three delivery semantics. At-most-once: the producer sends a message and does not retry on failure; the consumer commits the offset before processing. If the consumer crashes after committing but before processing, the message is not reprocessed (lost). At-least-once: the producer retries until acknowledged; the consumer commits the offset only after processing. If the consumer crashes after processing but before committing, the message is reprocessed (duplicate). Exactly-once: achieved with Kafka transactions (Kafka 0.11+). The producer sets transactional.id and writes its output messages and the consumed offsets in one atomic transaction. Downstream consumers read only committed messages (isolation.level=read_committed). This guarantee covers read-process-write flows inside Kafka; side effects in external systems still need idempotency. Exactly-once is the hardest and most expensive. In practice, at-least-once with idempotent consumers is more common and practical. Make your consumer idempotent: if it processes a message twice, the outcome is the same as processing it once. For payment processing, use the payment event ID as an idempotency key. If you see the same payment ID twice, the second one is a no-op.
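The idempotency-key approach from this answer can be sketched in a few lines. `PaymentLedger` and the event fields are hypothetical; in a real service the set of seen IDs and the balance update would need to live in the same database transaction so that a crash cannot separate them.

```python
class PaymentLedger:
    """Applies each payment at most once, keyed by payment_id."""

    def __init__(self):
        self.balance = 0
        self.seen_ids = set()

    def apply(self, event: dict) -> bool:
        """Return True if the event changed state, False if it was a duplicate."""
        if event["payment_id"] in self.seen_ids:
            return False  # redelivered message: no-op
        self.balance += event["amount"]
        self.seen_ids.add(event["payment_id"])
        return True


ledger = PaymentLedger()

# At-least-once delivery: p2 is redelivered after a crash before the offset commit.
deliveries = [
    {"payment_id": "p1", "amount": 100},
    {"payment_id": "p2", "amount": 50},
    {"payment_id": "p2", "amount": 50},
    {"payment_id": "p3", "amount": 25},
]
results = [ledger.apply(event) for event in deliveries]
```

The duplicate `p2` is recognised and skipped, so the final balance is the same as if every message had been delivered exactly once.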

KAFKA · PRODUCTION

Consumer lag is growing and Kafka messages are backing up. How do you respond?

Growing lag means consumers cannot keep up with producer throughput. Investigate first: kafka-consumer-groups.sh --bootstrap-server <broker> --describe --group <group> shows lag per partition. Is lag growing uniformly across all partitions or concentrated on specific partitions? Growing uniformly: the consumer is too slow overall. Scale up consumer replicas. But can you? Check the partition count, because you cannot have more active consumers than partitions. If you have 3 partitions and already 3 consumers, adding a 4th helps nothing. Increase partitions first (the count can only be increased, never decreased, and adding partitions changes which partition a given key maps to). Growing on specific partitions: those partitions are receiving more messages or slower-to-process messages than the rest. Check: is one partition receiving 10x the messages of the others? That means your partitioning key is unevenly distributed (hot partition). Review the partition key strategy. Short-term: increase consumer replicas to the maximum useful number (equal to the partition count). Enable KEDA to auto-scale based on lag. Long-term: review consumer processing time. Are there slow database calls inside the consumer that can be optimised or made async? Consider batch processing instead of one message at a time to improve throughput.
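The "uniform or concentrated" question can be automated once you have the per-partition numbers. The sketch below computes lag as log end offset minus committed offset and flags partitions whose lag is far above the median; the `skew_factor` threshold of 5, the function names, and the sample offsets are assumptions for illustration, not a standard rule.

```python
from statistics import median


def partition_lag(end_offsets: dict, committed: dict) -> dict:
    """Lag per partition = log end offset - last committed offset."""
    return {p: end_offsets[p] - committed.get(p, 0) for p in end_offsets}


def diagnose(lag: dict, skew_factor: int = 5):
    """Return ("hot-partitions", [...]) if some partitions lag far above the median."""
    typical = max(median(lag.values()), 1)
    hot = [p for p, value in lag.items() if value > skew_factor * typical]
    return ("hot-partitions", hot) if hot else ("uniform", [])


# Hypothetical snapshot where partition 2 is far behind the others
skewed = partition_lag({0: 1000, 1: 1000, 2: 9000}, {0: 900, 1: 880, 2: 1000})
skewed_result = diagnose(skewed)

# Hypothetical snapshot where every partition is behind by a similar amount
even_result = diagnose({0: 500, 1: 520, 2: 480})
```

A "hot-partitions" result points at the partition key strategy, while "uniform" points at overall consumer capacity, matching the two branches of the investigation.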
