Kafka Fundamentals for System Design Interview

By Pradyumna Chippigiri

February 6, 2026


This article is my attempt to make Kafka feel intuitive. We’ll walk through the core building blocks and the reasoning behind them, with practical notes along the way.


Let’s dive in.

What is Kafka?

The Kafka documentation describes Apache Kafka as an open-source, highly scalable, durable, fault-tolerant distributed event streaming platform. Many companies use Kafka in production systems; the Apache Kafka website maintains a list of them.


A stream is a continuous flow of data (records/messages/events). Producers publish these messages, and consumers read and process them. Kafka, like any event streaming platform, handles this workflow efficiently through three key capabilities:

- Publishing (writing) and subscribing to (reading) streams of events.
- Storing streams of events durably and reliably, for as long as needed.
- Processing streams of events as they occur or retrospectively.

Kafka as a distributed event streaming platform


In a nutshell, Kafka follows the traditional client-server architecture: the Kafka brokers are the servers, and producers and consumers are the clients. Communication between clients and servers happens over a custom high-performance TCP protocol implemented by the Kafka team. Client libraries are available for most major programming languages.

Kafka Architecture

On a higher level, this is how the Kafka architecture looks. In the sections below we will go over all the key components.


Kafka architecture overview

Message Structure

A message (also called data, record, or event) is the basic unit of data stored inside the Kafka log. We can think of a message like a row in a database.


This message or event represents something that happened (like a trigger for async processing).


Kafka message structure

An event describes something that happened. For example, in an e-commerce system, events might include an order being created, an item being added to a cart, or an order being delivered. In Kafka, we read and write data in the form of events, and the payload can be serialized as XML, JSON, or Avro.


Each message is a key-value pair (along with a timestamp and optional headers). The value carries the payload; for example, a JSON value might look like this:

{
  "order_id": "order-1234",
  "order_amount": "50.00",
  "created_by": "praddy",
  "created_on": "2026-02-09T22:02:00Z"
}
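
To make this structure concrete, here is a minimal sketch of how such a message could be built with Kafka's Java client. The topic name and the header are assumptions for illustration:

import java.nio.charset.StandardCharsets;
import org.apache.kafka.clients.producer.ProducerRecord;

public class MessageAnatomy {
    public static void main(String[] args) {
        // A record pairs an optional key with a value; Kafka also attaches
        // a timestamp and allows optional headers.
        ProducerRecord<String, String> record = new ProducerRecord<>(
                "orders",                   // topic (hypothetical name)
                null,                       // partition: let Kafka choose
                System.currentTimeMillis(), // timestamp
                "order-1234",               // key
                "{\"order_id\": \"order-1234\", \"order_amount\": \"50.00\"}"); // value
        record.headers().add("source",
                "checkout-service".getBytes(StandardCharsets.UTF_8)); // assumed header
        System.out.println(record);
    }
}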

Topics

Kafka topics overview


In a real system, there are multiple kinds of events flowing continuously like orders, payments, and application logs. It makes sense to group these related events together, and in Kafka this grouping is called a topic.


A topic is a logical abstraction: it is a named stream that groups related records together.


We can think of Kafka topics like a table in a database (collection of related records), or a folder in a file system (collection of related files).


Every Kafka message is published to exactly one topic. We can have multiple topics in a Kafka cluster, each storing a specific category of events.


Topics are the way you publish and subscribe to data in Kafka. When you publish a message, you publish it to a topic, and when you consume a message, you consume it from a topic. Topics are always multi-producer; that is, a topic can have zero, one, or many producers that write data to it.


Internally, each topic is a combination of multiple partitions.
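
As a quick illustration, here is a minimal sketch of creating a topic programmatically with Kafka's Java AdminClient. The topic name, partition count, and broker address are assumptions; partitions and replication are covered in the next sections:

import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateTopicExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (Admin admin = Admin.create(props)) {
            // "orders" is a hypothetical topic with 3 partitions,
            // each copy replicated across 3 brokers.
            NewTopic orders = new NewTopic("orders", 3, (short) 3);
            admin.createTopics(List.of(orders)).all().get(); // block until created
        }
    }
}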

Partitions

Hussein Nasser likes to say that the best way to work with billions of rows in a database is to avoid working with billions of rows. That’s exactly how Kafka works.


Just like how a database shards data when a single node becomes a write bottleneck, Kafka scales write throughput by increasing the number of partitions in a topic. This allows writes to happen in parallel.


A partition is an ordered, immutable, append-only log of messages (just like write-ahead logs in databases). Since messages are always appended, they can be read in the same order from beginning to end.


Partitioning gives Kafka the ability to scale out: partitions can be spread across a cluster of machines, so a topic can hold more data than any single machine could. Because a topic is partitioned, storage, reads, and writes are all distributed across the cluster: producers can write to multiple partitions in parallel, and consumers can read from multiple partitions in parallel.


Each partition contains a different subset of the topic’s data. If we take the union of records across all partitions, we get all the records that belong to the topic.


Offset: Offset is a unique identifier assigned to each message within a partition. It is like the position number of that message in the log (0, 1, 2, and so on).
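
To build intuition, here is a toy model (not Kafka's actual implementation) of a partition as an append-only log, where the offset is simply the message's position:

import java.util.ArrayList;
import java.util.List;

// A toy model of a single partition: an append-only, ordered log.
public class PartitionLog {
    private final List<String> messages = new ArrayList<>();

    // Appending returns the offset: the position of the message in the log.
    public long append(String message) {
        messages.add(message);
        return messages.size() - 1; // offsets are 0, 1, 2, ...
    }

    // Reads are by offset; existing entries are never modified.
    public String read(long offset) {
        return messages.get((int) offset);
    }
}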


We will talk about why offsets are useful in the Consumers section below.


Important note: these logs are persistent, meaning every message received is written to disk. (By default, Kafka writes to the filesystem and lets the operating system flush pages to disk asynchronously, rather than fsyncing every message.)


Below is a simple example of a single producer and a single consumer writing and reading from a two-partition topic. The image is from LinkedIn’s Engineering Blog.

Kafka partitions with producer and consumer

Kafka Broker

Kafka brokers are the servers that run Kafka. We talked about how producers publish data to a topic and consumers consume from the topic. We also talked about how the data is appended into an append-only log inside partitions. But where does all of this actually live and run?


That’s where brokers come in.


When a producer publishes an event to a topic, Kafka automatically routes that event to the correct partition of that topic. That partition is stored on some broker, and the broker is responsible for receiving the write request and persisting the event to disk. Similarly, when a consumer subscribes to a topic, it reads the data by fetching messages from the brokers that host the partitions for that topic.


A Kafka cluster is simply a group of multiple brokers working together. Having multiple brokers allows Kafka to distribute partitions across machines, handle high traffic, and stay available even if one broker goes down. By adding more brokers, Kafka can scale horizontally without changing producer or consumer code.

Producers

A producer is a client application that publishes events to Kafka topics. Producers send records to a topic, and Kafka routes those records to the appropriate partitions, which are stored on brokers.


Question: We said producers send data to a topic and data ends up in a partition on a broker. How does Kafka decide which partition a record should go to? And can producers control that?


Answer:

Kafka follows a few simple rules:

- If the record explicitly specifies a partition, Kafka uses that partition.
- Otherwise, if the record has a key, Kafka hashes the key to pick a partition, so all records with the same key land on the same partition (preserving per-key ordering).
- If the record has no key, Kafka spreads records across partitions (modern clients use a "sticky" partitioner that fills a batch for one partition before moving on to the next).

Under the hood, the keyed case is usually:

partition = murmur2(key) % numPartitions

If needed, producers can also:

- Set the partition explicitly on each record.
- Plug in a custom Partitioner implementation to take full control of routing.
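
Here is a minimal sketch of a keyed producer using the Java client. The topic name, key, and broker address are assumptions; because the key is the order ID, every event for the same order lands on the same partition:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Keyed by order ID: every event for order-1234 hashes to the
            // same partition, so events for this order stay in order.
            producer.send(new ProducerRecord<>("orders", "order-1234",
                    "{\"status\": \"created\"}"));
        } // close() flushes any buffered records
    }
}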

I will dedicate a separate article to discuss the internals of the Kafka producer in the upcoming weeks.

Consumers

A consumer is a client application that subscribes to one or more Kafka topics and reads records from them. Internally, consumers always read from partitions, not from a topic as a whole. They work on a pull-based model.


There’s a wonderful write-up in the Kafka documentation on why they chose pull over push for consumers.

Kafka consumers pull model



Question: If a topic has multiple partitions and you spin up multiple consumers, how does Kafka decide who reads what, and how does it avoid two consumers accidentally processing the same message while preserving ordering?


Answer:

If consumers were allowed to read partitions independently without any coordination, then every consumer could end up reading the same partitions and hence the same messages, leading to duplicate processing.


Kafka solves this by introducing consumer groups. A consumer group is just a logical grouping of consumers identified by the same group.id. When multiple consumers join the same group, Kafka treats them like a team that wants to share the work, and it enforces a very important rule:


A consumer can read from one or more partitions, but no two consumers in the same group will read from the same partition at the same time. This keeps ordering intact (since each partition is an ordered log) and ensures that, within a consumer group, each message is handled by only one consumer.



Now remember the offset we briefly introduced earlier as the position number (0, 1, 2, and so on) of a message inside a partition. This is exactly why offsets are useful. As consumers read from a partition, Kafka tracks how far they have read using offsets, and these offsets are tracked per consumer group, per partition.


Kafka stores these committed offsets inside an internal Kafka topic called __consumer_offsets, which is how Kafka can remember progress even if a consumer crashes or restarts. Because of this, consumers can resume from where they left off after failures, and different consumer groups can consume the same topic independently with their own progress.
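
Here is a minimal sketch of a consumer in such a group, again using the Java client; the topic, group ID, and broker address are assumptions:

import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OrderConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "order-processors"); // consumers sharing this ID share the work
        props.put("enable.auto.commit", "false");  // we commit progress manually below
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                // Pull-based: the consumer asks the brokers for new records.
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
                // Commit offsets to __consumer_offsets so a restart resumes here.
                consumer.commitSync();
            }
        }
    }
}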


I will dedicate a separate article to discuss the internals of Kafka consumers.

Replication

Till now, we have talked about how partitions store data on brokers. But if a partition existed on only one broker, Kafka would not be fault tolerant: if that broker went down, data would be unavailable or even lost. Kafka solves this using replication.


With replication, Kafka keeps multiple copies (replicas) of each partition on different brokers, controlled by the topic's replication factor. One replica is the leader, which handles all reads and writes for that partition; the others are followers, which continuously copy data from the leader. Followers that are fully caught up with the leader form the in-sync replica set (ISR). If the leader's broker fails, one of the in-sync followers is promoted to leader, so the partition stays available without losing acknowledged data.
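
To see how replication surfaces to producers, here is a minimal sketch of the durability settings commonly paired with it. The settings shown are real Kafka configs, but the values are assumptions for illustration:

import java.util.Properties;

public class DurableProducerConfig {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        // acks=all: the leader waits until all in-sync replicas have the
        // record before acknowledging the write.
        props.put("acks", "all");
        // Retry transient failures instead of dropping records.
        props.put("retries", Integer.MAX_VALUE);
        // Note: min.insync.replicas is a broker/topic-level setting, not a
        // producer one. With replication factor 3 and min.insync.replicas=2,
        // a write needs at least 2 in-sync copies before it is acknowledged.
        System.out.println(props);
    }
}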



At this point, we understand what replication is and why Kafka uses leaders, followers, and ISR. But this naturally raises the next question: who tracks broker liveness, who decides leaders, and who coordinates failover and metadata updates? That responsibility belongs to Kafka’s control plane: the KRaft Controller.

KRaft Controller

Kafka replication and failover only work if the cluster has a single, consistent authority that coordinates metadata changes. That role is handled by the Kafka Controller (Kafka’s control plane).


At a high level, the controller’s responsibilities include:

- Tracking which brokers are alive (broker registration and heartbeats).
- Electing a new partition leader when the current leader’s broker fails.
- Maintaining cluster metadata (topics, partitions, replica assignments) and propagating changes to all brokers.


Earlier versions of Kafka relied on ZooKeeper for this coordination. Modern Kafka replaces ZooKeeper with KRaft (Kafka Raft), where Kafka manages this metadata internally with no external dependency.


One important note: the controller does not handle data traffic. Producers and consumers do not talk to it directly. It coordinates the cluster in the background so Kafka keeps working correctly even as brokers come and go.


Well, this article covered the fundamentals of Kafka. In the coming months, we’ll go deeper into the parts that matter when you actually run Kafka in production, starting with the internals of Kafka producers and consumers.



Hope you liked this article.

If this was useful, subscribe to my weekly newsletter: Evolving Engineer


Support my writing here: buymeacoffee.com/cpradyumnao