Data Engineering & Databases · Streaming

Kafka Basics: Events, Partitions, and Consumer Groups

Build streaming pipelines without confusion.

Reading time: ~8–12 min
Level: All levels

Kafka looks intimidating until you get the mental model: it’s a distributed log. Producers append events to topics, topics are split into partitions (for scale), and consumer groups let you process those partitions in parallel without double-reading. This guide covers events, partitions, and consumer groups in practical terms—what each concept means, why it exists, and how to run a tiny local setup to see the behavior with your own eyes.


Quickstart: the fastest way to understand Kafka

If you only do one thing, make Kafka feel tangible: run it locally, create a multi-partition topic, then start two consumers in the same group and watch Kafka split the work. The goal isn’t a perfect setup—it’s a working mental model you can reuse.

Do this in 15–20 minutes

  • Start Kafka (single broker) locally
  • Create a topic with 3 partitions
  • Produce a few keyed events
  • Run two consumers with the same group.id
  • Inspect offsets + lag for the group

What you’ll learn (without theory overload)

  • Why Kafka ordering is per partition
  • Why partitions are the unit of parallelism
  • What a consumer group really coordinates
  • Why offsets are a cursor (not “deleting messages”)
  • How to spot “stuck” consumers quickly

Make it click

Keep a simple rule in your head: partitions are lanes and a consumer group is a team of workers. Kafka assigns lanes to workers. If you add workers, Kafka reassigns lanes (a rebalance).

Overview: what this post covers (and why it matters)

Kafka is often introduced as “a messaging system,” but that framing hides the two ideas that make it powerful: durability (events are persisted and replayable) and scalable consumption (consumer groups coordinate parallel processing).

The practical questions this post answers

  • What is an event in Kafka, and what should go into it?
  • How do partitions affect ordering, scaling, and performance?
  • What does a consumer group coordinate, exactly?
  • Why do rebalances happen, and how do you avoid “thrash”?
  • How do offsets work (and why they’re the key to reliability)?

You’ll also get a minimal local setup and a few operational habits (like checking consumer lag) that save hours when you move from “toy pipeline” to “production job.”

Kafka is not “just queues”

Queues are often “read once and disappear.” Kafka is different: it keeps events for a retention period, and consumers track their position via offsets. That makes replay and backfills a normal workflow, not an emergency.
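To make "offsets are a cursor" concrete, here is a toy model (not Kafka's implementation) of one partition as an append-only list, with two consumer groups keeping independent positions:

```python
# Toy model: a partition is an append-only list; each consumer group
# keeps its own cursor (offset). Reading never deletes events.
log = []  # one partition of a topic

def produce(event):
    log.append(event)
    return len(log) - 1  # offset of the appended event

offsets = {"billing": 0, "analytics": 0}  # independent group cursors

def consume(group, max_events=10):
    start = offsets[group]
    batch = log[start:start + max_events]
    offsets[group] = start + len(batch)  # "commit" the new position
    return batch

for e in ["payment_authorized", "payment_captured", "payment_refunded"]:
    produce(e)

print(consume("billing", 2))     # billing reads the first two events
print(consume("analytics", 10))  # analytics still sees all three
print(consume("billing", 10))    # billing resumes where it left off
```

Notice that "consuming" only moves a group's cursor forward; the log itself is untouched, which is exactly why replay is a normal workflow.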

Core concepts: the mental model that prevents confusion

Here are the terms you’ll see everywhere, explained in the “what it is / what it does / why you should care” style. If you internalize these, Kafka stops feeling magical.

Concept | Mental model | Why it matters in practice
Event | A fact that happened (immutable record) | Design it so downstream systems can process it reliably and replay it safely.
Topic | A named stream (like a table you append to) | It’s your “channel” of events; producers write to it, consumers read from it.
Partition | A shard / lane inside a topic | The unit of ordering and parallelism: ordering is guaranteed within one partition.
Offset | A cursor position in a partition | Offsets are how consumers remember “where I’m at” and how you do replays.
Producer | Appends events to a topic | Producer settings (acks, retries, batching) control durability and throughput.
Consumer | Reads events from a topic | Consumers must keep up, handle failures, and commit offsets correctly.
Consumer group | A team that shares work | Each partition is assigned to at most one consumer in a group at a time.
Broker | A Kafka server node | Brokers store partitions and serve reads/writes; clusters scale by adding brokers.

Events: what to include (and what to avoid)

A Kafka event is typically a key/value pair with metadata. Your “value” is often JSON/Avro/Protobuf, but the bigger question is: can someone process this later without guessing?

Good event design habits

  • Include an event type and schema version
  • Use stable IDs (user_id, order_id, device_id)
  • Include timestamps (event time and ingestion time if needed)
  • Keep it additive (new fields > breaking changes)
  • Prefer facts over derived conclusions (“paid” event > “good customer”)
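Applied to a payment event, those habits might look like the sketch below. The field names are illustrative, not a standard; pick names that fit your schemas:

```python
import json
import time
import uuid

def make_event(event_type, order_id, amount, currency):
    # Facts + stable IDs + a versioned schema + explicit timestamps.
    return {
        "event_type": event_type,        # what happened (a fact, not a judgment)
        "schema_version": 1,             # bump additively; never repurpose fields
        "order_id": order_id,            # stable entity ID (also a good Kafka key)
        "event_id": str(uuid.uuid4()),   # lets consumers deduplicate on replay
        "amount": amount,
        "currency": currency,
        "event_time": time.time(),       # when it actually happened
        "ingested_at": time.time(),      # when the pipeline first saw it
    }

evt = make_event("payment_captured", "1001", 42.50, "EUR")
print(json.dumps(evt, indent=2))
```

The `event_id` is worth the extra bytes: with at-least-once delivery, downstream systems need something stable to deduplicate on.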

Common foot-guns

  • Using the key randomly (destroys ordering guarantees)
  • Embedding huge blobs (hurts throughput and retention costs)
  • Changing meaning without versioning (“status” changes semantics)
  • Relying on consumer-side time as “truth”
  • Assuming Kafka guarantees “exactly once” by default

Partitions: ordering, scaling, and why keys matter

Partitions exist so Kafka can scale. But they also define what “in order” means. Kafka can preserve order for events that land in the same partition—so you typically use a key that groups related events (like order_id) to keep their sequence consistent.

Ordering isn’t global

Kafka does not guarantee a single total order across all partitions of a topic. If you need strict ordering for a stream of related events, ensure they share a key that maps them to the same partition.
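Clients pick the partition by hashing the key (the Java client's default partitioner uses murmur2). This sketch uses CRC32 purely to illustrate the property that matters: the same key always lands in the same partition.

```python
import zlib

NUM_PARTITIONS = 3

def partition_for(key: str) -> int:
    # Illustrative only: real Kafka clients use their own hash
    # (murmur2 in the Java client), but any deterministic hash
    # demonstrates "same key -> same partition".
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS

events = [
    ("order-1001", "payment_authorized"),
    ("order-1002", "payment_authorized"),
    ("order-1001", "payment_captured"),   # same key as the first event
]

for key, event in events:
    print(f"key={key} -> partition {partition_for(key)}: {event}")
```

Both `order-1001` events map to the same partition, so any consumer reading that partition sees them in the order they were produced.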

Consumer groups: “one partition, one consumer” (within a group)

A consumer group is Kafka’s scaling primitive for reading: Kafka assigns partitions to consumers so that a partition is processed by at most one consumer in the group at a time. Add consumers and Kafka may rebalance assignments. Add partitions and you increase maximum parallelism.
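The "lanes to workers" assignment can be sketched in a few lines. This is a simple round-robin sketch; real Kafka ships several assignors (range, round-robin, cooperative-sticky), but the bound is the same: a consumer with no partition sits idle.

```python
def assign(partitions, consumers):
    # Round-robin sketch of partition assignment within one group.
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2]

print(assign(partitions, ["c1"]))                    # one consumer owns all lanes
print(assign(partitions, ["c1", "c2"]))              # after a "rebalance", work splits
print(assign(partitions, ["c1", "c2", "c3", "c4"]))  # c4 owns nothing: it idles
```

Adding or removing a consumer changes the output of `assign` — that recomputation is, conceptually, what a rebalance is.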

Step-by-step: run Kafka locally and see the behavior

This is a practical walk-through you can copy/paste. It uses a single-node setup (good for learning) and focuses on the three things that trip people up: partitions, keys, and consumer groups.

Step 1 — Start Kafka locally (single broker, KRaft)

For a local demo, we keep it simple: one broker, plaintext listener, and a small amount of storage. This is not “production secure,” and that’s okay—the goal is to observe behavior.

version: "3.8"

services:
  kafka:
    image: bitnami/kafka:latest
    container_name: kafka
    ports:
      - "9092:9092"
      - "29092:29092"
    environment:
      # KRaft (no ZooKeeper) single-node demo
      - KAFKA_ENABLE_KRAFT=yes
      - KAFKA_CFG_PROCESS_ROLES=broker,controller
      - KAFKA_CFG_NODE_ID=1
      - KAFKA_CFG_CONTROLLER_QUORUM_VOTERS=1@kafka:9093
      - KAFKA_CFG_CONTROLLER_LISTENER_NAMES=CONTROLLER

      # Listeners: one for inside Docker, one for your host machine
      - KAFKA_CFG_LISTENERS=PLAINTEXT://:9092,PLAINTEXT_HOST://:29092,CONTROLLER://:9093
      - KAFKA_CFG_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092,PLAINTEXT_HOST://localhost:29092
      - KAFKA_CFG_LISTENER_SECURITY_PROTOCOL_MAP=CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      - ALLOW_PLAINTEXT_LISTENER=yes

      # Demo-friendly defaults
      - KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE=false
      - KAFKA_CFG_NUM_PARTITIONS=3
      - KAFKA_CFG_OFFSETS_TOPIC_REPLICATION_FACTOR=1
      - KAFKA_CFG_TRANSACTION_STATE_LOG_REPLICATION_FACTOR=1
      - KAFKA_CFG_TRANSACTION_STATE_LOG_MIN_ISR=1

Run it

  • Save as docker-compose.yml
  • Start: docker compose up -d
  • Confirm the broker is up: docker logs -f kafka

This setup exposes Kafka on localhost:29092 for your host, while internal Docker traffic uses kafka:9092. That split avoids the classic “advertised.listeners” confusion on laptops.

Step 2 — Create a topic and produce a few keyed events

We’ll create a topic named payments with 3 partitions. Then we’ll publish a few events with a key so related events stick to the same partition (your future self will thank you when debugging ordering).

# Create a 3-partition topic (replication=1 for a single-broker demo)
docker exec -it kafka /opt/bitnami/kafka/bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --create --topic payments --partitions 3 --replication-factor 1

# Describe it (check partitions)
docker exec -it kafka /opt/bitnami/kafka/bin/kafka-topics.sh \
  --bootstrap-server localhost:9092 \
  --describe --topic payments

# Produce keyed events: KEY|VALUE
# (parse.key=true means split key/value using key.separator)
docker exec -it kafka /opt/bitnami/kafka/bin/kafka-console-producer.sh \
  --bootstrap-server localhost:9092 \
  --topic payments \
  --property parse.key=true \
  --property key.separator="|"

While the producer is running, paste a few lines like these (same key means same partition):

  • order-1001|{"event":"payment_authorized","order_id":"1001","amount":42.50,"currency":"EUR"}
  • order-1002|{"event":"payment_authorized","order_id":"1002","amount":13.99,"currency":"EUR"}
  • order-1001|{"event":"payment_captured","order_id":"1001","amount":42.50,"currency":"EUR"}

Close the producer with Ctrl+C when you’re done.

Step 3 — Consume with a consumer group (and watch parallelism)

Now the payoff: start consumers with the same group id. Kafka will assign partitions across them. With 3 partitions, at most 3 consumers can actively read in parallel in a single group.

Terminal A (consumer 1)

  • Run a console consumer with group id demo-payments
  • Print partition + offset so you can see what’s happening
  • Read from the beginning for the first run

Terminal B (consumer 2)

  • Start the same command again (same group id)
  • Notice Kafka rebalances and reassigns partitions
  • Produce more events and watch which terminal prints them

Here is a Python consumer (using confluent-kafka) that does exactly that — run the same script in both terminals:

import json
from confluent_kafka import Consumer, KafkaException

BOOTSTRAP = "localhost:29092"
TOPIC = "payments"
GROUP_ID = "demo-payments"

conf = {
    "bootstrap.servers": BOOTSTRAP,
    "group.id": GROUP_ID,
    "auto.offset.reset": "earliest",      # first run only; after commits it will resume
    "enable.auto.commit": True,           # start simple; see notes below for manual commits
}

c = Consumer(conf)
c.subscribe([TOPIC])

try:
    while True:
        msg = c.poll(1.0)
        if msg is None:
            continue
        if msg.error():
            raise KafkaException(msg.error())

        key = msg.key().decode("utf-8") if msg.key() else None
        value = msg.value().decode("utf-8") if msg.value() else None

        payload = None
        if value:
            try:
                payload = json.loads(value)
            except json.JSONDecodeError:
                payload = value

        print(
            f"topic={msg.topic()} partition={msg.partition()} offset={msg.offset()} key={key} value={payload}"
        )

except KeyboardInterrupt:
    pass
finally:
    c.close()

Auto-commit vs manual commit (practical guidance)

Auto-commit is fine for low-stakes pipelines and learning. In production, you often commit offsets after your processing succeeds (especially if you write to a database or call an external API). The key idea: committing an offset means “this group has processed up to here,” not “Kafka deleted the message.”
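A toy simulation (no Kafka involved; all names are illustrative) shows why committing only after processing, plus idempotent handlers, gives you safe at-least-once behavior:

```python
# Toy at-least-once loop: commit only after processing succeeds.
# A crash mid-batch means uncommitted events are re-read on restart,
# so processing must be idempotent (safe to apply twice).

events = ["e0", "e1", "e2", "e3"]
committed = 0          # the group's saved cursor
processed = set()      # idempotency guard (e.g. keyed by event_id)

def process(event):
    processed.add(event)  # a real handler would write to a DB idempotently

def run(crash_after=None):
    global committed
    offset = committed                 # resume from the last commit
    while offset < len(events):
        process(events[offset])
        if crash_after is not None and offset == crash_after:
            return                     # simulate a crash BEFORE committing
        committed = offset + 1         # commit only after success
        offset += 1

run(crash_after=1)   # crash after processing e1 but before committing it
run()                # restart: e1 is processed again (harmless, idempotent)
print(committed, sorted(processed))
```

If you flipped the order (commit first, then process), the crash would skip `e1` entirely — that is the auto-commit failure mode described above.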

Step 4 — Check consumer lag (the fastest health signal)

When a pipeline “feels slow,” consumer lag usually tells you whether you’re keeping up. Lag is the gap between the end of the log and the group’s current committed offset.
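The arithmetic is simple enough to sketch. The numbers below are made up, but the calculation mirrors what `kafka-consumer-groups.sh --describe` reports as LAG:

```python
# Lag per partition = log end offset - the group's committed offset.
log_end_offsets = {0: 120, 1: 95, 2: 300}            # latest offset per partition
committed = {"demo-payments": {0: 120, 1: 80, 2: 10}}

def lag(group):
    return {p: log_end_offsets[p] - committed[group].get(p, 0)
            for p in log_end_offsets}

report = lag("demo-payments")
print(report)                    # partition 2 is the one falling behind
total = sum(report.values())
print(f"total lag: {total}")
```

A steady nonzero lag is often fine; lag that grows over time means the consumer can't keep up.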

A minimal operations checklist

  • Is the consumer running and polling regularly?
  • Is lag growing (consumer is falling behind)?
  • Are you blocked on downstream I/O (DB, API, storage)?
  • Do you have enough partitions for your desired parallelism?
  • Are rebalances happening too often (thrashing)?

Step 5 — Choose partitions and consumer groups deliberately

This is where Kafka design becomes “engineering” instead of “copy a tutorial.” Use partitions for parallelism and ordering boundaries; use consumer groups for scaling a specific workload.

You want… | Use partitions like… | Use consumer groups like…
More throughput | Increase partitions (more lanes) | Add consumers to the same group (more workers)
Strict ordering for an entity | Key by entity id to keep it in one partition | Single group processes it; one consumer handles that partition at a time
Multiple independent readers | Same topic; partitions unchanged | Use different groups (each group has its own offsets)
Replay/backfill | Retention/compaction settings matter | Use a new group id or reset offsets intentionally

A realistic rule of thumb

Your max parallelism for a single consumer group is bounded by the number of partitions. If you run 10 consumers on a topic with 3 partitions, 7 consumers will sit idle (they have nothing to own).

Common mistakes (and how to fix them)

Most Kafka confusion comes from a handful of predictable assumptions. Here are the ones that cause the most production pain, with fixes that are straightforward once you know what to look for.

Mistake 1 — Expecting global ordering across a topic

Kafka guarantees ordering within a partition, not across all partitions. If related events land in different partitions, their observed order can differ between consumers.

  • Fix: key related events (order_id, user_id) so they map to the same partition.
  • Fix: if you truly need global ordering, use one partition (and accept the throughput limit).

Mistake 2 — Random or unstable keys

If the key changes per event, partitioning becomes effectively random and you lose locality and ordering.

  • Fix: choose a stable key that matches your processing boundary (entity id).
  • Fix: avoid “timestamp as key” unless you intentionally want distribution, not ordering.

Mistake 3 — More consumers than partitions (and wondering why it’s slow)

Consumers don’t magically split messages. They split partitions. Extra consumers idle.

  • Fix: increase partitions if you need more parallelism.
  • Fix: scale within reason; too many partitions can increase overhead.

Mistake 4 — Confusing offsets with message deletion

Committing offsets does not remove events. Kafka retention controls deletion (time/size/compaction).

  • Fix: explain offsets as “a cursor the group saves.”
  • Fix: configure retention/compaction based on replay needs.

Mistake 5 — Auto-commit hides failures

If offsets commit before your processing finishes, a crash can skip work (you “acknowledged” too early).

  • Fix: for critical pipelines, commit offsets after processing succeeds.
  • Fix: design idempotent processing so replays are safe.

Mistake 6 — Rebalance thrash (consumers constantly rejoining)

If a consumer doesn’t poll in time (long processing, GC pauses, slow I/O), Kafka may consider it dead and rebalance.

  • Fix: keep processing time predictable; batch work or offload heavy tasks.
  • Fix: tune timeouts and poll intervals to match your workload.

Mistake 7 — Using Kafka as a blob store

Large messages reduce throughput, increase latency, and make retention expensive.

  • Fix: store large payloads elsewhere (object storage) and send references in Kafka.
  • Fix: compress when appropriate and keep message sizes sane.

Mistake 8 — No “reset strategy” for replays

Teams either never replay (fear), or replay accidentally (panic). Both are avoidable.

  • Fix: treat replays as a normal operation: new group id for backfills, explicit offset resets for corrections.
  • Fix: log dataset/pipeline versions so you can reproduce outcomes.

FAQ

Is Kafka a queue or a log?

Kafka is a distributed log. It can behave like a queue when a single consumer group processes a topic, but the underlying model is append + retain + replay, with consumption tracked via offsets.

What’s the difference between a topic and a partition?

A topic is the named stream; partitions are shards inside it. Partitions are what Kafka uses to scale storage and throughput. Ordering guarantees apply within a partition, not across the entire topic.

How many partitions do I need?

Start with the parallelism you need and grow intentionally. A consumer group can process up to one partition per consumer at a time, so partitions set your max parallelism. Don’t over-partition early; too many partitions can increase overhead (more files, more coordination).

Why are my events “out of order”?

Because ordering is per partition. If related events use different keys (or no key), they may land in different partitions and be consumed in different orders. Fix it by choosing a stable key (like order_id) for sequences that must stay ordered.

Can two different consumer groups read the same topic?

Yes—and that’s a feature. Each consumer group maintains its own offsets, so multiple teams/systems can independently process the same event stream without interfering with each other.

What does committing an offset actually do?

It records the group’s progress. Committing an offset means “we have processed up to this point in this partition.” It does not delete messages. Deletion is controlled by retention and compaction settings.

Do I still need ZooKeeper?

Newer Kafka deployments use KRaft (no ZooKeeper) for metadata management. Some existing clusters still run with ZooKeeper, so you may see both in the wild, but the “modern default” is moving toward ZooKeeper-free setups.

Cheatsheet

Use this as a “quick recall” when building or reviewing a Kafka pipeline. It’s intentionally short and practical.

Design checklist

  • Pick an event key that matches your ordering boundary (user_id/order_id)
  • Version your event schema (and keep changes additive)
  • Decide the failure model: at-least-once + idempotent processing is a great default
  • Choose partitions based on needed parallelism (and future growth)
  • Decide retention/compaction based on replay requirements

Operations checklist

  • Check consumer lag first when “things are slow”
  • Watch for frequent rebalances (a stability smell)
  • Ensure consumers poll regularly (don’t block forever)
  • Monitor error rates and dead-letter strategies for poison messages
  • Test replay/backfill steps before you need them

Rule | Meaning | Typical implication
Ordering is per partition | Only events in the same partition preserve order | Use stable keys for sequences that must be ordered
Parallelism is per partition | One partition is owned by one consumer in a group at a time | More consumers than partitions won’t speed up processing
Offsets are the cursor | Consumers track progress by committing offsets | Replay is normal: change group id or reset offsets intentionally

If you remember only three words

Keys. Partitions. Offsets. Those three explain most Kafka “mysteries.”

Wrap-up

Kafka is easier than it looks once you stop thinking “messages” and start thinking “append-only logs with coordinated readers.” Events live in topics, topics scale via partitions, and consumer groups let you scale processing without duplicating work. If you can answer “what’s my key?” and “how many partitions do I need?” you’re already ahead of most first deployments.

Next actions (pick one)

  • Run the local demo again and try different keys to see partition behavior
  • Add a third consumer and observe how 3 partitions max out parallelism
  • Sketch your first real event: name, key, schema version, and what “done” processing means
  • Write down your replay plan: new group id vs explicit offset reset


Quiz

Quick self-check; every answer is covered above.

1) Where does Kafka guarantee message ordering?
2) A topic has 3 partitions. What is the maximum number of consumers in one consumer group that can actively read in parallel?
3) What most commonly determines which partition an event goes to?
4) What does committing an offset mean for a consumer group?