MQTT Explained: The Message Bus Behind Many IoT Systems

By Samuel Labant · Published Jan 9, 2026 · Updated Jan 9, 2026

MQTT is the quiet workhorse behind a lot of “it just works” IoT: sensors streaming telemetry, devices receiving commands, dashboards updating in real time, and fleets surviving flaky Wi-Fi. This guide explains how MQTT actually behaves (topics, QoS, retained messages, sessions), plus the practical design patterns that keep your system reliable when devices disconnect, reboot, or roam between networks.

Quickstart

If you want the fastest path from “MQTT is a mystery” to “I can build with it,” do these in order. The goal is not to memorize the spec—it’s to internalize the handful of behaviors that make or break real deployments.

1) Start with one namespace for topics

A clean topic tree prevents chaos later. Think: product → environment → device → signal.

Pick a stable root (e.g., acme or home)
Use IDs, not user-facing names (device/9f3a… beats device/kitchen)
Separate state from events (…/state vs …/events)

2) Default to QoS 0, upgrade intentionally

QoS changes delivery guarantees, latency, and broker load. Use the lightest guarantee that’s safe.

Telemetry (frequent): QoS 0
Commands (must arrive): QoS 1
Exactly-once is rare: avoid QoS 2 unless you truly need it

3) Use retained messages for “latest state”

Retained messages make new subscribers immediately get the last known value (great for UI and device status).

Retain state topics: online/offline, current mode, last reading
Do not retain transient events (alarms, clicks, button presses)
Clear stale state with an empty retained payload when needed

4) Add an “I died” signal (LWT)

Last Will and Testament (LWT) lets the broker publish a message when a client disappears unexpectedly.

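Set LWT topic to …/status with payload offline (retained)
After every successful connect (including reconnects), publish online (retained)
Pick a keepalive short enough that dead clients are detected in time; the LWT only fires once the broker notices the connection is gone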

One rule that saves weeks

Decide early which topics represent state (retained, latest value matters) and which represent events (not retained, history belongs elsewhere). Most “MQTT weirdness” comes from mixing these.

Overview

MQTT is a lightweight publish/subscribe messaging protocol built around a central broker. Clients (devices, services, dashboards) connect to the broker and either publish messages to a topic, or subscribe to topics and receive messages as they arrive. This decoupling is why MQTT is often described as a “message bus” for IoT systems.

What this post covers

How topics work: namespaces, wildcards, and subscription patterns
QoS 0/1/2: what you really get (and what you don’t)
Retained messages: building “latest state” without a database
Sessions + reconnects: surviving flaky networks
Practical patterns: device state, commands, telemetry, and fleet monitoring

If you’re coming from HTTP: MQTT isn’t request/response. It’s streaming + fanout. One sensor publish can update many subscribers (storage service, dashboard, alerting, analytics) without the sensor knowing they exist. That’s what makes MQTT a great fit for constrained devices, intermittent connectivity, and systems that evolve over time.

Mental model

Think of the broker as a post office: devices drop mail into labeled boxes (topics), and anyone who subscribed to those boxes gets copies delivered.

Core concepts

You can use MQTT successfully with just a few core ideas. The details matter—but only a few details are “high leverage.” This section focuses on the behaviors that affect reliability, latency, and operational sanity.

Broker, client, and topics

Broker

The central server that accepts client connections, authenticates them, applies ACLs, and routes messages. It also stores retained messages and (optionally) session state.

Client

Anything that connects to the broker: sensors, gateways, mobile apps, backend services. Each client has an ID, connection options (keepalive, clean start), and can publish/subscribe.

Topics are a tree (not a queue)

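Topics are hierarchical, slash-separated strings like acme/site-1/device-42/telemetry/temp. There is nothing to create or declare up front; publishing to a topic is enough. The broker routes each message by comparing its topic, level by level, against every subscriber’s topic filter.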

Pattern	Meaning	Typical use
`acme/+/+/telemetry/#`	`+` matches exactly one level; `#` matches all remaining levels (last position only)	Fleet-wide telemetry collection
`acme/site-1/device-42/cmd`	Exact topic (no wildcard)	Targeted device commands
`acme/site-1/+/status`	All device status topics in a site	Dashboards, monitoring, alerts
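To make the wildcard rules concrete, here is a small pure-Python matcher that mirrors how a broker compares a topic against a subscription filter. It is a sketch for testing topic designs, not a broker implementation; `topic_matches` is a hypothetical helper name (paho-mqtt ships its own `topic_matches_sub` if you already use that library).

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if a concrete topic matches an MQTT subscription filter."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")

    # Topics starting with '$' (e.g. $SYS/...) are not matched by a leading wildcard.
    if topic.startswith("$") and filter_levels[0] in ("+", "#"):
        return False

    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard: only valid as the last level; it matches the
            # parent level itself plus any number of levels below it.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False  # the filter has more levels than the topic
        if level != "+" and level != topic_levels[i]:
            return False  # '+' accepts any single level; otherwise levels must be equal

    # Without '#', the filter and the topic must have the same number of levels.
    return len(filter_levels) == len(topic_levels)


if __name__ == "__main__":
    print(topic_matches("acme/+/+/telemetry/#", "acme/site-1/device-42/telemetry/temp"))  # True
    print(topic_matches("acme/site-1/+/status", "acme/site-1/device-42/telemetry/temp"))  # False
    print(topic_matches("acme/site-1/#", "acme/site-1"))  # True: '#' also matches the parent
```

Running your planned filters against a list of real topic names like this is a cheap way to catch a subscription that is too broad before it reaches a broker.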

QoS: delivery guarantees (and what “guarantee” means)

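QoS is not “network reliability.” It’s a per-hop contract about delivery semantics: between the publisher and the broker, and between the broker and each subscriber (a subscriber receives a message at the lower of the published QoS and its own subscription QoS). The network can still drop connections, and your application still needs idempotency for commands and state updates.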

QoS	Name	Guarantee	Cost / tradeoff	Good for
0	At most once	Best effort. No retry.	Fastest, lowest overhead	High-rate telemetry, non-critical updates
1	At least once	Retried until acknowledged; duplicates possible	Retry + acknowledgements	Commands, important events (with dedupe)
2	Exactly once	Delivered once per hop via a four-packet handshake	Highest overhead, more latency	Rare: financial/transaction-like cases

QoS 1 needs idempotency

“At least once” means duplicates can happen—especially across reconnects. If you publish commands at QoS 1, design them so re-processing the same command is safe (use command IDs, timestamps, or desired-state patterns).
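One way to get that safety is to remember recently seen command IDs and skip repeats. The sketch below assumes every command payload is JSON with a unique `id` field; that payload shape and the names `CommandDeduper` and `handle_command` are hypothetical, not something MQTT provides. The memory is bounded and in-process, so a device that reboots forgets its history; if that matters, persist the IDs or prefer the desired-state pattern.

```python
import json
from collections import OrderedDict
from typing import Callable


class CommandDeduper:
    """Remembers the most recent command IDs so QoS 1 redeliveries can be skipped."""

    def __init__(self, max_ids: int = 1000) -> None:
        self.max_ids = max_ids
        self._seen: OrderedDict = OrderedDict()

    def is_new(self, command_id: str) -> bool:
        if command_id in self._seen:
            self._seen.move_to_end(command_id)  # keep recently repeated IDs around longer
            return False
        self._seen[command_id] = None
        if len(self._seen) > self.max_ids:
            self._seen.popitem(last=False)  # evict the oldest remembered ID
        return True


def handle_command(raw_payload: bytes, deduper: CommandDeduper,
                   apply: Callable[[dict], None]) -> bool:
    """Apply a JSON command the first time its ID is seen; return False for duplicates."""
    command = json.loads(raw_payload)  # assumed shape: {"id": "...", "action": "..."}
    if not deduper.is_new(command["id"]):
        return False
    apply(command)
    return True


if __name__ == "__main__":
    deduper = CommandDeduper()
    msg = b'{"id": "c-1", "action": "reboot"}'
    print(handle_command(msg, deduper, print))  # applies the command, then prints True
    print(handle_command(msg, deduper, print))  # redelivery is skipped: False
```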

Retained messages: “latest value on subscribe”

The broker stores at most one retained message per topic: the most recent one published with the retain flag. When a client subscribes, the broker immediately sends the retained message for every topic matching its filter (if any), then live messages as they arrive. This is perfect for state: “What’s the device mode right now?” or “Is it online?”
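The toy model below captures the retained-message rules in a few lines: only the latest retained payload per topic is kept, non-retained publishes are never stored, and a zero-length retained payload deletes the entry. `RetainedStore` is purely illustrative; it ignores wildcards, QoS, and persistence, all of which a real broker handles.

```python
from typing import Dict, Optional


class RetainedStore:
    """Toy model of a broker's retained-message table (exact topics only)."""

    def __init__(self) -> None:
        self._by_topic: Dict[str, bytes] = {}

    def publish(self, topic: str, payload: bytes, retain: bool) -> None:
        if not retain:
            return  # non-retained messages reach current subscribers only; nothing is stored
        if payload == b"":
            self._by_topic.pop(topic, None)  # a zero-length retained publish clears the topic
        else:
            self._by_topic[topic] = payload  # overwrites any previous retained value

    def on_subscribe(self, topic: str) -> Optional[bytes]:
        """What a brand-new subscriber to `topic` receives immediately."""
        return self._by_topic.get(topic)


if __name__ == "__main__":
    store = RetainedStore()
    store.publish("acme/site-1/device-42/status", b"online", retain=True)
    store.publish("acme/site-1/device-42/status", b"offline", retain=True)
    print(store.on_subscribe("acme/site-1/device-42/status"))  # b'offline'
    store.publish("acme/site-1/device-42/status", b"", retain=True)
    print(store.on_subscribe("acme/site-1/device-42/status"))  # None
```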

Use retained for

Device online/offline status
Last telemetry reading (when “latest is enough”)
Configuration or desired state
UI dashboards that open and need instant values

Avoid retained for

Events that should not replay (button press, motion detected)
High-frequency streams (the retained copy holds only the last sample, which is easy to misread as the stream’s current state)
Anything where old data is dangerous without timestamps

Sessions, keepalive, and disconnect behavior

Devices disconnect. Wi-Fi drops. LTE changes IPs. MQTT is designed for this reality—but you must configure it. The key knobs are:

Keepalive: the longest a client may stay silent (idle clients send pings); if the broker hears nothing for about 1.5× this interval, it drops the connection and publishes the LWT
Clean start / session persistence: whether subscriptions and queued QoS 1/2 messages survive reconnect
Will message (LWT): broker-published “offline” if the client dies unexpectedly

The common pattern for IoT fleets is: persistent session (so QoS 1 commands published while a device is offline are queued for it, within broker limits), LWT for presence, and “desired state” topics so devices converge after reboot.

Step-by-step

Let’s build a tiny but realistic MQTT system: a broker, a subscriber, and a “device” publisher. You can run this locally, then reuse the same architecture when you move to a cloud broker or a gateway on a LAN.

Step 1 — Decide a topic layout (before you write code)

Pick a topic taxonomy that stays readable at scale. Here’s a solid default:

A practical topic convention

Purpose	Topic shape	Retained?	Notes
Telemetry	`{root}/{site}/{device}/telemetry/{signal}`	No (usually)	Add timestamps in payload
Status / presence	`{root}/{site}/{device}/status`	Yes	Use LWT to set `offline`
Commands	`{root}/{site}/{device}/cmd`	No	Use QoS 1 + IDs for dedupe
Desired state	`{root}/{site}/{device}/desired`	Yes	Device “converges” to this state
Reported state	`{root}/{site}/{device}/reported`	Yes	Device confirms what it’s doing
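Encoding the convention in one helper keeps every service from hand-assembling topic strings. The sketch below assumes IDs are restricted to letters, digits, `-` and `_` (a choice made for this example, stricter than MQTT requires); `device_topics` and `telemetry_topic` are hypothetical helper names.

```python
import re

# Allowed characters per topic level in this sketch (an assumption: MQTT itself
# allows almost any UTF-8 in topic names, except wildcards in published topics).
_LEVEL_RE = re.compile(r"[A-Za-z0-9_-]+")


def _level(value: str) -> str:
    """Validate one topic level: rejects '/', '+', '#', '$', spaces, and empty strings."""
    if not _LEVEL_RE.fullmatch(value):
        raise ValueError(f"invalid topic level: {value!r}")
    return value


def device_base(root: str, site: str, device: str) -> str:
    return "/".join(_level(part) for part in (root, site, device))


def device_topics(root: str, site: str, device: str) -> dict:
    """Per-device topics from the convention table, keyed by purpose."""
    base = device_base(root, site, device)
    return {
        "status": f"{base}/status",      # retained, LWT target
        "cmd": f"{base}/cmd",            # QoS 1, not retained
        "desired": f"{base}/desired",    # retained
        "reported": f"{base}/reported",  # retained
    }


def telemetry_topic(root: str, site: str, device: str, signal: str) -> str:
    return f"{device_base(root, site, device)}/telemetry/{_level(signal)}"


if __name__ == "__main__":
    print(device_topics("acme", "site-1", "device-42")["status"])  # acme/site-1/device-42/status
    print(telemetry_topic("acme", "site-1", "device-42", "temp"))
```

Rejecting `+`, `#` and `/` inside IDs matters: a device ID containing a slash would silently add a level and break every wildcard subscription built on this layout.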

State beats imperative commands

Instead of sending “turn on now” repeatedly, publish a desired state (retained). If a device is offline, it will still see the latest desired state on reconnect and converge automatically.
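On the device side, converging usually means diffing the desired document against what the device currently reports and applying only the differences. This is a minimal sketch, assuming desired and reported are flat JSON objects; `reconcile`, `on_desired`, and the `apply_setting` hook are hypothetical names. Because a second delivery of the same desired state produces no changes, the handler is idempotent by construction.

```python
import json
from typing import Any, Callable, Dict


def reconcile(desired: Dict[str, Any], reported: Dict[str, Any]) -> Dict[str, Any]:
    """Return only the settings whose desired value differs from the reported one."""
    return {key: value for key, value in desired.items() if reported.get(key) != value}


def on_desired(raw_desired: bytes, reported: Dict[str, Any],
               apply_setting: Callable[[str, Any], None]) -> Dict[str, Any]:
    """Converge to the retained desired state; return the new reported state."""
    changes = reconcile(json.loads(raw_desired), reported)
    for key, value in changes.items():
        apply_setting(key, value)  # hypothetical hook that drives the real hardware
    # The caller would publish this dict (retained) to {root}/{site}/{device}/reported.
    return {**reported, **changes}


if __name__ == "__main__":
    # After a reboot the device reports power "off", but the retained desired state says "on".
    state = {"power": "off", "mode": "eco"}
    state = on_desired(b'{"power": "on", "mode": "eco"}', state,
                       lambda k, v: print(f"apply {k}={v}"))
    # Receiving the same desired state again applies nothing.
    state = on_desired(b'{"power": "on", "mode": "eco"}', state,
                       lambda k, v: print(f"apply {k}={v}"))
    print(state)
```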

Step 2 — Run a broker locally (Docker)

For local development, a broker like Mosquitto is the quickest way to experiment. The snippet below runs Mosquitto and exposes the standard MQTT port.

version: "3.8"
services:
  mqtt:
    image: eclipse-mosquitto:2
    container_name: mqtt
    ports:
      - "1883:1883"
    volumes:
      - ./mosquitto.conf:/mosquitto/config/mosquitto.conf:ro
      - mosq_data:/mosquitto/data
      - mosq_log:/mosquitto/log

volumes:
  mosq_data:
  mosq_log:

About the config file

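Keep your local broker permissive while learning, then lock it down for real deployments (auth + ACL + TLS). A minimal mosquitto.conf sets listeners and persistence, and with Mosquitto 2.x it is not optional: when no listener is configured, the broker listens only on localhost (so the container port mapping above would not work), and it rejects anonymous clients by default. In production, you’ll also define users and access rules.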
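A minimal local-development config consistent with the compose file above might look like the fragment below. Treat it as a starting sketch: the paths assume that compose file’s volume mounts, and the settings are deliberately insecure.

```conf
# mosquitto.conf for LOCAL DEVELOPMENT ONLY (no auth, no TLS)
# Listen on all interfaces so the Docker port mapping can reach the broker.
listener 1883
allow_anonymous true

# Keep retained messages and persistent sessions across broker restarts
# (stored in the mosq_data volume).
persistence true
persistence_location /mosquitto/data/

log_dest file /mosquitto/log/mosquitto.log
```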

Step 3 — Prove routing with a subscriber and a publisher

Use CLI tools to validate topic patterns before you build firmware or backend services. Subscribe first (so you can see messages), then publish a test message.

# 1) Subscribe to everything under acme/site-1 (multi-level wildcard)
mosquitto_sub -h localhost -p 1883 -v -t 'acme/site-1/#'

# 2) Publish telemetry (QoS 0)
mosquitto_pub -h localhost -p 1883 -t 'acme/site-1/device-42/telemetry/temp' -q 0 -m '{"v":21.7,"ts":1736430000}'

# 3) Publish a retained status message (so new subscribers get it immediately)
mosquitto_pub -h localhost -p 1883 -t 'acme/site-1/device-42/status' -r -q 1 -m 'online'

At this point you’ve validated the basic “message bus” flow. Now we’ll simulate a device that: (1) publishes telemetry periodically, (2) sets online/offline presence correctly, and (3) can be extended to react to commands.

Step 4 — Implement a tiny device client (with LWT + retained presence)

A good MQTT device client should handle reconnects, publish presence, and keep payloads consistent. The following Python example uses a single topic namespace and includes an LWT so the broker marks the device offline if it dies.

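import json
import random
import time

import paho.mqtt.client as mqtt  # written against the paho-mqtt 2.x API

BROKER_HOST = "localhost"
BROKER_PORT = 1883

ROOT = "acme/site-1/device-42"
TOPIC_STATUS = f"{ROOT}/status"
TOPIC_TELEMETRY = f"{ROOT}/telemetry/temp"

CLIENT_ID = "device-42-sim"
KEEPALIVE = 30  # seconds; the broker treats the client as dead after ~1.5x this with no traffic

def on_connect(client, userdata, flags, reason_code, properties):
    if reason_code.is_failure:
        # Refused (bad credentials, not authorized, ...): don't claim to be online.
        print(f"connect failed: {reason_code}")
        return
    # Publish "online" as retained on every (re)connect so dashboards instantly see current status.
    client.publish(TOPIC_STATUS, payload="online", qos=1, retain=True)

def on_disconnect(client, userdata, flags, reason_code, properties):
    # A failure reason code means an unexpected disconnect: the broker publishes the LWT,
    # and the loop_start() network thread keeps reconnecting (on_connect runs again).
    pass

client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2, client_id=CLIENT_ID, clean_session=True)

# LWT must be set before connect(): if the client disappears without a clean disconnect,
# the broker publishes "offline" (retained) on its behalf.
client.will_set(TOPIC_STATUS, payload="offline", qos=1, retain=True)

client.on_connect = on_connect
client.on_disconnect = on_disconnect

client.connect(BROKER_HOST, BROKER_PORT, keepalive=KEEPALIVE)
client.loop_start()  # network loop (acks, keepalive pings, reconnects) in a background thread

try:
    while True:
        payload = {
            "v": round(18.0 + random.random() * 8.0, 1),
            "ts": int(time.time()),
            "unit": "C",
            "schema": "temp.v1",
        }
        # QoS 0 is fine for frequent telemetry; include timestamp so consumers can detect staleness.
        client.publish(TOPIC_TELEMETRY, payload=json.dumps(payload), qos=0, retain=False)
        time.sleep(2.0)
except KeyboardInterrupt:
    # A clean disconnect does NOT trigger the LWT, so publish "offline" explicitly and,
    # if it was handed to the network layer, wait for the broker's ack before disconnecting.
    info = client.publish(TOPIC_STATUS, payload="offline", qos=1, retain=True)
    if info.rc == mqtt.MQTT_ERR_SUCCESS:
        info.wait_for_publish(timeout=5)
    client.disconnect()
    client.loop_stop()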

Gotchas that bite early

Retain + no timestamp: dashboards can show “fresh-looking” but stale values
QoS 1 duplicates: consumers should handle duplicate command/event IDs
Client IDs: two clients with the same ID will kick each other off
Wildcards: subscribing to # in production can accidentally leak data across tenants

Step 5 — Design payloads that won’t hurt you later

MQTT doesn’t enforce a payload format. That freedom is great—until you have multiple producers and consumers. Treat your payload as an API:

Payload hygiene (minimum)

Include timestamps (ts) for anything time-sensitive
Add a schema/version field (e.g., schema: temp.v1)
Keep keys stable; avoid renaming without versioning
Use explicit units (C, %, Pa)
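These rules are easy to enforce at both ends. The sketch below assumes the field names from this post’s example payload (`v`, `ts`, `unit`, `schema`) and a 60-second freshness window chosen purely for illustration; `make_payload` and `parse_payload` are hypothetical helpers.

```python
import json
import time
from typing import Optional

SCHEMA = "temp.v1"  # example schema tag


def make_payload(value: float, unit: str, ts: Optional[int] = None) -> str:
    """Serialize one reading with the minimum hygiene fields."""
    return json.dumps({
        "v": value,
        "ts": int(time.time()) if ts is None else ts,  # epoch seconds
        "unit": unit,
        "schema": SCHEMA,
    })


def parse_payload(raw: bytes, max_age_s: int = 60, now: Optional[int] = None) -> Optional[dict]:
    """Return the reading, or None if its schema is unknown or it is too old to trust."""
    msg = json.loads(raw)
    if msg.get("schema") != SCHEMA:
        return None  # unknown version: skip it (or route it to another parser)
    current = int(time.time()) if now is None else now
    if current - msg["ts"] > max_age_s:
        return None  # stale, e.g. an old retained value delivered on subscribe
    return msg


if __name__ == "__main__":
    raw = make_payload(21.7, "C").encode()
    print(parse_payload(raw))  # fresh reading, returned as a dict
```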

When to avoid JSON

Ultra-low bandwidth links (consider binary formats)
Strict latency or CPU constraints
High-rate telemetry streams (batching several readings per message may be better)
Very large messages (MQTT is not ideal for big blobs)
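To put a number on “consider binary formats,” the sketch below packs the same temperature reading with Python’s `struct` module. The 7-byte layout (version byte, epoch seconds, tenths of a degree) is a hypothetical format invented for this example. The price of the savings is that every consumer must know the layout, which is why it carries its own version byte, following the same versioning advice as the JSON `schema` field.

```python
import json
import struct

# Hypothetical layout: 1-byte format version, 4-byte unsigned epoch seconds,
# 2-byte signed temperature in tenths of a degree C (little-endian, no padding).
TEMP_V1 = struct.Struct("<BIh")


def encode_temp(ts: int, celsius: float) -> bytes:
    return TEMP_V1.pack(1, ts, round(celsius * 10))


def decode_temp(data: bytes) -> dict:
    version, ts, tenths = TEMP_V1.unpack(data)
    if version != 1:
        raise ValueError(f"unsupported format version {version}")
    return {"ts": ts, "v": tenths / 10, "unit": "C"}


if __name__ == "__main__":
    as_json = json.dumps({"v": 21.7, "ts": 1736430000, "unit": "C", "schema": "temp.v1"},
                         separators=(",", ":")).encode()
    as_binary = encode_temp(1736430000, 21.7)
    print(len(as_json), len(as_binary))  # the binary form is 7 bytes
    print(decode_temp(as_binary))
```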

Step 6 — Security and operations (the “real system” checklist)

MQTT is easy to prototype—and equally easy to accidentally expose. Before any internet-facing deployment, treat these as non-negotiable:

Authentication: per-device credentials (not one shared password)
Authorization: ACLs per topic (devices can only publish/subscribe where allowed)
Transport security: TLS where applicable (especially off-LAN)
Observability: monitor client count, connect/disconnect rates, dropped messages, broker CPU/memory
Backpressure: define max inflight/queued messages to avoid slow consumers taking down the broker
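For Mosquitto specifically, several of these items map onto a few config options. The fragments below are a sketch under stated assumptions: each device authenticates with a username equal to its device ID (created with `mosquitto_passwd`), a backend account named `backend` exists, the limits are illustrative, and certificate paths are placeholders.

```conf
# mosquitto.conf (production-leaning sketch)
listener 8883
cafile /mosquitto/certs/ca.crt
certfile /mosquitto/certs/server.crt
keyfile /mosquitto/certs/server.key

allow_anonymous false
password_file /mosquitto/config/passwd
acl_file /mosquitto/config/acl

# Backpressure: cap in-flight and queued QoS 1/2 messages per client
max_inflight_messages 20
max_queued_messages 1000
```

```conf
# /mosquitto/config/acl
# %u is replaced by the authenticated username (the device ID here),
# so each device can only publish and subscribe within its own subtree.
pattern write acme/site-1/%u/telemetry/#
pattern write acme/site-1/%u/status
pattern write acme/site-1/%u/reported
pattern read acme/site-1/%u/cmd
pattern read acme/site-1/%u/desired

# The backend service may read everything in the site.
user backend
topic read acme/site-1/#
```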

A pragmatic architecture

Many systems use a local broker on the LAN (fast, resilient), then bridge selected topics to a cloud broker for remote access. This keeps local automation working even when the internet is down.
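In Mosquitto, such a bridge is configured on the LAN broker. The fragment below is a sketch: `mqtt.example.com`, the credentials, and the certificate path are placeholders, and which topics to forward (`out`) or accept (`in`) is a design choice. Here presence and telemetry go up to the cloud, and desired state comes down so remote apps can control devices.

```conf
# Appended to the LAN broker's mosquitto.conf (placeholders throughout)
connection cloud-bridge
address mqtt.example.com:8883
bridge_cafile /mosquitto/certs/cloud-ca.crt
remote_clientid site-1-gateway
remote_username site-1-gateway
remote_password change-me

# topic <pattern> <direction> <qos>
topic acme/site-1/+/status out 1
topic acme/site-1/+/telemetry/# out 0
topic acme/site-1/+/desired in 1
```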

Common mistakes

MQTT is simple, which means the mistakes are usually about design choices rather than syntax. Here are the patterns that create “it works in the lab but not in the field.”

Mistake 1 — Treating MQTT like HTTP

MQTT is not request/response by default. If you need replies, define explicit reply topics.

Fix: model flows as state + events (publish and subscribe), not “call and wait.”
Fix: for commands, prefer desired/reported state where possible.
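When you do need a reply, a common shape is to send a correlation ID and a reply topic with the request, then match incoming replies against pending requests. The sketch below keeps MQTT itself out of the picture so it runs anywhere; the `corr` and `reply_to` payload fields and the `PendingRequests` class are hypothetical conventions (MQTT 5 offers Response Topic and Correlation Data properties for the same purpose).

```python
import json
import time
import uuid
from typing import Dict, Optional, Tuple


class PendingRequests:
    """Tracks outstanding requests so replies can be matched by correlation ID."""

    def __init__(self, timeout_s: float = 10.0) -> None:
        self.timeout_s = timeout_s
        self._pending: Dict[str, float] = {}  # correlation ID -> monotonic deadline

    def make_request(self, reply_topic: str, body: dict) -> Tuple[str, bytes]:
        """Build a request payload; publish it to the device's cmd topic."""
        corr = uuid.uuid4().hex
        self._pending[corr] = time.monotonic() + self.timeout_s
        payload = {**body, "corr": corr, "reply_to": reply_topic}
        return corr, json.dumps(payload).encode()

    def match_reply(self, raw_reply: bytes) -> Optional[dict]:
        """Return the reply if it answers a live request; ignore unknown, late, or duplicate ones."""
        reply = json.loads(raw_reply)
        deadline = self._pending.pop(reply.get("corr"), None)
        if deadline is None or time.monotonic() > deadline:
            return None
        return reply
```

The requester subscribes to its reply topic before publishing and feeds every incoming message to `match_reply`. Popping the entry on the first match also makes duplicate QoS 1 replies harmless.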

Mistake 2 — Topic sprawl without a namespace plan

If every team invents topics, subscriptions become fragile and security becomes impossible.

Fix: establish one root and a small set of patterns (telemetry/status/cmd).
Fix: document topics like an API and version payloads.

Mistake 3 — Using retained messages for transient events

New subscribers replay the last retained value, which can look like a “new” event.

Fix: retain only state; never retain one-off events.
Fix: include timestamps and interpret state as “last known.”

Mistake 4 — QoS 1/2 without dedupe or idempotency

Duplicates are a normal outcome of retries and reconnects.

Fix: add message IDs for commands/events; ignore repeats.
Fix: design handlers to be safe if run twice.

Mistake 5 — No presence strategy (no LWT)

Without presence, “is the device alive?” becomes guesswork.

Fix: LWT to publish offline (retained) + publish online on connect.
Fix: tune keepalive so dead connections are detected quickly enough.

Mistake 6 — Weak ACLs + broad wildcards

A subscription to # is convenient during dev and dangerous in production.

Fix: restrict publish/subscribe per device and per tenant.
Fix: separate environments (dev/stage/prod) by root topic or broker.

Debugging habit that pays

When something feels “random,” check these first: client ID collisions, retained messages replaying old state, and QoS 1 duplicates after reconnect.

FAQ

What is MQTT used for?

MQTT is used for lightweight messaging where many producers and consumers need to exchange data reliably, especially with constrained devices or unstable networks. Common examples: sensor telemetry, device commands, presence/status, home automation, industrial monitoring, and dashboards that update in real time.

What’s the difference between MQTT QoS 0, 1, and 2?

QoS 0 is best-effort (fastest). QoS 1 retries until acknowledged (delivery is “at least once,” so duplicates can occur). QoS 2 uses a longer handshake for “exactly once” delivery (highest overhead). In practice, most IoT systems use QoS 0 for telemetry and QoS 1 for commands/status that must arrive.

When should I use retained messages?

Use retained messages for state where the “latest value” matters to new subscribers: online/offline status, current mode, last reading, desired configuration. Avoid retaining transient events (alarms, button presses) because new subscribers will see the last retained value and may misinterpret it as a new event.

How do I model commands in MQTT?

For simple systems, publish commands to a per-device topic (often QoS 1) and include an ID so duplicates are safe. For more robust systems, publish a retained desired state and let devices report a retained reported state. Desired/reported state is resilient to offline devices and reboots.

Does MQTT work in browsers?

Many brokers support MQTT over WebSockets, which makes browser clients possible. The mental model remains the same (publish/subscribe), but you’ll typically terminate WebSockets at the broker and keep standard MQTT for devices.

How do I secure MQTT?

Use per-client authentication, strict topic-based ACLs, and TLS when traffic crosses untrusted networks. Also separate environments and avoid broad wildcard access in production. Security is less about the protocol and more about who is allowed to publish/subscribe to which topics.

Cheatsheet

Keep this section bookmarked. It’s the fast scan for “what should I choose?” and “what’s the safe default?”

QoS decision (fast)

QoS 0: high-rate telemetry where occasional drops are OK
QoS 1: commands and important updates (handle duplicates)
QoS 2: rare; only if you truly need exactly-once semantics

Retained decision (fast)

Retain: state, presence, desired config, “last known value”
Don’t retain: transient events, click-like signals, alarms without timestamps
Always include: timestamps for values that can go stale

Topic patterns you’ll reuse

Pattern	Why it exists	Tip
`{root}/{site}/{device}/telemetry/{signal}`	Clear fanout for metrics and storage	Use QoS 0 + include `ts`
`{root}/{site}/{device}/status`	Presence and health monitoring	Retained + LWT
`{root}/{site}/{device}/cmd`	Targeted device actions	QoS 1 + idempotent handlers
`{root}/{site}/{device}/desired`	Resilient control even when offline	Retained desired state
`{root}/{site}/{device}/reported`	Device confirms actual state	Retained confirmation

Production minimum checklist

Per-device credentials (no shared secrets)
ACLs per topic (least privilege)
LWT + retained presence
Payload versioning and timestamps
Monitoring: connect churn, message rates, broker CPU/memory

Wrap-up

MQTT is popular in IoT for a reason: it’s simple, efficient, and built for unreliable networks. The trick is to use the protocol’s “superpowers” on purpose—topics as a namespace, QoS as a tradeoff, retained messages for state, and LWT + sessions for real-world connectivity.

If you take one practical pattern from this post, take this: model device control as desired state + reported state, and use retained messages so devices recover from offline periods automatically. That single decision makes fleets far easier to operate.

Next steps

Write down your topic tree and mark which topics are state vs events
Add LWT presence to every device client
Pick QoS intentionally: telemetry (0), commands (1), QoS 2 only when truly needed
Lock down security: auth + ACL before anything goes public


UniLab Editorial
