MQTT is the quiet workhorse behind a lot of “it just works” IoT: sensors streaming telemetry, devices receiving commands, dashboards updating in real time, and fleets surviving flaky Wi-Fi. This guide explains how MQTT actually behaves (topics, QoS, retained messages, sessions), plus the practical design patterns that keep your system reliable when devices disconnect, reboot, or roam between networks.
Quickstart
If you want the fastest path from “MQTT is a mystery” to “I can build with it,” do these in order. The goal is not to memorize the spec—it’s to internalize the handful of behaviors that make or break real deployments.
1) Start with one namespace for topics
A clean topic tree prevents chaos later. Think: product → environment → device → signal.
- Pick a stable root (e.g., acme or home)
- Use IDs, not user-facing names (device/9f3a… beats device/kitchen)
- Separate state from events (…/state vs …/events)
2) Default to QoS 0, upgrade intentionally
QoS changes delivery guarantees, latency, and broker load. Use the lightest guarantee that’s safe.
- Telemetry (frequent): QoS 0
- Commands (must arrive): QoS 1
- Exactly-once is rare: avoid QoS 2 unless you truly need it
3) Use retained messages for “latest state”
Retained messages let new subscribers receive the last known value immediately on subscribe (great for UI and device status).
- Retain state topics: online/offline, current mode, last reading
- Do not retain transient events (alarms, clicks, button presses)
- Clear stale state with an empty retained payload when needed
4) Add an “I died” signal (LWT)
Last Will and Testament (LWT) lets the broker publish a message when a client disappears unexpectedly.
- Set LWT topic to …/status with payload offline (retained)
- On clean connect, publish online (retained)
- Use a sensible keepalive so dead clients are detected
Decide early which topics represent state (retained, latest value matters) and which represent events (not retained, history belongs elsewhere). Most “MQTT weirdness” comes from mixing these.
Overview
MQTT is a lightweight publish/subscribe messaging protocol built around a central broker. Clients (devices, services, dashboards) connect to the broker and either publish messages to a topic, or subscribe to topics and receive messages as they arrive. This decoupling is why MQTT is often described as a “message bus” for IoT systems.
What this post covers
- How topics work: namespaces, wildcards, and subscription patterns
- QoS 0/1/2: what you really get (and what you don’t)
- Retained messages: building “latest state” without a database
- Sessions + reconnects: surviving flaky networks
- Practical patterns: device state, commands, telemetry, and fleet monitoring
If you’re coming from HTTP: MQTT isn’t request/response. It’s streaming + fanout. One sensor publish can update many subscribers (storage service, dashboard, alerting, analytics) without the sensor knowing they exist. That’s what makes MQTT a great fit for constrained devices, intermittent connectivity, and systems that evolve over time.
Think of the broker as a post office: devices drop mail into labeled boxes (topics), and anyone who subscribed to those boxes gets copies delivered.
Core concepts
You can use MQTT successfully with just a few core ideas. The details matter—but only a few details are “high leverage.” This section focuses on the behaviors that affect reliability, latency, and operational sanity.
Broker, client, and topics
Broker
The central server that accepts client connections, authenticates them, applies ACLs, and routes messages. It also stores retained messages and (optionally) session state.
Client
Anything that connects to the broker: sensors, gateways, mobile apps, backend services. Each client has an ID, connection options (keepalive, clean start), and can publish/subscribe.
Topics are a tree (not a queue)
Topics are hierarchical strings like acme/site-1/device-42/telemetry/temp. They’re not created up front.
The broker routes based on string matching.
| Pattern | Meaning | Typical use |
|---|---|---|
| acme/+/+/telemetry/# | + matches one level; # matches the rest | Fleet-wide telemetry collection |
| acme/site-1/device-42/cmd | Exact topic (no wildcard) | Targeted device commands |
| acme/site-1/+/status | All device status topics in a site | Dashboards, monitoring, alerts |
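To make the wildcard rules concrete, here is a minimal sketch of how a subscription pattern could be matched against a concrete topic. The function name topic_matches is made up for illustration; real brokers and client libraries ship their own matching logic, so treat this as a mental model rather than a reference implementation.

```python
def topic_matches(pattern: str, topic: str) -> bool:
    """Return True if an MQTT subscription pattern matches a concrete topic."""
    pattern_levels = pattern.split("/")
    topic_levels = topic.split("/")
    # Wildcards in the first level do not match topics that start with "$".
    if topic.startswith("$") and pattern_levels[0] in ("+", "#"):
        return False
    for i, level in enumerate(pattern_levels):
        if level == "#":
            # "#" is only valid as the last level; it matches the parent
            # level and everything below it.
            return i == len(pattern_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    # Without a trailing "#", both sides must have the same depth.
    return len(pattern_levels) == len(topic_levels)


print(topic_matches("acme/+/+/telemetry/#", "acme/site-1/device-42/telemetry/temp"))  # True
print(topic_matches("acme/site-1/+/status", "acme/site-2/device-42/status"))          # False
```

A helper like this is handy in unit tests for your topic plan: you can check that a dashboard's subscription really covers the topics you expect before any device is deployed.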
QoS: delivery guarantees (and what “guarantee” means)
QoS is not “network reliability.” It’s a contract between client and broker about delivery semantics. The network can still drop connections, and your application still needs idempotency for commands and state updates.
| QoS | Name | Guarantee | Cost / tradeoff | Good for |
|---|---|---|---|---|
| 0 | At most once | Best effort. No retry. | Fastest, lowest overhead | High-rate telemetry, non-critical updates |
| 1 | At least once | Redelivered until acknowledged, so duplicates are possible | Retry + acknowledgements | Commands, important events (with dedupe) |
| 2 | Exactly once | Single delivery via a longer handshake | Highest overhead, more latency | Rare: financial/transaction-like cases |
“At least once” means duplicates can happen—especially across reconnects. If you publish commands at QoS 1, design them so re-processing the same command is safe (use command IDs, timestamps, or desired-state patterns).
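One way to make QoS 1 commands safe to re-process is to remember recently seen command IDs. The sketch below assumes each command payload carries an id field; the class name CommandDeduper, the field names, and the 1000-entry limit are illustrative choices, not part of MQTT.

```python
from collections import OrderedDict


class CommandDeduper:
    """Remembers recently seen command IDs so redeliveries can be ignored."""

    def __init__(self, max_ids: int = 1000):
        self._seen = OrderedDict()
        self._max_ids = max_ids

    def first_time(self, command_id: str) -> bool:
        """Return True the first time an ID is seen, False for repeats."""
        if command_id in self._seen:
            self._seen.move_to_end(command_id)
            return False
        self._seen[command_id] = True
        if len(self._seen) > self._max_ids:
            self._seen.popitem(last=False)  # evict the oldest ID
        return True


executed = []


def handle_command(deduper: CommandDeduper, command: dict) -> None:
    # Only act on a command the first time its ID shows up.
    if deduper.first_time(command["id"]):
        executed.append(command["action"])


deduper = CommandDeduper()
cmd = {"id": "c-001", "action": "reboot"}
handle_command(deduper, cmd)
handle_command(deduper, cmd)  # simulated QoS 1 redelivery
print(executed)  # ['reboot']
```

The bounded memory matters on small devices: a duplicate that arrives after its ID has been evicted will run again, so size the window to cover your realistic reconnect gaps.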
Retained messages: “latest value on subscribe”
A retained message is stored by the broker per topic. When a new subscriber subscribes, the broker immediately sends the retained message (if any) before new live messages arrive. This is perfect for state: “What’s the device mode right now?” or “Is it online?”
Use retained for
- Device online/offline status
- Last telemetry reading (when “latest is enough”)
- Configuration or desired state
- UI dashboards that open and need instant values
Avoid retained for
- Events that should not replay (button press, motion detected)
- High-frequency streams (retained becomes misleading “last only”)
- Anything where old data is dangerous without timestamps
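The retained-message behavior described above can be modeled in a few lines. This is a toy in-memory stand-in (ToyBroker is a made-up name), it only supports exact-topic subscriptions, and it exists purely to show the rules: the last retained payload is stored per topic, replayed on subscribe, and cleared by an empty retained payload.

```python
class ToyBroker:
    """In-memory model of retained-message handling (not a real broker)."""

    def __init__(self):
        self.retained = {}      # topic -> last retained payload
        self.subscribers = {}   # exact topic -> list of callbacks

    def publish(self, topic, payload, retain=False):
        if retain:
            if payload == "":
                self.retained.pop(topic, None)  # empty retained payload clears
            else:
                self.retained[topic] = payload
        # Live subscribers always get the message, retained or not.
        for callback in self.subscribers.get(topic, []):
            callback(topic, payload)

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)
        # New subscribers immediately receive the retained value, if any.
        if topic in self.retained:
            callback(topic, self.retained[topic])


STATUS = "acme/site-1/device-42/status"
EVENTS = "acme/site-1/device-42/events"

broker = ToyBroker()
broker.publish(STATUS, "online", retain=True)  # state: retained
broker.publish(EVENTS, "button_press")         # event: not retained

received = []
broker.subscribe(STATUS, lambda t, p: received.append((t, p)))
broker.subscribe(EVENTS, lambda t, p: received.append((t, p)))
print(received)  # only the retained status is replayed to the late subscriber
```

The late subscriber sees the status but not the button press, which is exactly the state-versus-event split you want.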
Sessions, keepalive, and disconnect behavior
Devices disconnect. Wi-Fi drops. LTE changes IPs. MQTT is designed for this reality—but you must configure it. The key knobs are:
- Keepalive: heartbeat interval to detect dead connections
- Clean start / session persistence: whether subscriptions and queued QoS 1/2 messages survive reconnect
- Will message (LWT): broker-published “offline” if the client dies unexpectedly
The common pattern for IoT fleets is: persistent session (so commands aren’t lost while offline), LWT for presence, and “desired state” topics so devices converge after reboot.
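Keepalive also sets how quickly an "offline" LWT can appear. The MQTT specification has the broker drop a client when nothing arrives within one and a half times the keepalive interval. The sketch below just does that arithmetic; the function name is hypothetical, and individual brokers may add their own tolerances.

```python
def broker_considers_dead(last_packet_at: float, now: float, keepalive_s: float) -> bool:
    """Apply the 1.5 x keepalive rule to decide whether a silent client is dead."""
    if keepalive_s == 0:
        return False  # keepalive of 0 disables the timeout entirely
    return (now - last_packet_at) > 1.5 * keepalive_s


# With keepalive=30, a device that dies right after sending a packet is
# only declared offline (and its LWT published) roughly 45 seconds later.
for silence in (30, 44, 46):
    print(silence, broker_considers_dead(0, silence, keepalive_s=30))
```

That delay is the tradeoff to tune: a shorter keepalive detects dead devices sooner but costs more ping traffic and battery.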
Step-by-step
Let’s build a tiny but realistic MQTT system: a broker, a subscriber, and a “device” publisher. You can run this locally, then reuse the same architecture when you move to a cloud broker or a gateway on a LAN.
Step 1 — Decide a topic layout (before you write code)
Pick a topic taxonomy that stays readable at scale. Here’s a solid default:
A practical topic convention
| Purpose | Topic shape | Retained? | Notes |
|---|---|---|---|
| Telemetry | {root}/{site}/{device}/telemetry/{signal} | No (usually) | Add timestamps in payload |
| Status / presence | {root}/{site}/{device}/status | Yes | Use LWT to set offline |
| Commands | {root}/{site}/{device}/cmd | No | Use QoS 1 + IDs for dedupe |
| Desired state | {root}/{site}/{device}/desired | Yes | Device “converges” to this state |
| Reported state | {root}/{site}/{device}/reported | Yes | Device confirms what it’s doing |
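A convention only holds if code cannot easily bypass it. One option is a small helper that builds every topic from the same parts and rejects levels that would break the tree. The helper name build_topic and its validation rules are assumptions for illustration.

```python
def build_topic(root: str, site: str, device: str, *suffix: str) -> str:
    """Join topic levels, rejecting empty levels and reserved characters."""
    levels = [root, site, device, *suffix]
    for level in levels:
        # "/" would add an accidental level; "+" and "#" are wildcards and
        # must never appear in a topic you publish to.
        if not level or any(ch in level for ch in "/+#"):
            raise ValueError(f"invalid topic level: {level!r}")
    return "/".join(levels)


print(build_topic("acme", "site-1", "device-42", "telemetry", "temp"))
print(build_topic("acme", "site-1", "device-42", "status"))
```

Routing all publishers and subscribers through one function like this keeps typos such as a stray slash from silently creating a new branch of the topic tree.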
Instead of sending “turn on now” repeatedly, publish a desired state (retained). If a device is offline, it will still see the latest desired state on reconnect and converge automatically.
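On the device side, convergence can be as simple as diffing the retained desired document against what the device last reported. The sketch below assumes both are flat JSON objects; the keys mode and target_c are invented for the example.

```python
import json


def pending_changes(desired: dict, reported: dict) -> dict:
    """Return the desired keys whose values differ from the reported state."""
    return {
        key: value
        for key, value in desired.items()
        if key not in reported or reported[key] != value
    }


# Payload as it might arrive from the retained .../desired topic.
desired = json.loads('{"mode": "eco", "target_c": 21.0}')
reported = {"mode": "comfort", "target_c": 21.0}

changes = pending_changes(desired, reported)
print(changes)  # {'mode': 'eco'}

# After applying the changes, the device would publish the new reported
# state (retained) and the diff becomes empty.
reported.update(changes)
print(pending_changes(desired, reported))  # {}
```

Because the diff is computed from state rather than from a stream of commands, running it twice or after a reboot gives the same result.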
Step 2 — Run a broker locally (Docker)
For local development, a broker like Mosquitto is the quickest way to experiment. The snippet below runs Mosquitto and exposes the standard MQTT port.
version: "3.8"
services:
  mqtt:
    image: eclipse-mosquitto:2
    container_name: mqtt
    ports:
      - "1883:1883"
    volumes:
      - ./mosquitto.conf:/mosquitto/config/mosquitto.conf:ro
      - mosq_data:/mosquitto/data
      - mosq_log:/mosquitto/log
volumes:
  mosq_data:
  mosq_log:
Keep your local broker permissive while learning, then lock it down for real deployments (auth + ACL + TLS).
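A minimal mosquitto.conf typically sets listeners and persistence. Mosquitto 2.x listens only on localhost unless a listener is configured, and once a listener is defined it rejects anonymous clients unless allow_anonymous true is set, so for this container setup the file needs at least a listener 1883 line plus, for throwaway local testing only, allow_anonymous true. In production, you’ll also define users and access rules.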
Step 3 — Prove routing with a subscriber and a publisher
Use CLI tools to validate topic patterns before you build firmware or backend services. Subscribe first (so you can see messages), then publish a test message.
# 1) Subscribe to everything under the root (wildcards)
mosquitto_sub -h localhost -p 1883 -v -t 'acme/site-1/#'
# 2) Publish telemetry (QoS 0)
mosquitto_pub -h localhost -p 1883 -t 'acme/site-1/device-42/telemetry/temp' -q 0 -m '{"v":21.7,"ts":1736430000}'
# 3) Publish a retained status message (so new subscribers get it immediately)
mosquitto_pub -h localhost -p 1883 -t 'acme/site-1/device-42/status' -r -q 1 -m 'online'
At this point you’ve validated the basic “message bus” flow. Now we’ll simulate a device that: (1) publishes telemetry periodically, (2) sets online/offline presence correctly, and (3) can be extended to react to commands.
Step 4 — Implement a tiny device client (with LWT + retained presence)
A good MQTT device client should handle reconnects, publish presence, and keep payloads consistent. The following Python example uses a single topic namespace and includes an LWT so the broker marks the device offline if it dies.
import json
import random
import time

import paho.mqtt.client as mqtt

BROKER_HOST = "localhost"
BROKER_PORT = 1883
ROOT = "acme/site-1/device-42"
TOPIC_STATUS = f"{ROOT}/status"
TOPIC_TELEMETRY = f"{ROOT}/telemetry/temp"
CLIENT_ID = "device-42-sim"
KEEPALIVE = 30  # seconds


def on_connect(client, userdata, flags, rc):
    # Runs on every (re)connect; rc == 0 means the broker accepted the connection.
    if rc != 0:
        print(f"connect refused, rc={rc}")
        return
    # Publish "online" as retained so dashboards instantly see current status.
    client.publish(TOPIC_STATUS, payload="online", qos=1, retain=True)


def on_disconnect(client, userdata, rc):
    # rc != 0 usually means unexpected disconnect; LWT will handle offline in that case.
    if rc != 0:
        print(f"unexpected disconnect, rc={rc}")


# The callback signatures above follow the paho-mqtt 1.x API. On paho-mqtt 2.x,
# pass mqtt.CallbackAPIVersion.VERSION1 as the first argument to Client().
# clean_session=True keeps the demo simple; use clean_session=False with a
# stable client ID if queued QoS 1 messages should survive reconnects.
client = mqtt.Client(client_id=CLIENT_ID, clean_session=True)

# LWT: if the client disappears without a clean disconnect, broker publishes "offline" retained.
# It must be set before connect().
client.will_set(TOPIC_STATUS, payload="offline", qos=1, retain=True)
client.on_connect = on_connect
client.on_disconnect = on_disconnect

client.connect(BROKER_HOST, BROKER_PORT, keepalive=KEEPALIVE)
client.loop_start()  # background network thread

try:
    while True:
        payload = {
            "v": round(18.0 + random.random() * 8.0, 1),
            "ts": int(time.time()),
            "unit": "C",
            "schema": "temp.v1",
        }
        # QoS 0 is fine for frequent telemetry; include timestamp so consumers can detect staleness.
        client.publish(TOPIC_TELEMETRY, payload=json.dumps(payload), qos=0, retain=False)
        time.sleep(2.0)
except KeyboardInterrupt:
    # Clean shutdown: publish offline explicitly, wait for it to be sent
    # while the network thread is still running, then disconnect.
    info = client.publish(TOPIC_STATUS, payload="offline", qos=1, retain=True)
    if info.rc == mqtt.MQTT_ERR_SUCCESS:
        info.wait_for_publish()
    client.disconnect()
    client.loop_stop()
- Retain + no timestamp: dashboards can show “fresh-looking” but stale values
- QoS 1 duplicates: consumers should handle duplicate command/event IDs
- Client IDs: two clients with the same ID will kick each other off
- Wildcards: subscribing to # in production can accidentally leak data across tenants
Step 5 — Design payloads that won’t hurt you later
MQTT doesn’t enforce a payload format. That freedom is great—until you have multiple producers and consumers. Treat your payload as an API:
Payload hygiene (minimum)
- Include timestamps (ts) for anything time-sensitive
- Add a schema/version field (e.g., schema: temp.v1)
- Keep keys stable; avoid renaming without versioning
- Use explicit units (C, %, Pa)
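On the consumer side, these rules translate into a small validation step. The sketch below checks the fields used by the device example (v, ts, unit, schema) and flags stale readings; the function name parse_temp_v1 and the 60-second freshness window are assumptions, not a standard.

```python
import json
import time

REQUIRED_KEYS = {"v", "ts", "unit", "schema"}


def parse_temp_v1(raw: str, now: float = None, max_age_s: float = 60.0) -> dict:
    """Validate a temp.v1 payload and mark it stale if it is too old."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if data["schema"] != "temp.v1":
        raise ValueError(f"unsupported schema: {data['schema']!r}")
    now = time.time() if now is None else now
    # Retained values can look fresh; the timestamp tells you if they are.
    data["stale"] = (now - data["ts"]) > max_age_s
    return data


raw = '{"v": 21.7, "ts": 1736430000, "unit": "C", "schema": "temp.v1"}'
print(parse_temp_v1(raw, now=1736430010)["stale"])  # False
print(parse_temp_v1(raw, now=1736433600)["stale"])  # True
```

Rejecting unknown schema values early means a future temp.v2 producer fails loudly in old consumers instead of being silently misread.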
When to avoid JSON
- Ultra-low bandwidth links (consider binary formats)
- Strict latency or CPU constraints
- High-rate telemetry streams (batching may be better)
- Very large messages (MQTT is not ideal for big blobs)
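To get a feel for the bandwidth difference, here is a sketch that packs the same reading as JSON and as a fixed binary layout using Python's struct module. The layout (a 32-bit Unix timestamp plus the temperature in tenths of a degree as a signed 16-bit integer) is a made-up example, not a standard encoding.

```python
import json
import struct

# Hypothetical layout: uint32 timestamp + int16 temperature in tenths of a
# degree C, network byte order, no padding.
TEMP_FORMAT = "!Ih"


def encode_temp(ts: int, celsius: float) -> bytes:
    """Pack a reading into the fixed binary layout."""
    return struct.pack(TEMP_FORMAT, ts, round(celsius * 10))


def decode_temp(blob: bytes) -> tuple:
    """Unpack a reading back into (timestamp, degrees C)."""
    ts, tenths = struct.unpack(TEMP_FORMAT, blob)
    return ts, tenths / 10


as_json = json.dumps({"v": 21.7, "ts": 1736430000}).encode()
as_binary = encode_temp(1736430000, 21.7)

print(len(as_json), "bytes as JSON")
print(len(as_binary), "bytes as binary")
print(decode_temp(as_binary))
```

The binary form is several times smaller, but it gives up self-description: every consumer must know the layout, so versioning has to move into the topic name or a leading version byte.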
Step 6 — Security and operations (the “real system” checklist)
MQTT is easy to prototype—and equally easy to accidentally expose. Before any internet-facing deployment, treat these as non-negotiable:
- Authentication: per-device credentials (not one shared password)
- Authorization: ACLs per topic (devices can only publish/subscribe where allowed)
- Transport security: TLS where applicable (especially off-LAN)
- Observability: monitor client count, connect/disconnect rates, dropped messages, broker CPU/memory
- Backpressure: define max inflight/queued messages to avoid slow consumers taking down the broker
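The backpressure item is easiest to understand as a bounded queue per client. The sketch below models a drop-oldest policy; BoundedOutbox is a hypothetical name, and real brokers expose this as configuration (Mosquitto, for example, has max_queued_messages and max_inflight_messages settings) with their own drop behavior.

```python
from collections import deque


class BoundedOutbox:
    """Per-client queue that drops the oldest message once it is full."""

    def __init__(self, max_queued: int):
        self._queue = deque(maxlen=max_queued)
        self.dropped = 0  # worth exporting as a metric

    def enqueue(self, message) -> None:
        if len(self._queue) == self._queue.maxlen:
            self.dropped += 1  # the deque evicts the oldest entry on append
        self._queue.append(message)

    def drain(self) -> list:
        """Return and clear everything currently queued."""
        items = list(self._queue)
        self._queue.clear()
        return items


outbox = BoundedOutbox(max_queued=3)
for seq in range(5):
    outbox.enqueue(seq)  # a slow consumer never drains in time

print(outbox.dropped)  # 2
print(outbox.drain())  # [2, 3, 4]
```

The point of the limit is that one slow consumer costs a bounded amount of memory and a visible drop counter, instead of unbounded growth that takes the broker down for everyone.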
Many systems use a local broker on the LAN (fast, resilient), then bridge selected topics to a cloud broker for remote access. This keeps local automation working even when the internet is down.
Common mistakes
MQTT is simple, which means the mistakes are usually about design choices rather than syntax. Here are the patterns that create “it works in the lab but not in the field.”
Mistake 1 — Treating MQTT like HTTP
MQTT is not request/response by default. If you need replies, define explicit reply topics.
- Fix: model flows as state + events (publish and subscribe), not “call and wait.”
- Fix: for commands, prefer desired/reported state where possible.
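When a reply really is needed, the usual approach is to put a reply topic and a correlation ID in the request so the responder knows where to answer and the requester can match the answer. The sketch below only builds the messages; the reply/{requester} topic shape and the field names id, action, and reply_to are assumptions for illustration.

```python
import json
import uuid


def make_request(device_root: str, action: str, requester: str) -> tuple:
    """Build (topic, payload) for a command that expects a reply."""
    payload = {
        "id": uuid.uuid4().hex,  # correlation ID, echoed in the reply
        "action": action,
        "reply_to": f"{device_root}/reply/{requester}",
    }
    return f"{device_root}/cmd", json.dumps(payload)


def make_reply(request_payload: str, result: str) -> tuple:
    """Build (topic, payload) for the device's answer to a request."""
    request = json.loads(request_payload)
    reply = {"id": request["id"], "result": result}
    return request["reply_to"], json.dumps(reply)


cmd_topic, cmd_payload = make_request("acme/site-1/device-42", "reboot", "dashboard-1")
reply_topic, reply_payload = make_reply(cmd_payload, "ok")
print(cmd_topic)
print(reply_topic)
```

The requester subscribes to its reply topic before publishing the command and should apply its own timeout, because MQTT will not tell it that the device never answered.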
Mistake 2 — Topic sprawl without a namespace plan
If every team invents topics, subscriptions become fragile and security becomes impossible.
- Fix: establish one root and a small set of patterns (telemetry/status/cmd).
- Fix: document topics like an API and version payloads.
Mistake 3 — Using retained messages for transient events
New subscribers replay the last retained value, which can look like a “new” event.
- Fix: retain only state; never retain one-off events.
- Fix: include timestamps and interpret state as “last known.”
Mistake 4 — QoS 1/2 without dedupe or idempotency
Duplicates are a normal outcome of retries and reconnects.
- Fix: add message IDs for commands/events; ignore repeats.
- Fix: design handlers to be safe if run twice.
Mistake 5 — No presence strategy (no LWT)
Without presence, “is the device alive?” becomes guesswork.
- Fix: LWT to publish offline (retained) + publish online on connect.
- Fix: tune keepalive so dead connections are detected quickly enough.
Mistake 6 — Weak ACLs + broad wildcards
A subscription to # is convenient during dev and dangerous in production.
- Fix: restrict publish/subscribe per device and per tenant.
- Fix: separate environments (dev/stage/prod) by root topic or broker.
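As a sketch of what "restrict publish per device" means in practice, the check below allows a device to publish only inside its own subtree. Real brokers enforce this through their own ACL configuration rather than application code; the function name and the allowed topic set are assumptions based on the topic convention used in this guide.

```python
def device_may_publish(device_root: str, topic: str) -> bool:
    """Allow publishing only to concrete topics in the device's own subtree."""
    if "+" in topic or "#" in topic:
        return False  # wildcards are never valid in a publish topic
    allowed_exact = {f"{device_root}/status", f"{device_root}/reported"}
    allowed_prefix = f"{device_root}/telemetry/"
    return topic in allowed_exact or topic.startswith(allowed_prefix)


me = "acme/site-1/device-42"
print(device_may_publish(me, "acme/site-1/device-42/telemetry/temp"))  # True
print(device_may_publish(me, "acme/site-1/device-7/status"))           # False
print(device_may_publish(me, "acme/site-1/device-42/cmd"))             # False
```

Note that the device cannot publish to its own cmd topic: commands flow toward the device, so a compromised device should not be able to inject them.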
When something feels “random,” check these first: client ID collisions, retained messages replaying old state, and QoS 1 duplicates after reconnect.
FAQ
What is MQTT used for?
MQTT is used for lightweight messaging where many producers and consumers need to exchange data reliably, especially with constrained devices or unstable networks. Common examples: sensor telemetry, device commands, presence/status, home automation, industrial monitoring, and dashboards that update in real time.
What’s the difference between MQTT QoS 0, 1, and 2?
QoS 0 is best-effort (fastest). QoS 1 retries until acknowledged (delivery is “at least once,” so duplicates can occur). QoS 2 uses a longer handshake for “exactly once” delivery (highest overhead). In practice, most IoT systems use QoS 0 for telemetry and QoS 1 for commands/status that must arrive.
When should I use retained messages?
Use retained messages for state where the “latest value” matters to new subscribers: online/offline status, current mode, last reading, desired configuration. Avoid retaining transient events (alarms, button presses) because new subscribers will see the last retained value and may misinterpret it as a new event.
How do I model commands in MQTT?
For simple systems, publish commands to a per-device topic (often QoS 1) and include an ID so duplicates are safe. For more robust systems, publish a retained desired state and let devices report a retained reported state. Desired/reported state is resilient to offline devices and reboots.
Does MQTT work in browsers?
Many brokers support MQTT over WebSockets, which makes browser clients possible. The mental model remains the same (publish/subscribe), but you’ll typically terminate WebSockets at the broker and keep standard MQTT for devices.
How do I secure MQTT?
Use per-client authentication, strict topic-based ACLs, and TLS when traffic crosses untrusted networks. Also separate environments and avoid broad wildcard access in production. Security is less about the protocol and more about who is allowed to publish/subscribe to which topics.
Cheatsheet
Keep this section bookmarked. It’s the fast scan for “what should I choose?” and “what’s the safe default?”
QoS decision (fast)
- QoS 0: high-rate telemetry where occasional drops are OK
- QoS 1: commands and important updates (handle duplicates)
- QoS 2: rare; only if you truly need exactly-once semantics
Retained decision (fast)
- Retain: state, presence, desired config, “last known value”
- Don’t retain: transient events, click-like signals, alarms without timestamps
- Always include: timestamps for values that can go stale
Topic patterns you’ll reuse
| Pattern | Why it exists | Tip |
|---|---|---|
| {root}/{site}/{device}/telemetry/{signal} | Clear fanout for metrics and storage | Use QoS 0 + include ts |
| {root}/{site}/{device}/status | Presence and health monitoring | Retained + LWT |
| {root}/{site}/{device}/cmd | Targeted device actions | QoS 1 + idempotent handlers |
| {root}/{site}/{device}/desired | Resilient control even when offline | Retained desired state |
| {root}/{site}/{device}/reported | Device confirms actual state | Retained confirmation |
- Per-device credentials (no shared secrets)
- ACLs per topic (least privilege)
- LWT + retained presence
- Payload versioning and timestamps
- Monitoring: connect churn, message rates, broker CPU/memory
Wrap-up
MQTT is popular in IoT for a reason: it’s simple, efficient, and built for unreliable networks. The trick is to use the protocol’s “superpowers” on purpose—topics as a namespace, QoS as a tradeoff, retained messages for state, and LWT + sessions for real-world connectivity.
If you take one practical pattern from this post, take this: model device control as desired state + reported state, and use retained messages so devices recover from offline periods automatically. That single decision makes fleets far easier to operate.
- Write down your topic tree and mark which topics are state vs events
- Add LWT presence to every device client
- Pick QoS intentionally: telemetry (0), commands (1), avoid (2)
- Lock down security: auth + ACL before anything goes public