When an Android app feels “slow,” it’s usually one of four things: main-thread work (CPU), too much allocating (memory/GC), waiting on I/O (network/disk), or missed frames (jank). Android Studio’s Profilers let you see which one is happening—fast—so you stop guessing and start fixing. This guide shows a repeatable workflow to identify the top bottleneck, verify it with a capture, apply a targeted fix, and confirm the improvement.
Quickstart
If you’re on a deadline, use this “90% workflow.” It’s designed to get you from symptom → capture → fix without drowning in graphs.
The fast path (15–30 minutes)
- Reproduce on a real device (same device class users complain about if possible).
- Switch to a release-like build (minify optional, but avoid heavy debug-only overhead).
- Pick the symptom: jank/scrolling, slow screen, slow startup, memory growth, slow API.
- Capture the right thing (table below): record only 10–30 seconds around the issue.
- Find the top culprit: one hot method, one allocation spike, one slow request, or one long frame.
- Apply one targeted fix, then repeat the same capture to confirm improvement.
| What you feel | Start here | What to look for | Typical fixes |
|---|---|---|---|
| Scrolling stutters / animations hitch | Jank / System trace | Frames over ~16.7ms (60Hz) or ~8.3ms (120Hz), main-thread blocks | Move work off main thread, reduce layout/recomposition, avoid per-frame allocations |
| Screen opens slowly | CPU Profiler | Hot call stacks during navigation / render | Caching, reduce work in onCreate/onResume, batch DB work, lazy load |
| App “freezes” randomly | CPU + Memory | GC storms, huge allocations, blocking I/O on main | Fix allocations, reuse buffers, remove bitmap churn, move disk/network to IO threads |
| Data loads feel slow | Network Profiler | Slow endpoints, large payloads, serialization hotspots | Pagination, caching, compression, parallelization, smaller DTOs |
| Memory grows over time | Memory Profiler | Heap growth after screen closes, retained objects after GC | Fix leaks, clear adapters/listeners, avoid static refs, manage caches |
Don’t profile “everything.” Profile one user journey at a time (e.g., “open product screen and scroll”). Short, repeatable captures beat long recordings you never fully interpret.
Overview
Android performance problems are rarely mysterious—they’re just hard to see without the right lens. Profilers give you that lens by answering four questions:
1) Where did the time go?
CPU profiling shows which methods dominate your slow path (and on which thread). This is how you find the real hotspot instead of optimizing random code.
2) What caused the jank?
System tracing and frame timelines show missed frames and blocking calls. They help you distinguish "GPU/render" issues from "main thread is doing too much."
3) Are we allocating too much?
Memory profiling reveals allocation spikes, GC pressure, and retained objects after screens close. This is often the root of “it freezes sometimes” reports.
4) Are we waiting on I/O?
Network profiling and inspector tools show slow requests, large payloads, serialization overhead, and “waterfalls” caused by sequential calls.
What you’ll be able to do after this
- Choose the correct profiler based on symptoms (and avoid the wrong rabbit hole).
- Capture actionable traces in minutes, not hours.
- Read the 3–5 signals that matter (hot paths, long frames, allocation spikes, waterfalls).
- Apply targeted fixes that consistently reduce slowdowns.
- Validate improvements and prevent regressions with repeatable benchmarks.
Your goal is not perfection—it’s predictability. Smooth scrolling, consistent screen transitions, and stable memory matter more than shaving microseconds off a single method.
Core concepts
Before you hit “Record,” align on a few mental models. These keep profiling sessions short and conclusions solid.
Wall time vs CPU time
Wall time is what the user feels. CPU time is how much work the CPU actually did. A slow screen can be “CPU-heavy” (expensive code) or “waiting-heavy” (I/O, locks, contention).
- If CPU time is high: optimize code, reduce work, move off main thread.
- If wall time is high but CPU is low: look for waiting (network, DB, disk, locks).
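To build intuition for the difference, here is a plain-JVM Kotlin sketch (not Android-specific) that measures both for the same block using `ThreadMXBean`. A sleep burns wall time but almost no CPU time; a tight loop burns both.

```kotlin
import java.lang.management.ManagementFactory

// Measure wall time and CPU time (both in ms) for the same block of work.
fun measureWallVsCpu(block: () -> Unit): Pair<Long, Long> {
    val threadMx = ManagementFactory.getThreadMXBean()
    val wallStart = System.nanoTime()
    val cpuStart = threadMx.currentThreadCpuTime
    block()
    val wallMs = (System.nanoTime() - wallStart) / 1_000_000
    val cpuMs = (threadMx.currentThreadCpuTime - cpuStart) / 1_000_000
    return wallMs to cpuMs
}

fun main() {
    // Waiting-heavy: high wall time, near-zero CPU time.
    val (wall, cpu) = measureWallVsCpu { Thread.sleep(200) }
    println("wall=${wall}ms cpu=${cpu}ms")
}
```

If you see the sleep-like profile (wall high, CPU low) in a real capture, stop optimizing code and go find what the thread was waiting on.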
Threads that matter
Most “feels slow” problems involve one of these threads:
- Main thread: UI events, layout, composition, input handling.
- Render thread: drawing and render pipeline scheduling (varies by UI stack).
- Background executors: DB, JSON parsing, image decoding, work queues.
Jank: what a “missed frame” actually means
At 60Hz you have ~16.7ms to produce each frame; at 120Hz ~8.3ms. A frame misses the deadline when something on the critical path takes too long (often on the main thread), or when rendering work backs up.
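The budget arithmetic is simple enough to encode directly. These hypothetical helpers (not a framework API) just divide one second by the refresh rate:

```kotlin
// Frame budget in milliseconds for a given display refresh rate.
fun frameBudgetMs(refreshRateHz: Double): Double = 1000.0 / refreshRateHz

// A frame "misses the deadline" when it takes longer than the budget.
fun missedDeadline(frameDurationMs: Double, refreshRateHz: Double): Boolean =
    frameDurationMs > frameBudgetMs(refreshRateHz)
```

Note the budget shrinks as refresh rates climb: a 20ms task that was invisible at 60Hz becomes guaranteed jank at 120Hz.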
| Common cause | What you’ll see | Typical fix |
|---|---|---|
| Main thread doing work per frame | Long slices during scroll/animation | Defer work, precompute, memoize, move to background |
| Excess layout / measurement | Repeated layout passes, heavy measure | Simplify hierarchies, avoid nested weights, reduce recomposition |
| Allocation + GC churn | Allocation spikes before jank; GC events | Reuse objects, avoid per-bind allocations, optimize image pipelines |
| Too much work on the render pipeline | Render-related long work in traces | Reduce overdraw, simplify effects, cache bitmaps, optimize lists |
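As an example of the "reuse objects" fix from the table, a minimal scratch-buffer pool (a hypothetical sketch, not a library API) avoids allocating a fresh array on every frame:

```kotlin
// Reuse scratch buffers instead of allocating a new array per frame/bind.
class FloatBufferPool(private val bufferSize: Int) {
    private val pool = ArrayDeque<FloatArray>()

    // Take a buffer from the pool, or allocate only when the pool is empty.
    fun obtain(): FloatArray = pool.removeFirstOrNull() ?: FloatArray(bufferSize)

    // Return the buffer so the next frame can reuse it instead of allocating.
    fun recycle(buffer: FloatArray) { pool.addLast(buffer) }
}
```

The steady state allocates nothing, so the GC has nothing to collect during scroll or animation, which is exactly the window where pauses hurt most.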
Sampling vs instrumented traces
CPU profilers typically support a sampling mode (low overhead, good for “what’s hot?”) and an instrumented mode (more detail, more overhead).
- Use sampling to find hotspots quickly.
- Use instrumented traces when you need exact call timing and you can keep the capture short.
The “one-bottleneck” mindset
Your first job is to identify the largest contributor to the slowdown. Fixing the top bottleneck often improves the rest automatically.
- Pick a single scenario and repeat it.
- Capture around the symptom (not minutes before/after).
- Fix one thing, then re-capture to confirm.
Instrumented tracing can slow the app and change timing. That’s normal. Keep captures short, prefer sampling for discovery, and validate improvements with repeatable runs (same device, same steps).
Step-by-step
This is a practical playbook you can reuse for almost any slowdown. Follow the steps in order; each one narrows the search space.
Step 0 — Set up a clean profiling environment
Do this first
- Use a physical device when possible (thermal and GPU behavior differs from emulators).
- Close background apps, disable battery savers, keep the device plugged in.
- Prefer a release-like configuration for realistic results.
- Keep the scenario consistent (same account, same dataset size, same navigation path).
A quick sanity check
Before capturing anything, confirm the slowdown is reproducible at least 3 times. If it isn’t reproducible, focus on logging and measurement first (otherwise traces become guesswork).
Step 1 — Pick the symptom and the “first profiler”
Don’t start with CPU by habit. Start with the profiler that matches what the user feels:
| Symptom | Best first capture | Why |
|---|---|---|
| Scrolling jank / animation hitch | System trace / frame timeline | Shows missed frames and what blocked the UI pipeline |
| Slow screen transition | CPU profiling (sampling) | Finds hot paths quickly without heavy overhead |
| Random freezes / “stops responding” moments | Memory + CPU | Often GC storms or main-thread I/O |
| Slow loading data | Network profiler | Reveals waterfalls, large payloads, serialization bottlenecks |
| Memory creep after navigating around | Memory profiler (heap + GC) | Confirms whether objects are retained after screens close |
Step 2 — Fix jank first (because users notice it instantly)
If the complaint is “it feels laggy,” start with a jank capture. Your goal is to find which thread is missing deadlines and what it was doing during the long frame.
What to inspect in the capture
- Long frames clustered during scroll/animation (not random spikes).
- Main-thread blocks: long tasks, locks, or heavy callbacks.
- Repeated layout/measure/composition work during scroll.
- GC events near the jank window (allocation churn).
Common high-impact jank fixes
- Move expensive work off main thread (DB, JSON parsing, image decoding).
- Reduce per-item binding work in lists (RecyclerView/Compose Lazy lists).
- Cache and memoize derived UI values; avoid allocating during scroll.
- Defer non-critical work until after first draw (post-frame).
Add small trace sections around suspected work so your capture labels become readable. A well-placed trace marker can turn “a mess of stacks” into a clear story.
Example: add trace markers around expensive work so it shows up in system/CPU traces.
```kotlin
import android.os.Trace
import android.util.Log
import kotlin.system.measureTimeMillis

fun loadAndBindProduct(productId: String) {
    // Use a stable label: dynamic labels fragment the trace view.
    Trace.beginSection("ProductScreen#loadAndBind")
    try {
        val ms = measureTimeMillis {
            // Keep blocking work off the main thread in real code.
            val product = repository.loadProduct(productId)
            val recommendations = repository.loadRecommendations(productId)
            // Bind to UI (or update state) after data is ready.
            ui.render(product, recommendations)
        }
        // Log the duration instead of encoding it into a section name.
        Log.d("ProductScreen", "loadAndBind took ${ms}ms")
    } finally {
        // Always end the section, even if loading throws.
        Trace.endSection()
    }
}
```
Trace sections are most useful as breadcrumbs around suspected hotspots: screen entry, list binding, image decode, DB query, serialization. Add a few, not dozens.
Step 3 — Use the CPU Profiler to find the true hotspot
CPU profiling answers: “Which call stacks dominate time during the slow moment?” Start with sampling to get a clean picture, then zoom into the hottest path.
A practical CPU profiling loop
- Start recording (sampling).
- Perform the slow action (navigate, open screen, scroll) once or twice.
- Stop recording quickly (short captures are easier to reason about).
- Sort by hot methods/call stacks and focus on the top contributor.
- Answer: “Why is this code on this thread?” and “How often is it called?”
Hotspots you’ll see a lot
- JSON parsing/serialization on main thread
- Image decode/resizing at bind time
- Database queries during navigation
- Repeated formatting (dates, prices, spans)
- Excess recomposition or repeated adapters binds
Fix patterns (low drama)
- Precompute and cache derived values.
- Batch DB queries and avoid N+1 patterns.
- Move parsing/decode to background threads and deliver results to UI.
- Lazy load non-critical data after first render.
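The "precompute and cache" pattern can be as small as memoizing formatted values that list binds recompute constantly. A sketch (the names are illustrative, not a real API):

```kotlin
import java.text.NumberFormat
import java.util.Locale
import java.util.concurrent.ConcurrentHashMap

// Memoize formatted prices so list binds don't re-run NumberFormat each time.
object PriceCache {
    private val cache = ConcurrentHashMap<Long, String>()
    private val currency = NumberFormat.getCurrencyInstance(Locale.US)

    fun format(cents: Long): String =
        cache.getOrPut(cents) {
            // NumberFormat is not thread-safe; guard the shared instance.
            synchronized(currency) { currency.format(cents / 100.0) }
        }
}
```

The same shape works for dates, spans, and any derived string: pay the formatting cost once per distinct value instead of once per bind.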
Step 4 — Memory Profiler: allocations, GC pressure, and leaks
Memory issues often present as slowdowns first (GC and allocation churn), not crashes. The memory profiler helps you separate three categories:
| Problem type | What you see | How to confirm | Typical fix |
|---|---|---|---|
| Allocation churn | Spikes while scrolling/binding | Allocation tracking + correlate to jank | Reuse objects, avoid creating lists/strings per bind, optimize image pipelines |
| GC pressure | Frequent GC events, pauses | See GC markers during slow moments | Reduce allocations, cache results, avoid large temporary buffers |
| Leaks / retention | Heap grows after leaving screens | Heap dump; find retained references | Clear listeners/adapters, avoid static refs, respect lifecycle, fix caches |
Open Screen A → go back → repeat 5–10 times. If memory keeps rising and never drops after GC, inspect retained objects. If it rises and drops, it may be a cache (which might be fine).
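A common retention source is a listener registered on a long-lived object and never removed. A minimal sketch (`EventBus` and `Screen` here are hypothetical stand-ins):

```kotlin
// A long-lived registry: anything registered here stays reachable until removed.
object EventBus {
    private val listeners = mutableListOf<() -> Unit>()
    fun register(listener: () -> Unit) { listeners += listener }
    fun unregister(listener: () -> Unit) { listeners -= listener }
    fun count() = listeners.size
}

class Screen {
    private val onEvent: () -> Unit = { /* update UI */ }
    fun onStart() = EventBus.register(onEvent)
    // Forgetting this symmetric call keeps the whole Screen (and its views)
    // reachable from the process-lifetime EventBus: a classic leak.
    fun onStop() = EventBus.unregister(onEvent)
}
```

In a heap dump this shows up as old `Screen` instances retained via the listener list after the user has navigated away.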
Step 5 — Network Profiler: stop waiting on waterfalls
Many “slow screen” issues are really “slow network + sequential requests.” The network profiler helps you spot large payloads, slow endpoints, and request chains that should be parallel.
High-signal network smells
- Request A finishes, then request B starts (unnecessary sequencing).
- Large responses that you parse on the main thread.
- No caching headers (same data downloaded repeatedly).
- Slow “time to first byte” (server or connectivity).
Fixes that move the needle
- Batch endpoints or add a “summary” endpoint for initial screen.
- Paginate lists; don’t download entire catalogs upfront.
- Cache responses and images; avoid refetch on rotation.
- Parse/transform responses off the main thread.
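To break a waterfall, issue independent requests concurrently and join the results. A plain-JVM sketch with `CompletableFuture` (the fetch functions are simulated stand-ins for real HTTP calls):

```kotlin
import java.util.concurrent.CompletableFuture

// Simulated endpoints; swap in real HTTP calls (off the main thread) in practice.
fun fetchProduct(id: String): String { Thread.sleep(100); return "product:$id" }
fun fetchReviews(id: String): String { Thread.sleep(100); return "reviews:$id" }

// Sequential would take ~200ms; issuing both up front takes ~100ms.
fun loadScreenData(id: String): Pair<String, String> {
    val product = CompletableFuture.supplyAsync { fetchProduct(id) }
    val reviews = CompletableFuture.supplyAsync { fetchReviews(id) }
    return product.join() to reviews.join()
}
```

The key property: total latency becomes the slowest single request instead of the sum, which is exactly what a flattened waterfall looks like in the network profiler.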
Optional: capture frame stats or a system trace from the command line to compare runs and share artifacts.
```shell
# Replace with your app id
PKG="com.example.app"

# 1) Capture frame stats (jank-friendly) for a short session
adb shell dumpsys gfxinfo "$PKG" reset
# Reproduce: open the slow screen + scroll for ~10 seconds
adb shell dumpsys gfxinfo "$PKG" framestats > framestats.txt

# 2) Capture a system trace (Perfetto) for a targeted window (device-dependent config)
adb shell perfetto -o /data/misc/perfetto-traces/unilab_trace.perfetto-trace -t 10s sched freq idle am wm gfx view
adb pull /data/misc/perfetto-traces/unilab_trace.perfetto-trace .
```
Step 6 — Verify improvement and prevent regressions
Performance fixes only count if they stick. After you apply a fix, re-run the same scenario and confirm that the key metric moved: fewer long frames, lower time in the hotspot, fewer allocations, or a shorter network chain.
A minimal “done” checklist
- Same scenario run 3 times; results are consistent.
- The top bottleneck is reduced (not just shifted somewhere else).
- No major side effects (new jank, new allocations, broken caching).
- A small benchmark exists for the journey (so the regression shows up early).
Example: a tiny Macrobenchmark module configuration to lock in startup and scroll performance.
```kotlin
plugins {
    id("com.android.test")
    id("org.jetbrains.kotlin.android")
}

android {
    namespace = "com.example.benchmark"
    compileSdk = 35

    defaultConfig {
        minSdk = 28
        targetSdk = 35
        testInstrumentationRunner = "androidx.test.runner.AndroidJUnitRunner"
    }

    targetProjectPath = ":app"

    testOptions {
        managedDevices {
            devices {
                maybeCreate<com.android.build.api.dsl.ManagedVirtualDevice>("pixel6Api34").apply {
                    device = "Pixel 6"
                    apiLevel = 34
                    systemImageSource = "aosp"
                }
            }
        }
    }
}

dependencies {
    implementation("androidx.benchmark:benchmark-macro-junit4:1.2.4")
    implementation("androidx.test:runner:1.5.2")
    implementation("androidx.test.uiautomator:uiautomator:2.3.0")
}
```
For most apps, users care about startup, navigation, and scroll smoothness. Benchmarks for these catch regressions that pure unit tests will miss.
Common mistakes
These mistakes waste the most time when using Android Studio Profilers. Fixing them makes your profiling sessions faster, more accurate, and easier to explain to teammates.
Mistake 1 — Profiling a debug build and “fixing ghosts”
Debug builds often add overhead (extra checks, logs, slower code paths). Your traces can exaggerate issues that don’t exist in release.
- Fix: profile a release-like build for final decisions; use debug only for quick discovery.
- Fix: keep captures short; prefer sampling when exploring.
Mistake 2 — Recording too long and losing the signal
A 5-minute trace feels thorough, but it becomes impossible to interpret. Most performance wins come from 10–30 seconds around the exact issue.
- Fix: reproduce once, record once, stop immediately.
- Fix: add trace markers around your scenario boundaries.
Mistake 3 — Optimizing before you know the bottleneck
“We should optimize lists” is not a diagnosis. Profiling is about finding the top contributor on the slow path.
- Fix: use the CPU profiler to identify the hottest call stack.
- Fix: verify with before/after captures and the same scenario.
Mistake 4 — Ignoring the main thread
Even if you have background work, the UI still needs the main thread to respond and render. A small block on main can create visible jank.
- Fix: in traces, always check what the main thread was doing during long frames.
- Fix: move heavy work off main; avoid waiting on locks from main.
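The off-main pattern boils down to "do the blocking work on an executor, deliver the result back." A framework-free sketch (on Android you would post the callback to the main thread, e.g. via a main-looper `Handler`):

```kotlin
import java.util.concurrent.Executors

// Small pool for blocking work: DB, parsing, decoding.
val ioExecutor = Executors.newFixedThreadPool(2)

// Run blocking work off the calling thread; hand the result to a callback.
fun <T> loadInBackground(load: () -> T, onResult: (T) -> Unit) {
    ioExecutor.execute {
        val result = load()   // parsing, DB, decode: never on the UI thread
        onResult(result)      // on Android, post this back to the main thread
    }
}
```

Coroutines or WorkManager give you the same shape with better lifecycle handling; the invariant is identical either way: the UI thread never blocks waiting for the result.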
Mistake 5 — Treating caches as leaks (and leaks as caches)
Memory growth can be normal (image caches) or harmful (retained Activities). The difference is whether memory drops after GC and whether old screens are still referenced.
- Fix: run a “navigate in/out” loop and observe post-GC behavior.
- Fix: use heap dumps to confirm retained references before refactoring.
Mistake 6 — Fixing one device and calling it done
High-end devices can hide jank that appears on mid-range hardware. Your target should reflect your user base.
- Fix: test on at least one mid-range device class.
- Fix: measure worst-case conditions (low battery, slow network, cold start).
Many slowdowns disappear when you remove unnecessary work (extra formatting, duplicate DB queries, redundant network calls) rather than “making the same work faster.” Profilers help you see what can be deleted.
FAQ
Should I profile on debug or release?
Use debug for quick exploration (especially when you need to iterate fast), but make final decisions on a release-like build. Debug overhead can distort timing and make some problems look worse (or different) than they are. The key is consistency: same device, same build type, same scenario for before/after comparisons.
What’s the difference between “jank” and “slow CPU”?
Jank is missed frame deadlines (stutter), often caused by main-thread blocks during rendering or input. Slow CPU is general slowness in a flow (screen opens slowly) caused by expensive computation, parsing, or work done too early. Jank feels like hitching; slow CPU feels like waiting.
Sampling or instrumented CPU traces—when do I use each?
Start with sampling to find hotspots with low overhead. Switch to instrumented traces when you need precise method timing and you can keep the capture short. If the trace itself changes behavior (it can), trust the pattern more than the exact numbers.
Why does the app stutter even when CPU usage looks low?
Because the UI is sensitive to short blocks on the main thread, not average CPU usage. A few 20–40ms stalls can cause visible jank even if the device is mostly idle. Use a system/jank trace to inspect the main thread during the missed frames.
How do I know if memory growth is a leak?
Run a repeatable loop (open/close a screen multiple times) and observe memory after GC. If memory rises and never drops and the same screen’s objects remain referenced, suspect a leak. If memory rises and stabilizes (or drops), it may be a cache. Heap dumps help confirm retained references before you refactor.
What’s the easiest way to speed up a slow screen load?
Look for one of these: serial network calls, DB work during navigation, parsing on main thread, or expensive image decode/bind work. CPU profiling plus a network capture usually reveals which one dominates. Then apply a targeted fix: parallelize/batch calls, move work off main, cache derived values, and defer non-critical work until after first render.
How do I keep performance from regressing later?
Add a tiny benchmark for the journey (startup, navigation, scroll) and re-run it for changes that touch UI, networking, or data layers. Profilers help you fix issues today; benchmarks help you avoid reintroducing them next week.
Cheatsheet
Keep this as a quick “what to do next” reference when you’re in the middle of debugging.
Pick the right profiler
- Jank / stutter: System trace + frame timeline
- Slow screen: CPU profiler (sampling)
- Freezes: CPU + Memory (check GC + main thread)
- Slow data: Network profiler (waterfalls, payload size)
- Memory creep: Memory profiler + heap dump
High-signal things to look for
- One hot call stack dominating time during the slow moment
- Long frames clustered during scroll/animation
- Allocation spikes around jank windows
- GC events near freezes or stutters
- Network waterfalls (A then B then C)
Fix patterns that usually work
- Move heavy work off main (DB, parsing, decoding)
- Cache derived UI values (formatting, spans, computed strings)
- Reduce list binding/recomposition work
- Batch/parallelize network calls; paginate large lists
- Reduce allocations; reuse buffers; avoid per-frame object creation
Before/after validation
- Same device + same build type
- Same scenario steps (write them down)
- 3 runs each (ignore the first if it warms caches)
- Confirm the top bottleneck moved
- Watch for “shifted problems” (e.g., less CPU but more memory)
Before you hit record, ask
- Can I reproduce it quickly and reliably?
- Do I know what “good” looks like (target time, smoothness)?
- Am I capturing only the window around the issue?
- Do I have one question I want the capture to answer?
Wrap-up
Android Studio Profilers are most powerful when you use them as a workflow, not as a dashboard: reproduce a single scenario, capture the right signal, find one bottleneck, fix it, and confirm the win. Do that consistently and you’ll fix the “90%” slowdowns without heroic refactors.
Your next actions
- Pick one slow journey and write down the exact reproduction steps.
- Run the correct first capture (jank trace / CPU sampling / memory / network).
- Apply a targeted fix (move work off main, reduce allocations, fix waterfalls).
- Re-capture and confirm the metric moved in the right direction.
- Add a small benchmark for the journey so it doesn’t regress.
Bookmark the Cheatsheet. It’s designed for the “I’m in the middle of debugging” moment.