Build Times: The Truth
Slow builds are the most common complaint about Rust in production teams. They are also frequently misdiagnosed. Engineers reach for obscure flags and compiler tricks when the real bottleneck is a misconfigured workspace, an unnecessary dependency, or a missing cache layer. This post starts with what is actually happening and works toward practical fixes.
Why Rust Builds Are Slow
Rust's compiler does more than most:
- Monomorphization: Generic code is compiled separately for each concrete type. A single function like `process<T>` generates multiple machine-code versions, one per instantiation.
- Borrow checking: The full ownership analysis runs at compile time across every function.
- LLVM codegen: Release builds run aggressive LLVM optimization passes; most of a clean release build is spent here rather than in rustc's front end.
- Procedural macros: `#[derive(Serialize, Deserialize)]`, `#[tokio::main]`, sqlx's `query!` — these run arbitrary Rust code at compile time.
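To make monomorphization concrete, here is a minimal sketch (the `process` function is illustrative, not from any real codebase): calling a generic function with two different concrete types makes the compiler emit two separate machine-code copies of it.

```rust
// A generic function: the compiler emits a separate machine-code
// copy for each concrete T it is instantiated with.
fn process<T: std::fmt::Debug>(value: T) -> String {
    format!("{value:?}")
}

fn main() {
    // Two instantiations -> two compiled copies:
    // process::<i32> and process::<&str>.
    let a = process(42);
    let b = process("hello");
    assert_eq!(a, "42");
    assert_eq!(b, "\"hello\"");
    println!("{a} {b}");
}
```

Each additional instantiation is more work for LLVM, which is why generic-heavy dependencies dominate build timings.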
The result: a medium service with 30 crates and 100 transitive dependencies easily takes 4–6 minutes for a clean build.
LLVM codegen and procedural macros are the expensive stages: LLVM optimization alone is often 50–70% of total build time on a clean release build.
What You Can Control: The Low-Hanging Fruit
Measure First
Before optimizing, measure. cargo build --timings produces an HTML report showing exactly which crates took longest and which ones blocked others.
```shell
cargo build --timings
# Opens target/cargo-timings/cargo-timing.html
```

Sort by "duration" in that report. Nine times out of ten, a handful of crates account for most of the time.
Split Debug and Release Profiles
Do not build with --release during development. Debug builds skip LLVM optimization and are 3–5x faster.
```toml
# Cargo.toml
[profile.dev]
opt-level = 0        # no optimization
debug = true         # full debug info

[profile.release]
opt-level = 3        # full optimization
debug = false
lto = "thin"         # link-time optimization (thin is fast; full is slow)
codegen-units = 1    # single unit: slowest build, smallest + fastest binary
```

For CI where you need some optimization but not full release performance, add a custom profile:
```toml
[profile.ci]
inherits = "release"
opt-level = 2
lto = false
codegen-units = 16   # parallel codegen units: faster build, slightly larger binary
```

Use a Faster Linker
The default linker (ld on Linux, ld64 on macOS) is slow. Switching to mold (Linux) or lld can halve link times.
```toml
# .cargo/config.toml
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
```

mold targets ELF, so it does not help on macOS. There, lld is an option, and the new linker Apple ships with Xcode 15+ is already dramatically faster than the old ld64 (zld has been deprecated in its favor).
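To verify the switch actually took effect on Linux, one trick is to inspect the `.comment` section of the produced ELF binary, where mold records its name (the binary name here is a placeholder for your own):

```shell
cargo build
readelf -p .comment target/debug/my-binary | grep -i mold
```

If grep prints nothing, the build is still going through the default linker.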
sccache: Caching Across Machines
sccache is a compiler cache that sits in front of rustc. It caches compilation outputs keyed on the input files, flags, and compiler version. On a warm cache, dependency compilation drops from minutes to seconds.
```shell
# Install
cargo install sccache
```

```toml
# .cargo/config.toml
[build]
rustc-wrapper = "sccache"
```

For CI, point sccache at S3 or GCS for a shared cache across all pipeline jobs:
```shell
export SCCACHE_BUCKET=my-sccache-bucket
export SCCACHE_REGION=us-east-1
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
```

A well-configured sccache setup typically reduces CI build times by 60–80% for dependency compilation. Your own code still recompiles: sccache caches dependency crates, not your workspace members.
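To check whether the cache is actually being hit, sccache ships a stats command; a quick local sanity check looks like this:

```shell
# Zero the counters, force a full rebuild, then inspect the hit rate
sccache --zero-stats
cargo clean && cargo build
sccache --show-stats   # compare "Cache hits" vs "Cache misses"
```

On a second clean build with a warm cache, hits should dominate; a persistently low hit rate usually means flags or compiler versions differ between builds.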
Workspace Structure
A monorepo with all code in one crate compiles everything whenever anything changes. A workspace with multiple crates enables parallel and incremental compilation at crate granularity.
In the structure below, changing a handler in api-server recompiles only api-server. Changing common invalidates everything downstream, so keep it stable and small.
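You can observe the crate-granularity effect directly (crate names as in this example workspace):

```shell
# Build only api-server and the crates it depends on
cargo build -p api-server

# Touch a file in common, then rebuild: everything downstream recompiles
touch common/src/lib.rs
cargo build --timings   # the report shows which members were rebuilt
```

This is also a useful smoke test after restructuring a workspace: if editing a leaf crate triggers rebuilds of unrelated members, the dependency edges are wrong.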
```text
workspace/
├── Cargo.toml        # [workspace] members = [...]
├── api-server/
│   ├── Cargo.toml
│   └── src/
├── domain/
│   ├── Cargo.toml
│   └── src/
├── auth/
│   ├── Cargo.toml
│   └── src/
└── common/
    ├── Cargo.toml
    └── src/
```

```toml
# Root Cargo.toml
[workspace]
members = ["api-server", "domain", "auth", "common"]
resolver = "2"

# Share dependency versions across the workspace
[workspace.dependencies]
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
sqlx = { version = "0.7", features = ["postgres", "runtime-tokio"] }
```

Reducing Dependency Compilation
Transitive dependencies are often the biggest compile-time sink. Audit them:
```shell
# See the full dependency graph
cargo tree

# Find duplicate versions (often inflating compile time)
cargo tree --duplicates

# Check which features each dependency pulls in
cargo tree -e features
```

Disable default features you do not use:
```toml
[dependencies]
# Instead of pulling all hyper features:
hyper = { version = "1", default-features = false, features = ["server", "http1"] }

# Instead of all tokio features:
tokio = { version = "1", features = ["rt-multi-thread", "net", "time", "sync"] }
```

Cranelift Backend for Development
cranelift is an alternative codegen backend that trades optimization for compilation speed. For development builds, it can produce debug binaries 2–3x faster than the default LLVM backend.
```shell
# Install (nightly only)
rustup component add rustc-codegen-cranelift-preview --toolchain nightly

# Use for debug builds (the codegen-backend profile option is unstable
# and must be enabled with -Zcodegen-backend)
CARGO_PROFILE_DEV_CODEGEN_BACKEND=cranelift cargo +nightly build -Zcodegen-backend
```

Note: cranelift is nightly-only and should only be used for local development, never for production builds.
Putting It Together: A Realistic Setup
```toml
# .cargo/config.toml
[build]
rustc-wrapper = "sccache"

[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]

[profile.dev]
opt-level = 0
debug = "line-tables-only"   # faster than full debug symbols

[profile.ci]
inherits = "release"
opt-level = 2
codegen-units = 16
lto = false
```

Expected outcome on a 30-crate workspace:
- Clean CI build (cold sccache): ~5 min
- CI build with warm sccache: ~90 sec
- Local incremental build after a single file change: ~10–20 sec
Key Takeaways
- Run `cargo build --timings` before optimizing; it identifies the actual bottleneck crates, which are rarely what you expect.
- Never use `--release` during local development; debug builds are 3–5x faster because they skip LLVM optimization passes.
- `sccache` with remote storage (S3 or GCS) reduces CI build times by 60–80% for dependency compilation; it is the single highest-leverage optimization for teams.
- Workspace splits at sensible domain boundaries enable crate-level incremental compilation, so a handler change only recompiles one crate.
- Switching from the default linker to `mold` or `lld` can cut link time in half, which is particularly noticeable for frequent incremental builds.
- Disable unused default features in heavy dependencies like `tokio`, `hyper`, and `serde`; feature flags control how much code gets compiled and monomorphized, and directly affect compile time.