Rust for Backend Engineers

Build Times: The Truth

Ravinder · 6 min read
Tags: Rust, Backend, Build Times, sccache, Cargo, Developer Experience

Slow builds are the most common complaint about Rust in production teams. They are also frequently misdiagnosed. Engineers reach for obscure flags and compiler tricks when the real bottleneck is a misconfigured workspace, an unnecessary dependency, or a missing cache layer. This post starts with what is actually happening and works toward practical fixes.

Why Rust Builds Are Slow

Rust's compiler does more than most:

  1. Monomorphization: Generic code is compiled separately for each concrete type it is used with. A single function like process<T> generates one machine-code version per instantiation (see the sketch below).
  2. Borrow checking: The full ownership analysis runs at compile time across every function.
  3. LLVM codegen: Release builds run LLVM's full optimization pipeline, and even debug builds still pay for LLVM IR generation and machine-code emission.
  4. Procedural macros: #[derive(Serialize, Deserialize)], #[tokio::main], sqlx's query! — these run arbitrary Rust code at compile time.

The result: a medium service with 30 crates and 100 transitive dependencies easily takes 4–6 minutes for a clean build.
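
To make item 1 concrete, here is a minimal sketch of monomorphization; the function name process mirrors the example above, and the comments show the copies the compiler emits:

// Each concrete T used anywhere in the program gets its own compiled copy.
fn process<T: std::fmt::Debug>(value: T) {
    println!("{:?}", value);
}

fn main() {
    process(42u64);          // instantiates process::<u64>
    process("request");      // instantiates process::<&str>
    process(vec![1, 2, 3]);  // instantiates process::<Vec<i32>>
}

Three call sites, three separate function bodies for LLVM to optimize. Generic-heavy crates like serde multiply this effect across every type they touch.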

graph LR
    A[Source files] --> B[Rustc frontend: parsing + type checking + borrow check]
    B --> C[Monomorphization]
    C --> D[LLVM IR generation]
    D --> E[LLVM optimization passes]
    E --> F[Codegen: machine code]
    F --> G[Linker]
    G --> H[Binary]
    style B fill:#f9a,stroke:#c00
    style D fill:#f9a,stroke:#c00
    style E fill:#f9a,stroke:#c00

The highlighted stages are the expensive ones. LLVM optimization is often 50–70% of total build time on a clean release build.

What You Can Control: The Low-Hanging Fruit

Measure First

Before optimizing, measure. cargo build --timings produces an HTML report showing exactly which crates took longest and which ones blocked others.

cargo build --timings
# Report written to target/cargo-timings/cargo-timing.html

Sort by "duration" in that report. Nine times out of ten, a handful of crates account for most of the time.

Split Debug and Release Profiles

Do not build with --release during development. Debug builds skip LLVM optimization and are 3–5x faster.

# Cargo.toml
[profile.dev]
opt-level = 0      # no optimization
debug = true       # full debug info
 
[profile.release]
opt-level = 3      # full optimization
debug = false
lto = "thin"       # link-time optimization (thin is fast; full is slow)
codegen-units = 1  # single unit: slowest build, smallest + fastest binary
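
With those profiles defined, day-to-day usage needs no flags; --release is an explicit opt-in:

# Uses [profile.dev]: opt-level 0, fast iteration
cargo build

# Uses [profile.release]: only for benchmarks and deployable artifacts
cargo build --release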

For CI where you need some optimization but not full release performance, add a custom profile:

[profile.ci]
inherits = "release"
opt-level = 2
lto = false
codegen-units = 16  # parallel codegen units — faster build, slightly larger binary
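
Custom profiles are selected with --profile, and their artifacts land in their own target subdirectory:

# Build with the ci profile; output goes to target/ci/ instead of target/release/
cargo build --profile ci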

Use a Faster Linker

The default linker (ld on Linux, ld64 on macOS) is slow. Switching to mold (Linux) or lld can halve link times.

# .cargo/config.toml
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
 

On macOS there is less to gain these days: Apple shipped a rewritten, much faster linker with Xcode 15, zld has been deprecated by its author in favor of it, and mold's macOS port (sold) has been discontinued.
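
On Linux, if you want to try mold before committing to config changes, it also ships a wrapper mode that injects itself as the linker for a single invocation:

# One-off build through mold, no .cargo/config.toml edits needed
mold -run cargo build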

sccache: Caching Across Machines

sccache is a compiler cache that sits in front of rustc. It caches compilation outputs keyed on the input files, flags, and compiler version. On a warm cache, dependency compilation drops from minutes to seconds.

# Install
cargo install sccache
 
# Configure in .cargo/config.toml
[build]
rustc-wrapper = "sccache"

For CI, point sccache at S3 or GCS for shared cache across all pipeline jobs:

export SCCACHE_BUCKET=my-sccache-bucket
export SCCACHE_REGION=us-east-1
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...

A well-configured sccache setup typically reduces CI build times by 60–80% for dependency compilation. Your own code still recompiles: sccache does not cache incremental compilation, which is how workspace members build by default.
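
To verify the cache is doing anything, reset the counters and inspect the hit rate after a build:

# Reset counters, build, then check hits vs. misses
sccache --zero-stats
cargo build
sccache --show-stats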

Workspace Structure

A monorepo with all code in one crate compiles everything whenever anything changes. A workspace with multiple crates enables parallel and incremental compilation at crate granularity.

graph TD
    A[api-server] --> B[domain]
    A --> C[auth]
    B --> D[db]
    C --> D
    D --> E[common]
    style A fill:#ddf,stroke:#00c

In this structure, changing a handler in api-server only recompiles api-server. Changing common invalidates everything downstream, so keep it stable and small.

workspace/
├── Cargo.toml           # [workspace] members = [...]
├── api-server/
│   ├── Cargo.toml
│   └── src/
├── domain/
│   ├── Cargo.toml
│   └── src/
├── auth/
│   ├── Cargo.toml
│   └── src/
└── common/
    ├── Cargo.toml
    └── src/

# Root Cargo.toml
[workspace]
members = ["api-server", "domain", "auth", "common"]
resolver = "2"
 
# Share dependency versions across the workspace
[workspace.dependencies]
tokio = { version = "1", features = ["full"] }
serde = { version = "1", features = ["derive"] }
sqlx = { version = "0.7", features = ["postgres", "runtime-tokio"] }
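
Member crates then inherit those pinned versions with workspace = true, so a version bump happens in exactly one place. A sketch for the api-server member (the path dependencies mirror the graph above):

# api-server/Cargo.toml
[package]
name = "api-server"
version = "0.1.0"
edition = "2021"

[dependencies]
tokio = { workspace = true }
serde = { workspace = true }
domain = { path = "../domain" }
auth = { path = "../auth" }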

Reducing Dependency Compilation

Transitive dependencies are often the biggest compile-time sink. Audit them:

# See the full dependency tree
cargo tree
 
# Find duplicate versions (often inflating compile time)
cargo tree --duplicates
 
# Show which features each dependency is compiled with
cargo tree -e features

Disable default features you do not use:

[dependencies]
# Instead of pulling all hyper features:
hyper = { version = "1", default-features = false, features = ["server", "http1"] }
 
# Instead of all tokio features:
tokio = { version = "1", features = ["rt-multi-thread", "net", "time", "sync"] }
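
Because Cargo unifies features across the whole dependency graph, another crate can silently re-enable what you turned off. Inverting the tree for one dependency shows which features it actually gets, and who asked for them:

# Which features is hyper really compiled with, and which crates enabled them?
cargo tree -i hyper -e features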

Cranelift Backend for Development

cranelift is an alternative codegen backend that trades optimization for compilation speed. For development builds, it can produce debug binaries 2–3x faster than the default LLVM backend.

# Install
rustup component add rustc-codegen-cranelift-preview --toolchain nightly
 
# Use for debug builds (requires the unstable codegen-backend feature)
CARGO_PROFILE_DEV_CODEGEN_BACKEND=cranelift cargo +nightly build -Zcodegen-backend

Note: cranelift is nightly-only and should only be used for local development, never for production builds.
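
To avoid retyping the environment variable, the same setting can live in .cargo/config.toml; codegen-backend is unstable, so it must be switched on under [unstable]:

# .cargo/config.toml (nightly toolchains only)
[unstable]
codegen-backend = true

[profile.dev]
codegen-backend = "cranelift"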

Putting It Together: A Realistic Setup

# .cargo/config.toml
[build]
rustc-wrapper = "sccache"
 
[target.x86_64-unknown-linux-gnu]
linker = "clang"
rustflags = ["-C", "link-arg=-fuse-ld=mold"]
 
[profile.dev]
opt-level = 0
debug = "line-tables-only"   # faster than full debug symbols
 
[profile.ci]
inherits = "release"
opt-level = 2
codegen-units = 16
lto = false

Expected outcome on a 30-crate workspace:

  • Clean CI build (cold sccache): ~5 min
  • CI build with warm sccache: ~90 sec
  • Local incremental build after a single file change: ~10–20 sec
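
For reference, a hedged sketch of how these pieces compose in a GitHub Actions job; the bucket name is a placeholder, secrets come from your repository settings, and a prebuilt sccache binary would be faster than cargo install in a real pipeline:

# .github/workflows/ci.yml (illustrative sketch, not a drop-in file)
name: build
on: [push]
jobs:
  build:
    runs-on: ubuntu-latest
    env:
      RUSTC_WRAPPER: sccache              # route rustc through the cache
      SCCACHE_BUCKET: my-sccache-bucket   # placeholder bucket name
      SCCACHE_REGION: us-east-1
      AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
      AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
    steps:
      - uses: actions/checkout@v4
      - run: cargo install sccache        # prefer a prebuilt binary in practice
      - run: cargo build --profile ci
      - run: sccache --show-stats         # confirm cache hits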

Key Takeaways

  • Run cargo build --timings before optimizing — it identifies the actual bottleneck crates, which are rarely what you expect.
  • Never use --release during local development; debug builds are 3–5x faster due to skipped LLVM optimization passes.
  • sccache with remote storage (S3 or GCS) reduces CI build times by 60–80% for dependency compilation — it is the single highest-leverage optimization for teams.
  • Workspace splits at sensible domain boundaries enable crate-level incremental compilation, so a handler change only recompiles one crate.
  • Switching to mold or lld from the default linker can cut link time in half, which is particularly noticeable for frequent incremental builds.
  • Disable unused default features in heavy dependencies like tokio, hyper, and serde; feature flags gate how much code gets compiled, which directly affects compile time.