Rust for Backend Engineers

Operating Rust Services

Ravinder · 6 min read
Rust · Backend · Observability · Deployment · Production · Tracing

Building a Rust service is the first half of the job. Operating one requires instrumentation that tells you what is happening at runtime, panic handling that fails gracefully, and deployment patterns that take advantage of what Rust binaries actually are. This post covers the operational layer that most tutorials skip.

Observability with the tracing Crate

The tracing crate is the standard observability layer for Rust async services. Unlike log, tracing is structured — spans and events carry key-value fields, not just formatted strings.

[dependencies]
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }

Structured Logging

use tracing::{info, warn, error, instrument};
 
#[instrument(skip(pool), fields(user_id = %id))]
async fn handle_user_request(id: u64, pool: &sqlx::PgPool) -> Result<User, UserError> {
    info!("fetching user");
 
    let user = fetch_user(id, pool).await.map_err(|e| {
        error!(error = %e, "database query failed");
        UserError::Database(e)
    })?;
 
    if user.is_suspended {
        warn!(reason = "account_suspended", "user access denied");
        return Err(UserError::Forbidden {
            user_id: id,
            resource: "profile".to_string(),
        });
    }
 
    info!(email = %user.email, "user fetched successfully");
    Ok(user)
}

#[instrument] automatically creates a span around the function and records its name, arguments, and timing. Every log event inside the function is attached to that span. In a distributed system, spans chain across service boundaries to form traces.

Initializing the Subscriber

use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt, EnvFilter};
 
fn init_tracing() {
    tracing_subscriber::registry()
        .with(EnvFilter::try_from_default_env()
            .unwrap_or_else(|_| "info,sqlx=warn,hyper=warn".into()))
        .with(
            tracing_subscriber::fmt::layer()
                .json()                        // structured JSON output
                .with_current_span(true)       // include span context
                .with_target(true),
        )
        .init();
}

Set RUST_LOG=debug in development and RUST_LOG=info in production; per-target directives such as sqlx=warn quiet noisy dependencies without lowering the global level. EnvFilter parses the directive string once at startup, and filtering decisions are cached per callsite, so events that are filtered out cost close to nothing at runtime.
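Directives are comma-separated target=level pairs, where a target is a crate or module path. A typical production invocation might look like this (the myservice module paths are illustrative):

```shell
# Global level info; silence chatty dependencies; turn up one module.
# "myservice" and its module paths below are placeholder names.
RUST_LOG="info,sqlx=warn,hyper=warn,myservice::auth=debug" ./myservice
```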

OpenTelemetry Integration

For distributed tracing across services, connect tracing to the OpenTelemetry pipeline:

[dependencies]
opentelemetry = "0.21"
opentelemetry_sdk = { version = "0.21", features = ["rt-tokio"] }
opentelemetry-otlp = { version = "0.14", features = ["tonic"] }
tracing-opentelemetry = "0.22"

use opentelemetry_otlp::WithExportConfig;
use tracing_subscriber::{layer::SubscriberExt, util::SubscriberInitExt, EnvFilter};
 
async fn init_telemetry() -> anyhow::Result<()> {
    let exporter = opentelemetry_otlp::new_exporter()
        .tonic()
        .with_endpoint("http://otel-collector:4317");
 
    let tracer = opentelemetry_otlp::new_pipeline()
        .tracing()
        .with_exporter(exporter)
        .install_batch(opentelemetry_sdk::runtime::Tokio)?;
 
    tracing_subscriber::registry()
        .with(tracing_opentelemetry::layer().with_tracer(tracer))
        .with(EnvFilter::from_default_env())
        .with(tracing_subscriber::fmt::layer())
        .init();
 
    Ok(())
}

All #[instrument] spans now export to your OTLP-compatible backend (Jaeger, Tempo, Honeycomb, Datadog) with no code changes.

Metrics with Prometheus

[dependencies]
metrics = "0.22"
metrics-exporter-prometheus = "0.13"

use metrics::{counter, histogram};
 
fn init_metrics() {
    metrics_exporter_prometheus::PrometheusBuilder::new()
        .with_http_listener(([0, 0, 0, 0], 9090))
        .install()
        .expect("failed to install Prometheus exporter");
}
 
async fn handle_request(method: &str, path: &str) {
    let start = std::time::Instant::now();
 
    counter!("http_requests_total", "method" => method.to_string(), "path" => path.to_string())
        .increment(1);
 
    // ... process request ...
 
    histogram!("http_request_duration_seconds",
        "method" => method.to_string(),
        "path" => path.to_string()
    ).record(start.elapsed().as_secs_f64());
}

The metrics facade decouples your instrumentation from the exporter. Switch from Prometheus to statsd or another backend by changing one line in init_metrics.

Panic Handling

In Rust, a panic is not an exception: by default it unwinds the stack and terminates the thread it occurs on. In an async service, an unhandled panic in a spawned task kills that task, not the process.

graph TD
    A[tokio::spawn task] -->|panic| B{Catch location}
    B -->|JoinHandle.await| C[Returns Err with panic payload]
    B -->|not awaited| D[Panic is logged, task silently dropped]
    C --> E[Handle or propagate in caller]
    D --> F[Silent failure — dangerous]

Always handle the Err case from JoinHandle:

let handle = tokio::spawn(async move {
    process_batch(batch).await
});
 
match handle.await {
    Ok(Ok(result)) => { /* success */ }
    Ok(Err(e)) => { error!(error = %e, "batch processing failed"); }
    Err(panic) => { error!("task panicked: {:?}", panic); }
}
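The same contract holds for plain OS threads, which makes it easy to demonstrate with nothing but the standard library:

```rust
use std::thread;

fn main() {
    // The panic unwinds only the worker thread; the parent observes
    // it as an Err from join() and keeps running.
    let worker = thread::spawn(|| -> u32 {
        panic!("worker failed");
    });

    match worker.join() {
        Ok(v) => println!("worker returned {v}"),
        Err(_) => println!("worker panicked, process still alive"),
    }
}
```

The default panic hook still prints the panic message to stderr; the point is that the parent thread, not the runtime, decides what happens next.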

Custom Panic Hook

Install a custom panic hook to emit structured logs before the process terminates:

fn install_panic_hook() {
    std::panic::set_hook(Box::new(|info| {
        let location = info.location()
            .map(|l| format!("{}:{}", l.file(), l.line()))
            .unwrap_or_else(|| "unknown".to_string());
 
        // Panic payloads are usually &str (panic!("literal")) or
        // String (panic!("{}", value)); handle both before giving up.
        let message = match info.payload().downcast_ref::<&str>() {
            Some(s) => *s,
            None => info
                .payload()
                .downcast_ref::<String>()
                .map(String::as_str)
                .unwrap_or("Box<Any>"),
        };
 
        tracing::error!(
            panic.message = message,
            panic.location = %location,
            "process panicked"
        );
 
        // Flush spans before exit
        opentelemetry::global::shutdown_tracer_provider();
    }));
}

This ensures the panic is visible in your log aggregation system, not just in stderr.

Health Checks and Graceful Shutdown

use axum::{routing::get, Router};
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
 
static READY: AtomicBool = AtomicBool::new(false);
 
async fn liveness() -> &'static str {
    "alive"
}
 
async fn readiness() -> axum::response::Result<&'static str> {
    if READY.load(Ordering::Relaxed) {
        Ok("ready")
    } else {
        Err(axum::http::StatusCode::SERVICE_UNAVAILABLE.into())
    }
}
 
async fn run_service(pool: sqlx::PgPool) {
    let app = Router::new()
        .route("/health/live", get(liveness))
        .route("/health/ready", get(readiness))
        .with_state(pool);
 
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
 
    // Signal readiness after startup is complete
    READY.store(true, Ordering::Relaxed);
    info!("service ready");
 
    axum::serve(listener, app)
        .with_graceful_shutdown(shutdown_signal())
        .await
        .unwrap();
}
 
async fn shutdown_signal() {
    use tokio::signal;
    let ctrl_c = async { signal::ctrl_c().await.expect("ctrl-c handler") };
    let terminate = async {
        signal::unix::signal(signal::unix::SignalKind::terminate())
            .expect("SIGTERM handler")
            .recv()
            .await;
    };
    tokio::select! {
        _ = ctrl_c => {},
        _ = terminate => {},
    }
    info!("shutdown signal received — draining connections");
    READY.store(false, Ordering::Relaxed);
}

Deployment Considerations

graph LR
    A[Rust binary] -->|FROM scratch| B[3-5 MB container image]
    A -->|FROM alpine| C[10-15 MB container image]
    A -->|FROM debian-slim| D[80 MB container image]
    style B fill:#afa,stroke:#060
    style C fill:#ffa,stroke:#660
    style D fill:#faa,stroke:#600

Rust binaries can be fully statically linked on Linux by building for the musl target. The resulting binary runs in a FROM scratch image with no OS dependencies.

# Multi-stage build
FROM rust:1.75-alpine AS builder
RUN apk add --no-cache musl-dev
WORKDIR /app
COPY . .
RUN cargo build --release --target x86_64-unknown-linux-musl
 
FROM scratch
COPY --from=builder /app/target/x86_64-unknown-linux-musl/release/myservice /myservice
ENTRYPOINT ["/myservice"]

For services that link against system libraries (OpenSSL, libpq), use distroless instead:

FROM gcr.io/distroless/cc-debian12
COPY --from=builder /app/target/release/myservice /myservice
ENTRYPOINT ["/myservice"]

Environment variables for operational configuration:

fn load_config() -> Config {
    Config {
        database_url: std::env::var("DATABASE_URL").expect("DATABASE_URL required"),
        port: std::env::var("PORT")
            .unwrap_or_else(|_| "3000".to_string())
            .parse()
            .expect("PORT must be a number"),
        log_level: std::env::var("RUST_LOG").unwrap_or_else(|_| "info".to_string()),
    }
}
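expect aborts the process with a panic, which is acceptable at startup but produces an unstructured message. A variant that returns Result lets you emit a structured error before exiting; parse_port here is a hypothetical helper, not part of the code above:

```rust
// Hypothetical helper: parse an optional PORT value with a default,
// returning Result instead of panicking so the failure can be logged
// as a structured event before the process exits.
fn parse_port(raw: Option<String>) -> Result<u16, String> {
    match raw {
        None => Ok(3000), // default when PORT is unset
        Some(s) => s
            .parse::<u16>()
            .map_err(|e| format!("PORT must be a number, got {s:?}: {e}")),
    }
}

fn main() {
    let port = parse_port(std::env::var("PORT").ok()).unwrap_or_else(|e| {
        eprintln!("config error: {e}");
        std::process::exit(1);
    });
    println!("listening on port {port}");
}
```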

Key Takeaways

  • Use tracing with the #[instrument] macro for structured, span-aware logging that integrates directly with OpenTelemetry without code changes.
  • Always await JoinHandle and handle the panic case — an unhandled panic in a spawned task silently drops the task in async Rust.
  • Install a custom panic hook to emit structured logs and flush telemetry before the process exits; panics are otherwise invisible in your observability stack.
  • Separate liveness (/health/live) from readiness (/health/ready) — liveness tells the orchestrator the process is running; readiness tells it the service is ready to serve traffic.
  • Rust binaries link statically with musl and run in FROM scratch images of 3–5 MB — a significant operational advantage for image pull times and attack surface.
  • Graceful shutdown via with_graceful_shutdown drains in-flight connections before the process exits, preventing dropped requests during rolling deployments.