Observability in Depth

RUM and the Front-end Gap

Ravinder · 6 min read
Tags: Observability, Telemetry, RUM, Web Vitals, Frontend

You can have perfect backend observability and still have no idea what users are experiencing. A 50ms API response means nothing if the browser spends 4 seconds parsing a JavaScript bundle, re-laying out the DOM three times, and painting late because a third-party ad script is blocking the main thread. Backend traces end at the network edge. Real User Monitoring (RUM) begins there.

The front-end gap is the time between your API responding and the user seeing a usable page — and for many applications it is the largest contributor to perceived latency.

The Observability Stack: Where RUM Lives

flowchart TB
    User["Browser / User Device"] --> RUM["RUM Agent\n(Web Vitals, errors,\nsession replay)"]
    User --> API["HTTP Request"]
    API --> BE["Backend Services\n(traces, metrics, logs)"]
    RUM --> RS["RUM Backend\n(Grafana Faro / Sentry)"]
    BE --> OB["Observability Stack\n(Tempo / Prometheus / Loki)"]
    RS --> OB
    OB --> Q["Unified Query\n(Grafana)"]
    style RUM fill:#e67e22,color:#fff
    style RS fill:#e67e22,color:#fff

The connection between RUM and backend traces is a trace_id that flows from the browser's XHR/fetch call through the traceparent header into your backend spans — and then back through a response header so the browser agent can correlate the RUM session with the server-side trace.
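That traceparent header follows the W3C Trace Context format: `version-traceId-parentSpanId-flags`. A minimal sketch of parsing it on either end of the correlation — the field layout comes from the spec, the function and type names here are illustrative:

```typescript
// Parse a W3C traceparent header, e.g.
// "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
interface TraceContext {
  traceId: string; // 32 hex chars, shared by browser span and backend spans
  spanId: string;  // 16 hex chars, the parent span on this hop
  sampled: boolean; // lowest bit of the flags byte
}

function parseTraceparent(header: string): TraceContext | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  return {
    traceId: m[2],
    spanId: m[3],
    sampled: (parseInt(m[4], 16) & 0x01) === 1,
  };
}
```

The `traceId` field is the value you store alongside the RUM session so a browser event can be joined to the server-side trace later.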

Core Web Vitals: The Metrics That Matter

Google's Core Web Vitals are the industry standard for user-perceived performance:

| Metric | Full name                 | Good    | Needs work  | Poor    | Measures                     |
|--------|---------------------------|---------|-------------|---------|------------------------------|
| LCP    | Largest Contentful Paint  | < 2.5s  | 2.5–4s      | > 4s    | Load speed (largest element) |
| INP    | Interaction to Next Paint | < 200ms | 200–500ms   | > 500ms | Responsiveness               |
| CLS    | Cumulative Layout Shift   | < 0.1   | 0.1–0.25    | > 0.25  | Visual stability             |
| FCP    | First Contentful Paint    | < 1.8s  | 1.8–3s      | > 3s    | Perceived load start         |
| TTFB   | Time to First Byte        | < 800ms | 800ms–1.8s  | > 1.8s  | Server + network response    |

TTFB is the one metric that bridges RUM and backend observability. A high TTFB alongside a normal backend P99 points to network or CDN issues; a high TTFB that tracks a high backend P99 is a backend problem.
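The thresholds in the table map directly to the good / needs-improvement / poor ratings that RUM tooling reports. A small sketch of that classification (thresholds as above; the function and constant names are mine):

```typescript
// Web Vitals thresholds: [goodBelow, poorAbove]. Values are milliseconds,
// except CLS, which is a unitless layout-shift score.
const THRESHOLDS: Record<string, [number, number]> = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
  FCP: [1800, 3000],
  TTFB: [800, 1800],
};

type Rating = 'good' | 'needs-improvement' | 'poor';

function rateVital(metric: string, value: number): Rating {
  const [good, poor] = THRESHOLDS[metric];
  if (value < good) return 'good';
  if (value <= poor) return 'needs-improvement';
  return 'poor';
}
```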

Grafana Faro: Open-Source RUM

Grafana Faro is the pragmatic choice if you are already on the Grafana stack. It collects Web Vitals, JS errors, and custom events in the browser and sends them to a collector endpoint (typically Grafana Alloy) that forwards them to Loki and Tempo.

// Initialize Faro in your React app
import { initializeFaro, getWebInstrumentations } from '@grafana/faro-web-sdk';
import { TracingInstrumentation } from '@grafana/faro-web-tracing';
 
const faro = initializeFaro({
  url: 'https://faro-collector.example.com/collect',
  app: {
    name: 'shop-frontend',
    version: '2.4.1',
    environment: 'production',
  },
  instrumentations: [
    ...getWebInstrumentations({
      captureConsole: true,
      captureConsoleDisabledLevels: ['debug', 'log'],
    }),
    new TracingInstrumentation({
      instrumentationOptions: {
        propagateTraceHeaderCorsUrls: [/api\.example\.com/],
      },
    }),
  ],
});
 
// Custom event: user completes checkout
faro.api.pushEvent('checkout_completed', {
  order_id: orderId,
  amount_cents: String(amountCents),
  payment_method: paymentMethod,
});

The TracingInstrumentation automatically adds traceparent headers to fetch/XHR calls matching propagateTraceHeaderCorsUrls, connecting browser activity to backend traces.

Web Vitals Collection with the web-vitals Library

If you prefer a lighter-weight approach without a full RUM SDK:

import { onCLS, onINP, onLCP, onFCP, onTTFB } from 'web-vitals';
 
function sendToAnalytics({ name, value, rating, id, navigationType }) {
  const body = JSON.stringify({
    metric: name,
    value: Math.round(name === 'CLS' ? value * 1000 : value),
    rating,          // 'good' | 'needs-improvement' | 'poor'
    id,
    navigation_type: navigationType,
    page: window.location.pathname,
    timestamp: Date.now(),
    session_id: getSessionID(),
  });
 
  // Use sendBeacon for reliability at page unload
  navigator.sendBeacon('/metrics/web-vitals', body);
}
 
onCLS(sendToAnalytics);
onINP(sendToAnalytics);
onLCP(sendToAnalytics);
onFCP(sendToAnalytics);
onTTFB(sendToAnalytics);
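One caveat with the snippet above: navigator.sendBeacon can be unavailable (older browsers, some webviews) and returns false when the payload is refused. A hedged sketch of a fallback to fetch with keepalive — the beaconFn/fetchFn parameters are injected here purely so the decision logic is testable outside a browser:

```typescript
// Try sendBeacon first; if it is missing or refuses the payload, fall back
// to fetch with keepalive so the request can still outlive page unload.
function sendMetric(
  url: string,
  body: string,
  beaconFn?: (url: string, body: string) => boolean,
  fetchFn?: (url: string, init: object) => void,
): 'beacon' | 'fetch' {
  if (beaconFn && beaconFn(url, body)) return 'beacon';
  fetchFn?.(url, { method: 'POST', body, keepalive: true });
  return 'fetch';
}
```

In a real page you would pass `navigator.sendBeacon.bind(navigator)` and `fetch` directly instead of injecting stand-ins.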

On the backend, ingest these into Prometheus via a small proxy:

// Go HTTP handler that converts the web-vitals JSON payload to Prometheus metrics
var webVitalsHistogram = promauto.NewHistogramVec(prometheus.HistogramOpts{
    Name:    "web_vitals_seconds",
    Help:    "Core Web Vitals measurements",
    Buckets: []float64{0.1, 0.25, 0.5, 1, 2.5, 4, 7.5, 15},
}, []string{"metric", "rating", "page"})
 
func handleWebVitals(w http.ResponseWriter, r *http.Request) {
    var payload WebVitalsPayload
    if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
        http.Error(w, "invalid payload", http.StatusBadRequest)
        return
    }
 
    val := float64(payload.Value) / 1000 // ms to seconds (CLS was pre-scaled x1000 client-side)
    webVitalsHistogram.WithLabelValues(
        payload.Metric, payload.Rating, sanitizePath(payload.Page),
    ).Observe(val)
    w.WriteHeader(http.StatusNoContent)
}

Browser Error Tracking

JavaScript errors that don't surface in backend logs are the most under-monitored failure category. Wire up a global error handler:

// Capture unhandled errors and promise rejections
window.addEventListener('error', (event) => {
  faro.api.pushError(event.error, {
    type: 'unhandled_error',
    context: {
      message: event.message,
      filename: event.filename,
      lineno: String(event.lineno),
    },
  });
});
 
window.addEventListener('unhandledrejection', (event) => {
  faro.api.pushError(
    event.reason instanceof Error
      ? event.reason
      : new Error(String(event.reason)),
    { type: 'unhandled_promise_rejection' }
  );
});

Track error rate as a signal in your SLO:

# Browser error rate (from Faro → Loki)
sum(rate({app="shop-frontend"} | json | kind="exception" [5m]))
/
sum(rate({app="shop-frontend"} | json | kind="navigate" [5m]))

Sampling RUM Data Responsibly

At 10M page views/day, capturing every interaction is prohibitively expensive. Use session sampling:

const SAMPLE_RATE = 0.10; // 10% of sessions
 
const faro = initializeFaro({
  // ...
  sessionTracking: {
    samplingRate: SAMPLE_RATE,
    // Persist the sampling decision for the session across page loads,
    // so a session is captured in full or not at all
    persistSessionSampling: true,
  },
  beforeSend: (item) => {
    // Always send errors, even in unsampled sessions
    if (item.type === 'exception') return item;
    // Drop other items from unsampled sessions
    return faro.api.getSession()?.attributes?.sampled === 'true' ? item : null;
  },
});
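If you took the lighter web-vitals route instead of Faro, you need to make the same session-level decision yourself: sample once per session and persist the verdict, because per-event sampling would skew page-level aggregates. A sketch, with a Map standing in for sessionStorage so the logic is testable outside a browser:

```typescript
// Decide once per session whether to report metrics, and remember the
// decision so every event in the session is kept or dropped together.
function isSessionSampled(rate: number, store: Map<string, string>): boolean {
  const KEY = 'rum_sampled';
  let decision = store.get(KEY);
  if (decision === undefined) {
    decision = Math.random() < rate ? '1' : '0';
    store.set(KEY, decision); // persists across page loads in sessionStorage
  }
  return decision === '1';
}
```

In the browser, replace the Map with `window.sessionStorage` (`getItem`/`setItem`), and bypass the check entirely for error events, mirroring the Faro beforeSend logic above.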

Connecting RUM to Backend Traces in Grafana

Configure Grafana to correlate Faro sessions with Tempo traces:

{
  "correlations": [
    {
      "sourceUID": "faro-datasource",
      "targetUID": "tempo-prod",
      "label": "Open trace",
      "config": {
        "type": "query",
        "field": "traceId",
        "target": {
          "query": "${__value.raw}"
        }
      }
    }
  ]
}

Now in the Faro explore panel you can click a slow page load, see the traceId from the API call, and jump directly to the Tempo waterfall showing which backend service was responsible.

Key Takeaways

  • Backend traces end at the network edge — RUM fills the observability gap between API response and the user seeing a usable page.
  • Core Web Vitals (LCP, INP, CLS) are the standard vocabulary for front-end performance; TTFB is the bridge metric between RUM and backend observability.
  • Grafana Faro's TracingInstrumentation automatically injects traceparent headers into fetch/XHR calls, linking browser sessions to backend traces without manual correlation.
  • Browser JavaScript errors are the most under-monitored failure category in most stacks — unhandled errors and promise rejections must be captured explicitly.
  • Session sampling at 10% is a reasonable default for high-traffic applications; always override sampling for sessions that contain errors.
  • RUM-to-trace correlation in Grafana turns a slow page load report into a one-click path to the root-cause service, making the full observability stack genuinely end-to-end.