Observability in Depth

RUM and the Front-end Gap

Ravinder · 6 min read
Tags: Observability, Telemetry, RUM, Web Vitals, Frontend

You can have perfect backend observability and still have no idea what users are experiencing. A 50ms API response means nothing if the browser spends 4 seconds parsing a JavaScript bundle, re-laying out the DOM three times, and painting late because a third-party ad script is blocking the main thread. Backend traces end at the network edge. Real User Monitoring (RUM) begins there.

The front-end gap is the time between your API responding and the user seeing a usable page — and for many applications it is the largest contributor to perceived latency.

The Observability Stack: Where RUM Lives

flowchart TB
    User["Browser / User Device"] --> RUM["RUM Agent\n(Web Vitals, errors,\nsession replay)"]
    User --> API["HTTP Request"]
    API --> BE["Backend Services\n(traces, metrics, logs)"]
    RUM --> RS["RUM Backend\n(Grafana Faro / Sentry)"]
    BE --> OB["Observability Stack\n(Tempo / Prometheus / Loki)"]
    RS --> OB
    OB --> Q["Unified Query\n(Grafana)"]
    style RUM fill:#e67e22,color:#fff
    style RS fill:#e67e22,color:#fff

The connection between RUM and backend traces is a trace_id that flows from the browser's XHR/fetch call through the traceparent header into your backend spans — and then back through a response header so the browser agent can correlate the RUM session with the server-side trace.
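That traceparent header follows the W3C Trace Context format: `version-traceId-parentSpanId-flags`. A minimal sketch of parsing it on either end of the correlation — the field layout comes from the spec, the function and type names here are illustrative:

```typescript
// Parse a W3C traceparent header, e.g.
// "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01"
interface TraceContext {
  traceId: string; // 32 hex chars, shared by browser span and backend spans
  spanId: string;  // 16 hex chars, the parent span on this hop
  sampled: boolean; // lowest bit of the flags byte
}

function parseTraceparent(header: string): TraceContext | null {
  const m = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!m) return null;
  return {
    traceId: m[2],
    spanId: m[3],
    sampled: (parseInt(m[4], 16) & 0x01) === 1,
  };
}
```

The `traceId` field is the value you store alongside the RUM session so a browser event can be joined to the server-side trace later.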

Core Web Vitals: The Metrics That Matter

Google's Core Web Vitals are the industry standard for user-perceived performance:

| Metric | Full name                 | Good    | Needs work  | Poor    | Measures                     |
|--------|---------------------------|---------|-------------|---------|------------------------------|
| LCP    | Largest Contentful Paint  | < 2.5s  | 2.5–4s      | > 4s    | Load speed (largest element) |
| INP    | Interaction to Next Paint | < 200ms | 200–500ms   | > 500ms | Responsiveness               |
| CLS    | Cumulative Layout Shift   | < 0.1   | 0.1–0.25    | > 0.25  | Visual stability             |
| FCP    | First Contentful Paint    | < 1.8s  | 1.8–3s      | > 3s    | Perceived load start         |
| TTFB   | Time to First Byte        | < 800ms | 800ms–1.8s  | > 1.8s  | Server + network response    |

TTFB is the one metric that bridges RUM and backend observability. A high TTFB alongside a normal backend P99 points to network or CDN issues; a high TTFB that tracks a high backend P99 is a backend problem.
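The thresholds in the table map directly to the good / needs-improvement / poor ratings that RUM tooling reports. A small sketch of that classification (thresholds as above; the function and constant names are mine):

```typescript
// Web Vitals thresholds: [goodBelow, poorAbove]. Values are milliseconds,
// except CLS, which is a unitless layout-shift score.
const THRESHOLDS: Record<string, [number, number]> = {
  LCP: [2500, 4000],
  INP: [200, 500],
  CLS: [0.1, 0.25],
  FCP: [1800, 3000],
  TTFB: [800, 1800],
};

type Rating = 'good' | 'needs-improvement' | 'poor';

function rateVital(metric: string, value: number): Rating {
  const [good, poor] = THRESHOLDS[metric];
  if (value < good) return 'good';
  if (value <= poor) return 'needs-improvement';
  return 'poor';
}
```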

Grafana Faro: Open-Source RUM

Grafana Faro is the pragmatic choice if you are already on the Grafana stack. It collects Web Vitals, JS errors, and custom events in the browser and sends them to a collector endpoint (typically Grafana Alloy) that forwards them to Loki and Tempo.

// Initialize Faro in your React app
import { initializeFaro, getWebInstrumentations } from '@grafana/faro-web-sdk';
import { TracingInstrumentation } from '@grafana/faro-web-tracing';
 
const faro = initializeFaro({
  url: 'https://faro-collector.example.com/collect',
  app: {
    name: 'shop-frontend',
    version: '2.4.1',
    environment: 'production',
  },
  instrumentations: [
    ...getWebInstrumentations({
      captureConsole: true,
      captureConsoleDisabledLevels: ['debug', 'log'],
    }),
    new TracingInstrumentation({
      instrumentationOptions: {
        propagateTraceHeaderCorsUrls: [/api\.example\.com/],
      },
    }),
  ],
});
 
// Custom event: user completes checkout
faro.api.pushEvent('checkout_completed', {
  order_id: orderId,
  amount_cents: String(amountCents),
  payment_method: paymentMethod,
});

The TracingInstrumentation automatically adds traceparent headers to fetch/XHR calls matching propagateTraceHeaderCorsUrls, connecting browser activity to backend traces.

Web Vitals Collection with the web-vitals Library

If you prefer a lighter-weight approach without a full RUM SDK:

import { onCLS, onINP, onLCP, onFCP, onTTFB } from 'web-vitals';
 
function sendToAnalytics({ name, value, rating, id, navigationType }) {
  const body = JSON.stringify({
    metric: name,
    value: Math.round(name === 'CLS' ? value * 1000 : value),
    rating,          // 'good' | 'needs-improvement' | 'poor'
    id,
    navigation_type: navigationType,
    page: window.location.pathname,
    timestamp: Date.now(),
    session_id: getSessionID(),
  });
 
  // Use sendBeacon for reliability at page unload
  navigator.sendBeacon('/metrics/web-vitals', body);
}
 
onCLS(sendToAnalytics);
onINP(sendToAnalytics);
onLCP(sendToAnalytics);
onFCP(sendToAnalytics);
onTTFB(sendToAnalytics);
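One caveat with the snippet above: navigator.sendBeacon can be unavailable (older browsers, some webviews) and returns false when the payload is refused. A hedged sketch of a fallback to fetch with keepalive — the beaconFn/fetchFn parameters are injected here purely so the decision logic is testable outside a browser:

```typescript
// Try sendBeacon first; if it is missing or refuses the payload, fall back
// to fetch with keepalive so the request can still outlive page unload.
function sendMetric(
  url: string,
  body: string,
  beaconFn?: (url: string, body: string) => boolean,
  fetchFn?: (url: string, init: object) => void,
): 'beacon' | 'fetch' {
  if (beaconFn && beaconFn(url, body)) return 'beacon';
  fetchFn?.(url, { method: 'POST', body, keepalive: true });
  return 'fetch';
}
```

In a real page you would pass `navigator.sendBeacon.bind(navigator)` and `fetch` directly instead of injecting stand-ins.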

On the backend, ingest these into Prometheus via a small proxy:

// Go HTTP handler that converts the web-vitals JSON payload to Prometheus metrics
var webVitalsHistogram = promauto.NewHistogramVec(prometheus.HistogramOpts{
    Name:    "web_vitals_seconds",
    Help:    "Core Web Vitals measurements",
    Buckets: []float64{0.1, 0.25, 0.5, 1, 2.5, 4, 7.5, 15},
}, []string{"metric", "rating", "page"})
 
func handleWebVitals(w http.ResponseWriter, r *http.Request) {
    var payload WebVitalsPayload
    if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
        http.Error(w, "invalid payload", http.StatusBadRequest)
        return
    }
 
    val := float64(payload.Value) / 1000 // ms to seconds (CLS was pre-scaled x1000 client-side)
    webVitalsHistogram.WithLabelValues(
        payload.Metric, payload.Rating, sanitizePath(payload.Page),
    ).Observe(val)
    w.WriteHeader(http.StatusNoContent)
}

Browser Error Tracking

JavaScript errors that don't surface in backend logs are the most under-monitored failure category. Wire up a global error handler:

// Capture unhandled errors and promise rejections
window.addEventListener('error', (event) => {
  faro.api.pushError(event.error, {
    type: 'unhandled_error',
    context: {
      message: event.message,
      filename: event.filename,
      lineno: String(event.lineno),
    },
  });
});
 
window.addEventListener('unhandledrejection', (event) => {
  faro.api.pushError(
    event.reason instanceof Error
      ? event.reason
      : new Error(String(event.reason)),
    { type: 'unhandled_promise_rejection' }
  );
});

Track error rate as a signal in your SLO:

# Browser error rate (from Faro → Loki)
sum(rate({app="shop-frontend"} | json | kind="exception" [5m]))
/
sum(rate({app="shop-frontend"} | json | kind="navigate" [5m]))

Sampling RUM Data Responsibly

At 10M page views/day, capturing every interaction is prohibitively expensive. Use session sampling:

const SAMPLE_RATE = 0.10; // 10% of sessions
 
const faro = initializeFaro({
  // ...
  sessionTracking: {
    samplingRate: SAMPLE_RATE,
    // Persist the sampling decision for the session across page loads,
    // so a session is captured in full or not at all
    persistSessionSampling: true,
  },
  beforeSend: (item) => {
    // Always send errors, even in unsampled sessions
    if (item.type === 'exception') return item;
    // Drop other items from unsampled sessions
    return faro.api.getSession()?.attributes?.sampled === 'true' ? item : null;
  },
});
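If you took the lighter web-vitals route instead of Faro, you need to make the same session-level decision yourself: sample once per session and persist the verdict, because per-event sampling would skew page-level aggregates. A sketch, with a Map standing in for sessionStorage so the logic is testable outside a browser:

```typescript
// Decide once per session whether to report metrics, and remember the
// decision so every event in the session is kept or dropped together.
function isSessionSampled(rate: number, store: Map<string, string>): boolean {
  const KEY = 'rum_sampled';
  let decision = store.get(KEY);
  if (decision === undefined) {
    decision = Math.random() < rate ? '1' : '0';
    store.set(KEY, decision); // persists across page loads in sessionStorage
  }
  return decision === '1';
}
```

In the browser, replace the Map with `window.sessionStorage` (`getItem`/`setItem`), and bypass the check entirely for error events, mirroring the Faro beforeSend logic above.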

Connecting RUM to Backend Traces in Grafana

Configure Grafana to correlate Faro sessions with Tempo traces:

{
  "correlations": [
    {
      "sourceUID": "faro-datasource",
      "targetUID": "tempo-prod",
      "label": "Open trace",
      "config": {
        "type": "query",
        "field": "traceId",
        "target": {
          "query": "${__value.raw}"
        }
      }
    }
  ]
}

Now in the Faro explore panel you can click a slow page load, see the traceId from the API call, and jump directly to the Tempo waterfall showing which backend service was responsible.

Key Takeaways

  • Backend traces end at the network edge — RUM fills the observability gap between API response and the user seeing a usable page.
  • Core Web Vitals (LCP, INP, CLS) are the standard vocabulary for front-end performance; TTFB is the bridge metric between RUM and backend observability.
  • Grafana Faro's TracingInstrumentation automatically injects traceparent headers into fetch/XHR calls, linking browser sessions to backend traces without manual correlation.
  • Browser JavaScript errors are the most under-monitored failure category in most stacks — unhandled errors and promise rejections must be captured explicitly.
  • Session sampling at 10% is a reasonable default for high-traffic applications; always override sampling for sessions that contain errors.
  • RUM-to-trace correlation in Grafana turns a slow page load report into a one-click path to the root-cause service, making the full observability stack genuinely end-to-end.