Observability and Tracing
Series: MCP Deep Dive

When an agent calls your MCP tool and something goes wrong, you need to answer: which user session triggered this, which model call originated it, what did the server do, and how long did each step take? Without a trace ID that crosses the agent boundary, you are looking at disconnected log entries in three different systems and guessing at causality. This post covers the instrumentation pattern that makes agent interactions as traceable as any other distributed system.
The Observability Gap at the Agent Boundary
Traditional distributed tracing assumes a service calls another service over HTTP, propagating a trace context header. MCP introduces a new boundary: an AI model reasoning step that produces a tools/call request. The model itself is not instrumentable the way a microservice is. But the boundary between the model and the MCP server is — that is where your trace starts.
The agent generates a `traceparent` header (W3C Trace Context format) before the MCP call. The server extracts it, starts a child span, and propagates it to any downstream calls.
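On the client side this is a single propagation.inject call before the request leaves the process. Here is a minimal sketch of the agent half, where `mcpRequest` is a hypothetical helper standing in for however your client actually sends the HTTP POST:

```typescript
import { context, propagation, trace } from "@opentelemetry/api";

// Hypothetical transport helper, declared elsewhere in your agent host
declare function mcpRequest(
  method: string,
  params: unknown,
  headers: Record<string, string>
): Promise<unknown>;

const tracer = trace.getTracer("agent-host");

// Wrap each model-initiated tools/call in a span, then inject the active
// context into the outgoing headers as a traceparent header
async function callToolTraced(name: string, args: unknown) {
  return tracer.startActiveSpan(`mcp.client.tool.${name}`, async (span) => {
    const headers: Record<string, string> = {};
    propagation.inject(context.active(), headers); // writes traceparent
    try {
      return await mcpRequest("tools/call", { name, arguments: args }, headers);
    } finally {
      span.end();
    }
  });
}
```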
Instrumentation: Server Side
Use the OpenTelemetry Node SDK. Initialize it once at startup and every span created afterwards picks up the exporter, resource, and context manager automatically.
```typescript
// src/telemetry.ts
import { NodeSDK } from "@opentelemetry/sdk-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";
import { Resource } from "@opentelemetry/resources";
import { ATTR_SERVICE_NAME } from "@opentelemetry/semantic-conventions";

export function initTelemetry() {
  const sdk = new NodeSDK({
    resource: new Resource({ [ATTR_SERVICE_NAME]: "taskflow-mcp" }),
    traceExporter: new OTLPTraceExporter({
      url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? "http://localhost:4318/v1/traces"
    })
  });
  sdk.start();
  // Flush buffered spans before the process exits
  process.on("SIGTERM", () => sdk.shutdown());
}
```
import "./telemetry.js"; // must be first
import { trace, context, propagation } from "@opentelemetry/api";
import express from "express";
const tracer = trace.getTracer("taskflow-mcp");
app.post("/mcp", async (req, res) => {
// Extract W3C traceparent from incoming headers
const parentContext = propagation.extract(context.active(), req.headers);
await context.with(parentContext, async () => {
const span = tracer.startSpan("mcp.request");
try {
const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
await server.connect(transport);
await transport.handleRequest(req, res, req.body);
} finally {
span.end();
}
});
});Instrumentation: Tool Handlers
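That covers extraction; the last step promised above, propagating context to downstream calls, is one propagation.inject in the other direction. A minimal sketch, where `fetchProject` and its URL are hypothetical stand-ins for whatever internal API a tool handler calls:

```typescript
import { context, propagation } from "@opentelemetry/api";

// Hypothetical downstream call made from inside a tool handler.
// Injecting the active context writes a traceparent header, so the
// downstream service's spans join the same trace.
async function fetchProject(projectId: string): Promise<unknown> {
  const headers: Record<string, string> = {};
  propagation.inject(context.active(), headers);
  const res = await fetch(`https://projects.internal/api/projects/${projectId}`, { headers });
  return res.json();
}
```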
Instrumentation: Tool Handlers
Wrap each tool call in its own span so you can see per-tool latency.
```typescript
// src/handlers/tools.ts
import { trace, SpanStatusCode } from "@opentelemetry/api";
import { CallToolRequestSchema } from "@modelcontextprotocol/sdk/types.js";

const tracer = trace.getTracer("taskflow-mcp");

server.setRequestHandler(CallToolRequestSchema, async (req) => {
  const span = tracer.startSpan(`tool.${req.params.name}`, {
    attributes: {
      "mcp.tool.name": req.params.name,
      // Record the argument size, not the arguments themselves, to keep
      // potentially sensitive payloads out of the tracing backend
      "mcp.tool.args_len": JSON.stringify(req.params.arguments ?? {}).length
    }
  });
  try {
    const result = await dispatchTool(req.params.name, req.params.arguments);
    span.setStatus({ code: SpanStatusCode.OK });
    return result;
  } catch (err) {
    span.recordException(err as Error);
    span.setStatus({ code: SpanStatusCode.ERROR, message: (err as Error).message });
    throw err;
  } finally {
    span.end();
  }
});
```

Structured Logging with Trace Correlation
Logs are useful, but only if you can join them to a trace. Use the trace and span IDs as log fields.
```typescript
import { trace } from "@opentelemetry/api";
import pino from "pino";

const logger = pino({ level: "info" });

// Stamp every log line with the IDs of the currently active span
function log(level: "info" | "error" | "warn", msg: string, extra: Record<string, unknown> = {}) {
  const span = trace.getActiveSpan();
  const traceId = span?.spanContext().traceId;
  const spanId = span?.spanContext().spanId;
  logger[level]({ traceId, spanId, ...extra }, msg);
}

// Usage inside a tool handler:
log("info", "Creating task", { project_id, title });
```

This gives you a traceId on every log line that matches the span in your tracing backend, whether that is Jaeger, Tempo, or any other OTLP-compatible collector.
Metrics: What to Measure
Traces explain a single request; metrics tell you how the server behaves in aggregate. Three instruments cover most dashboards and alerts: call volume, latency, and error rate, each labeled by tool name.
```typescript
import { metrics } from "@opentelemetry/api";

const meter = metrics.getMeter("taskflow-mcp");
const toolCallsCounter = meter.createCounter("mcp.tool.calls.total");
const toolDuration = meter.createHistogram("mcp.tool.duration_ms", { unit: "ms" });
const toolErrors = meter.createCounter("mcp.tool.errors.total");

// In your tool dispatch wrapper:
const start = Date.now();
toolCallsCounter.add(1, { tool: req.params.name });
try {
  const result = await dispatchTool(req.params.name, req.params.arguments);
  toolDuration.record(Date.now() - start, { tool: req.params.name });
  return result;
} catch (err) {
  toolErrors.add(1, { tool: req.params.name, error: (err as Error).constructor.name });
  throw err;
}
```
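One caveat: the NodeSDK setup shown earlier configures only a trace exporter, so these instruments record nothing visible until a metric reader is registered as well. A minimal sketch of the addition to initTelemetry, where the endpoint default and the 15-second interval are assumptions to adjust for your collector:

```typescript
// src/telemetry.ts — additions for metric export
import { PeriodicExportingMetricReader } from "@opentelemetry/sdk-metrics";
import { OTLPMetricExporter } from "@opentelemetry/exporter-metrics-otlp-http";

const metricReader = new PeriodicExportingMetricReader({
  exporter: new OTLPMetricExporter({
    // assumed endpoint, mirroring the trace exporter default
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT ?? "http://localhost:4318/v1/metrics"
  }),
  exportIntervalMillis: 15_000 // push every 15s
});

// Pass it alongside the trace exporter in the NodeSDK constructor:
// new NodeSDK({ resource, traceExporter, metricReader })
```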
Session and User Attribution
For multi-tenant servers you need to know whose agent made the call. Pull the user's `sub` claim from the validated JWT and attach it to the span.
app.post("/mcp", async (req, res) => {
const claims = await validateToken(req.headers.authorization);
const parentContext = propagation.extract(context.active(), req.headers);
await context.with(parentContext, async () => {
const span = tracer.startSpan("mcp.request");
span.setAttributes({
"user.id": claims.sub ?? "unknown",
"client.id": claims.azp ?? "unknown", // authorized party
});
// ...
span.end();
});
});Now every span in your tracing backend is annotated with the user and client identities, making incident investigation a search query rather than a log archaeology project.
Key Takeaways
- The agent boundary is a distributed system boundary — treat it with the same tracing discipline as any microservice call.
- W3C `traceparent` headers are the standard mechanism for propagating trace context into MCP servers.
- OpenTelemetry's context propagation API handles context extraction and injection without manual plumbing.
- Instrument at two levels: a root span per MCP request, and child spans per tool call, for granular latency visibility.
- Correlate logs to traces by attaching `traceId` and `spanId` as structured log fields.
- Emit counter and histogram metrics per tool name so dashboards and alerts can be tool-specific, not just server-wide.