# From Three Pillars to Four
*Series: Observability in Depth. Next — Part 2: Logs: Structured, Sampled, Retained.*
The "three pillars" framing — logs, metrics, traces — served us well when most systems were monolithic and a single grep could surface a bug. A decade of microservices, serverless, and event-driven architecture later, that model has a gap: discrete business events get shoehorned into log lines, stripped of context, and promptly lost in a flood of DEBUG noise.
This series starts by naming that gap and showing how promoting events to a first-class signal changes the way teams instrument, query, and act on telemetry.
## Why Three Pillars Break Under Event-Driven Load
Consider an order-placement flow. A metric tells you orders per second. A trace tells you which service was slow. A log tells you `INFO: order placed`. None of them answer: which SKUs trigger payment failures at 3× the baseline rate on Tuesdays?
That's an event question. Events carry rich, structured context at the moment something meaningful happens — not sampled aggregates, not free-text descriptions. They are the raw material from which every other signal is derived.
| Signal | Granularity | Cost per unit | Best for |
|---|---|---|---|
| Metric | Aggregate | Very low | Alerting, dashboards |
| Log | Per record | Medium | Debugging, free-text context |
| Trace | Per request | High | Latency root-cause |
| Event | Per action | Medium-high | Business behavior, funnels |
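To make the distinction concrete, here is a minimal sketch of answering the SKU question above directly from raw events. The event records and field names (`sku`, `weekday`) are hypothetical, and a real query would run against the event store rather than an in-memory list:

```python
from collections import Counter

# Hypothetical order events; in production these come from the event store.
events = [
    {"event_type": "payment.failed", "sku": "SKU-1", "weekday": "Tue"},
    {"event_type": "payment.failed", "sku": "SKU-1", "weekday": "Tue"},
    {"event_type": "payment.failed", "sku": "SKU-1", "weekday": "Tue"},
    {"event_type": "payment.failed", "sku": "SKU-2", "weekday": "Tue"},
    {"event_type": "order.placed",   "sku": "SKU-1", "weekday": "Tue"},
]

def failures_by_sku(events, weekday="Tue"):
    """Count payment-failure events per SKU on a given weekday."""
    return Counter(
        e["sku"]
        for e in events
        if e["event_type"] == "payment.failed" and e["weekday"] == weekday
    )

print(failures_by_sku(events))  # e.g. Counter({'SKU-1': 3, 'SKU-2': 1})
```

A metric could have stored the failure count, but only an event retains the SKU, the weekday, and every other dimension you did not think to pre-aggregate.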
## The Four-Signal Model
The key architectural decision is giving events their own pipeline. Routing them through your log aggregator works in a prototype but collapses under production cardinality. A dedicated stream (a Kafka topic, a Kinesis stream, or even a simple HTTP collector) gives you schema enforcement, independent retention, and replay.
## Defining an Event Schema
Resist the temptation to emit raw JSON blobs. Agree on a base schema that every service uses:
```json
{
  "schema_version": "1.0",
  "event_type": "order.placed",
  "timestamp": "2025-08-01T09:15:32.411Z",
  "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
  "service": "checkout-api",
  "environment": "production",
  "actor": {
    "user_id": "usr_8823",
    "session_id": "sess_abc123"
  },
  "payload": {
    "order_id": "ord_99821",
    "total_cents": 4299,
    "sku_count": 3,
    "payment_method": "card"
  }
}
```

`trace_id` is non-negotiable. It is the join key that lets you pivot from an anomalous event cluster to the traces that explain what happened.
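As a sketch of what enforcement of this base schema might look like, the validator below (a hypothetical helper, not part of any SDK) checks the required top-level fields and rejects events whose `trace_id` is not a 32-character lowercase hex string, the W3C trace-ID format used in the example:

```python
REQUIRED_TOP_LEVEL = {
    "schema_version", "event_type", "timestamp",
    "trace_id", "service", "environment", "actor", "payload",
}

def validate_event(event: dict) -> list:
    """Return a list of schema violations; an empty list means valid."""
    errors = [f"missing field: {f}" for f in sorted(REQUIRED_TOP_LEVEL - event.keys())]
    # trace_id is the join key to traces -- reject events without a usable one.
    trace_id = event.get("trace_id", "")
    if len(trace_id) != 32 or not all(c in "0123456789abcdef" for c in trace_id):
        errors.append("trace_id must be a 32-char lowercase hex string")
    return errors

event = {
    "schema_version": "1.0",
    "event_type": "order.placed",
    "timestamp": "2025-08-01T09:15:32.411Z",
    "trace_id": "4bf92f3577b34da6a3ce929d0e0e4736",
    "service": "checkout-api",
    "environment": "production",
    "actor": {"user_id": "usr_8823"},
    "payload": {"order_id": "ord_99821"},
}
print(validate_event(event))  # []
```

In practice this check belongs at the Collector or stream-ingestion layer, so malformed producers fail fast rather than polluting the store.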
## Instrumenting with OpenTelemetry
OTel 1.x treats events as a sub-type of logs, reachable through the EventLogger API (experimental in most SDKs as of mid-2025, but stable in Java and Go):
```go
// Go — OTel EventLogger
import (
    "context"
    "time"

    "go.opentelemetry.io/otel/log"
    "go.opentelemetry.io/otel/log/global"
)

func emitOrderPlaced(ctx context.Context, order Order) {
    logger := global.GetLoggerProvider().Logger("checkout-api")

    record := log.Record{}
    record.SetTimestamp(time.Now())
    record.SetEventName("order.placed")
    record.AddAttributes(
        log.String("order_id", order.ID),
        log.Int("total_cents", order.TotalCents),
        log.String("payment_method", order.PaymentMethod),
        log.String("user_id", order.UserID),
    )

    logger.Emit(ctx, record)
}
```

For Python services, until the EventLogger API stabilizes, emit via the OTel Logs SDK with a structured body:
```python
import json
from time import time_ns

import opentelemetry.sdk._logs as sdk_logs
from opentelemetry._logs import SeverityNumber
from opentelemetry.sdk._logs import LoggerProvider
from opentelemetry.trace import get_current_span

# In production, attach a LogRecordProcessor + exporter to this provider.
provider = LoggerProvider()

def emit_event(ctx, event_type: str, payload: dict):
    record = sdk_logs.LogRecord(
        timestamp=time_ns(),
        trace_id=get_current_span(ctx).get_span_context().trace_id,
        severity_number=SeverityNumber.INFO,
        body=json.dumps({"event_type": event_type, **payload}),
        attributes={"event.name": event_type},
    )
    provider.get_logger(__name__).emit(record)
```

## Routing Events Through an OTel Collector
The Collector is the right place to fan out: keep a copy in your TSDB-friendly format while routing the full payload to your event store.
```yaml
# otelcol-config.yaml — event routing
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  filter/events_only:
    logs:
      include:
        match_type: regexp
        record_attributes:
          - key: event.name
            value: ".*"
  batch:
    timeout: 5s
    send_batch_size: 1000

exporters:
  kafka/events:
    brokers: ["kafka:9092"]
    topic: "telemetry.events"
    encoding: otlp_json
  loki:
    endpoint: http://loki:3100/loki/api/v1/push

service:
  pipelines:
    logs/events:
      receivers: [otlp]
      processors: [filter/events_only, batch]
      exporters: [kafka/events]
    logs/general:
      receivers: [otlp]
      processors: [batch]
      exporters: [loki]
```

## Correlating Events with Traces in Grafana
Once events land in a queryable store, you want one-click correlation. In Grafana, configure a derived field on your event datasource pointing to the trace backend:
```json
{
  "name": "TraceID",
  "matcherType": "label",
  "matcherQuery": "trace_id",
  "url": "${__value.raw}",
  "urlDisplayLabel": "Open in Tempo",
  "datasourceUid": "tempo-prod",
  "type": "traceql"
}
```

Now every event row in Explore shows an "Open in Tempo" link that jumps directly to the corresponding request trace.
## Schema Governance: Preventing Drift
The biggest operational risk with a fourth signal is schema sprawl. Two practices keep it manageable:
- Schema registry — Register every `event_type` in Confluent Schema Registry (or a lightweight alternative like `buf`). Producers that emit unknown fields get rejected at the Collector layer.
- Event catalog — A simple Git-tracked YAML file that documents what each event type means, who owns it, and its retention tier. Treat it like an API contract.
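A catalog check can be a few lines of CI glue. The sketch below assumes the Git-tracked YAML has already been parsed into a dict; `CATALOG` and the helper are hypothetical names:

```python
# Parsed form of a hypothetical Git-tracked event catalog (events.yaml).
CATALOG = {
    "order.placed":   {"owner": "checkout-team", "retention": "90d"},
    "payment.failed": {"owner": "payments-team", "retention": "365d"},
}

def check_registered(emitted_types):
    """Return event types observed in telemetry that have no catalog entry."""
    return sorted(set(emitted_types) - CATALOG.keys())

print(check_registered(["order.placed", "cart.abandoned"]))  # ['cart.abandoned']
```

Run against a sample of recent production events, this turns silent schema sprawl into a failing CI job with a named owner.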
## Key Takeaways
- Logs, metrics, and traces cannot answer business-behavior questions at the granularity modern systems require — events fill that gap.
- Events need a dedicated pipeline; routing them through log aggregators creates cardinality and cost problems.
- `trace_id` embedded in every event is the join key for cross-signal correlation.
- The OTel EventLogger API (stable in Go/Java) is the forward-looking instrumentation path; structured log emission is a workable interim for other runtimes.
- Schema governance — a registry plus a catalog — is not optional; it is the difference between a queryable event store and a JSON graveyard.
- Grafana derived fields make event-to-trace pivot a one-click operation rather than a manual trace-ID copy-paste.