Security for Application Engineers

Logging Without Leaks

Ravinder · 7 min read
Security · AppSec · Logging · Observability · PII

Logs are the first thing you reach for during an incident. They are also a chronic source of data leaks: PII embedded in request logs, API keys printed in stack traces, session tokens written to debug output, full request bodies logged during a temporary debugging session that was never cleaned up.

The two goals — complete observability and zero sensitive data in logs — feel contradictory. They are not. The solution is structured logging with explicit field selection, redaction at the logger level, and a clear team convention for what is safe to log.

Why Logs Leak

The proximate cause is almost always convenience: logger.debug(request.body), logger.error(e, exc_info=True) when the exception contains a user record, console.log(headers) when the headers contain an Authorization token.

The structural cause is logging strings instead of structured events. When you log a formatted string, you cannot later strip a field — it is embedded in the message. When you log structured key-value pairs, you can filter or redact specific fields at the logger, the shipper, or the aggregator.

import logging, structlog

logger = logging.getLogger(__name__)

# BAD — string formatting makes the PII impossible to filter downstream
logger.info(f"User login: {user.email} from IP {request.remote_addr}")
 
# GOOD — structured event; each field can be filtered or redacted
structlog.get_logger().info(
    "user.login",
    user_id=user.id,          # internal ID, not email
    ip_hash=hash_ip(request.remote_addr),  # hashed, not raw
    success=True,
)
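
The `hash_ip` helper above is assumed rather than shown. A minimal sketch using a keyed hash (HMAC), so the output cannot be reversed with a precomputed lookup table. The `IP_HASH_KEY` environment variable is a hypothetical name; in practice, load the key from your secret store:

```python
import hmac, hashlib, os

# Hypothetical key source — a keyed hash means an attacker who obtains the
# logs cannot brute-force IPs without also obtaining the key.
IP_HASH_KEY = os.environ.get("IP_HASH_KEY", "dev-only-key").encode()

def hash_ip(ip: str) -> str:
    """Return a short, stable pseudonymous token for an IP address."""
    digest = hmac.new(IP_HASH_KEY, ip.encode(), hashlib.sha256).hexdigest()
    return digest[:16]  # enough to correlate events, less to leak
```

The truncation is a deliberate trade-off: sixteen hex characters are plenty for correlating events within a log window while reducing the value of the token if logs are exposed.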

The Sensitive Data Taxonomy

Define for your team what falls into each category:

| Category | Examples | Log? |
|---|---|---|
| Safe identifiers | User ID (UUID), request ID, tenant ID | Yes |
| Safe metrics | Status codes, latencies, error types | Yes |
| Pseudonymous | Hashed IP, hashed email | Yes (with caveats) |
| Regulated PII | Raw email, name, phone, SSN, DOB | No |
| Credentials | Passwords, tokens, API keys, cookies | Never |
| Payment data | Card numbers, CVV, bank account | Never |
| Health data | Diagnoses, prescriptions (HIPAA) | Never |

This taxonomy belongs in your engineering handbook and should be referenced in code review.
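
One way to make the taxonomy actionable is to encode it, so code review has something concrete to point at. A hypothetical sketch with a default-deny lookup, meaning unclassified fields are treated as unsafe until someone classifies them:

```python
from enum import Enum

class LogPolicy(Enum):
    YES = "yes"
    CAVEATS = "yes-with-caveats"
    NO = "no"
    NEVER = "never"

# Hypothetical field-name → policy map mirroring the taxonomy table.
FIELD_POLICY = {
    "user_id": LogPolicy.YES,
    "request_id": LogPolicy.YES,
    "status_code": LogPolicy.YES,
    "ip_hash": LogPolicy.CAVEATS,
    "email": LogPolicy.NO,
    "ssn": LogPolicy.NO,
    "password": LogPolicy.NEVER,
    "card_number": LogPolicy.NEVER,
}

def may_log(field: str) -> bool:
    """Default-deny: unknown fields are unsafe until explicitly classified."""
    return FIELD_POLICY.get(field, LogPolicy.NO) in (LogPolicy.YES, LogPolicy.CAVEATS)
```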

Redaction at the Logger Level

Do not rely on engineers to manually avoid sensitive fields in every log call. Implement redaction in the logging pipeline so sensitive fields are stripped before reaching any output.

import structlog, re
from typing import Any
 
REDACT_KEYS = frozenset({
    "password", "token", "secret", "api_key", "authorization",
    "cookie", "credit_card", "ssn", "cvv", "access_token",
    "refresh_token", "private_key",
})
 
REDACT_PATTERNS = [
    re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b'),  # email
    re.compile(r'\b\d{4}[- ]?\d{4}[- ]?\d{4}[- ]?\d{4}\b'),               # card
]
 
def redact_processor(logger, method, event_dict: dict) -> dict:
    """structlog processor — redact sensitive keys and patterns."""
    for key in list(event_dict.keys()):
        if key.lower() in REDACT_KEYS:
            event_dict[key] = "[REDACTED]"
 
    # Also scan string values for pattern matches
    for key, value in event_dict.items():
        if isinstance(value, str):
            for pattern in REDACT_PATTERNS:
                value = pattern.sub("[REDACTED]", value)
            event_dict[key] = value
 
    return event_dict
 
structlog.configure(
    processors=[
        structlog.processors.TimeStamper(fmt="iso"),
        structlog.stdlib.add_log_level,
        redact_processor,                          # applied to every log event
        structlog.processors.JSONRenderer(),
    ]
)
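
A quick standalone check of what the processor does to an event dict. The logic here is condensed from the processor above so the snippet runs without structlog installed:

```python
import re

REDACT_KEYS = frozenset({"password", "token", "authorization"})
EMAIL_RE = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

def redact_processor(logger, method, event_dict: dict) -> dict:
    # Same shape as the structlog processor above, condensed for the demo.
    for key, value in list(event_dict.items()):
        if key.lower() in REDACT_KEYS:
            event_dict[key] = "[REDACTED]"
        elif isinstance(value, str):
            event_dict[key] = EMAIL_RE.sub("[REDACTED]", value)
    return event_dict

event = {
    "event": "user.signup",
    "password": "hunter2",
    "note": "contact alice@example.com for access",
}
redacted = redact_processor(None, "info", event)
# password is replaced wholesale; the email is scrubbed out of the free-text field
```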

Request/Response Logging: What to Include

Logging every request and response is useful for debugging. Logging the full body is almost always wrong.

flowchart LR
    subgraph Request["Inbound Request — safe to log"]
        RM["Method"]
        RP["Path (strip query params with credentials)"]
        RID["X-Request-ID"]
        UA["User-Agent"]
        SC["Status Code (response)"]
        LAT["Latency (ms)"]
        UID["Authenticated User ID"]
    end
    subgraph Danger["Do NOT log"]
        QS["Query string containing tokens"]
        AH["Authorization header value"]
        RB["Request body (may contain PII)"]
        RSP["Response body (may contain PII)"]
        CK["Cookie values"]
        PW["Any credential field"]
    end

Middleware that logs requests safely:

from fastapi import Request, Response
from starlette.middleware.base import BaseHTTPMiddleware
import structlog, time
 
log = structlog.get_logger()
 
# Strip known sensitive query params before logging
SENSITIVE_PARAMS = {"token", "key", "secret", "password", "access_token"}
 
def sanitize_url(url: str) -> str:
    from urllib.parse import urlparse, urlencode, parse_qs, urlunparse
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    safe_params = {
        k: (["[REDACTED]"] if k.lower() in SENSITIVE_PARAMS else v)
        for k, v in params.items()
    }
    return urlunparse(parsed._replace(query=urlencode(safe_params, doseq=True)))
 
class RequestLoggingMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request: Request, call_next) -> Response:
        start = time.perf_counter()
        response = await call_next(request)
        duration_ms = (time.perf_counter() - start) * 1000
 
        log.info(
            "http.request",
            method=request.method,
            path=sanitize_url(str(request.url)),
            status=response.status_code,
            duration_ms=round(duration_ms, 2),
            request_id=request.headers.get("x-request-id"),
            user_id=getattr(request.state, "user_id", None),
        )
        return response
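
A standalone check of the query-param sanitizer, with the logic repeated from the middleware above so it runs on its own. Note that `urlencode` percent-encodes the placeholder brackets, which is harmless in a log line:

```python
from urllib.parse import urlparse, urlencode, parse_qs, urlunparse

SENSITIVE_PARAMS = {"token", "key", "secret", "password", "access_token"}

def sanitize_url(url: str) -> str:
    # Same logic as in the middleware above, repeated for a runnable demo.
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    safe_params = {
        k: (["[REDACTED]"] if k.lower() in SENSITIVE_PARAMS else v)
        for k, v in params.items()
    }
    return urlunparse(parsed._replace(query=urlencode(safe_params, doseq=True)))

url = "https://api.example.com/v1/files?page=2&access_token=eyJhbGci"
clean = sanitize_url(url)
# the raw token is gone; the safe param and the placeholder survive
```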

Exception Logging: Safe Stack Traces

Stack traces are invaluable for debugging and frequently contain sensitive data: ORM exceptions that include query parameters, third-party SDK exceptions that echo API responses, framework errors that print the full request context.

import structlog, traceback
 
log = structlog.get_logger()
 
def safe_exception_log(exc: Exception, context: dict | None = None):
    """Log an exception with stack trace but scrub the message."""
    tb_lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
    # The exception message (last line) may contain sensitive data
    # Log the type and traceback separately from the message
    log.error(
        "unhandled_exception",
        exc_type=type(exc).__name__,
        # Log traceback lines without the final exception message
        traceback="".join(tb_lines[:-1]).strip(),
        **(context or {}),
    )

Review your exception handling middleware to ensure it does not echo the full exception message into the response body in production — that is both an information disclosure vulnerability and a logging problem.
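
To see what the scrubbing buys you, here is a self-contained variant of the same idea that returns the pieces for inspection. `scrub_traceback` is a hypothetical name, not part of any library:

```python
import traceback

def scrub_traceback(exc: Exception) -> tuple[str, str]:
    """Split an exception into (type name, traceback without the message line),
    mirroring safe_exception_log above but returning strings for inspection."""
    tb_lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
    # The final element is "ExceptionType: message" — the part most likely
    # to carry sensitive data — so it is dropped.
    return type(exc).__name__, "".join(tb_lines[:-1]).strip()

secret = "alice@example.com"  # stands in for PII that leaks into a message
try:
    raise ValueError(f"lookup failed for {secret}")
except ValueError as e:
    exc_type, tb = scrub_traceback(e)
# exc_type and tb retain the debugging signal; the PII-bearing message is gone
```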

Log Aggregation and Access Control

Logs that contain even pseudonymous data need access controls on the aggregation side:

# Example: Datadog log pipeline filter — drop PII before indexing
# (configured in Datadog UI or via API)
# Sensitive data scanner: built-in rules for email, SSN, credit card
 
# Self-hosted: Fluent Bit filter to redact before forwarding
[FILTER]
    Name      grep
    Match     *
    Exclude   log  .*password.*
 
[FILTER]
    Name      lua
    Match     *
    script    redact.lua
    call      redact_sensitive

Limit log query access by role: developers can query application logs for their own service; security and compliance teams have broader access; no one has access to raw logs outside a controlled workflow during an investigation.

Audit Logs vs Application Logs

Audit logs record who did what to what, for compliance and forensic purposes. They have different requirements from application logs:

| Property | Application Logs | Audit Logs |
|---|---|---|
| Mutability | Can be overwritten | Append-only, tamper-evident |
| Retention | 30–90 days typical | 1–7 years (compliance) |
| Content | Debug context | Actor, action, resource, outcome |
| Access | Dev/ops teams | Compliance, legal, security |

import json

# `db` is assumed to be your application's database handle
def write_audit_log(
    actor_id: str,
    action: str,
    resource_type: str,
    resource_id: str,
    outcome: str,
    metadata: dict = None,
):
    """Write to a separate, append-only audit log table."""
    db.execute("""
        INSERT INTO audit_log (actor_id, action, resource_type, resource_id, outcome, metadata, created_at)
        VALUES (%s, %s, %s, %s, %s, %s, NOW())
    """, (actor_id, action, resource_type, resource_id, outcome,
          json.dumps(metadata or {})))

Use PostgreSQL's row-level security policies or a separate database with insert-only access from the application to prevent application-level tampering with audit records.
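
A sketch of the permission side, assuming hypothetical role names `app_rw` (the application's writer role) and `compliance_ro` (the reviewers' read-only role):

```sql
-- Append-only from the application: it can add audit records but never
-- rewrite history. Reviewers get SELECT and nothing else.
REVOKE ALL ON audit_log FROM app_rw;
GRANT INSERT ON audit_log TO app_rw;
GRANT SELECT ON audit_log TO compliance_ro;
```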

Key Takeaways

  • Structured logging is a prerequisite for safe logs — you cannot redact a field that is embedded in a format string.
  • Implement redaction at the logger level as a processor applied to every event, not as a discipline enforced by individual engineers in every log call.
  • Define and publish a sensitive data taxonomy for your team; reference it in code review so the conversation about what is safe to log is structured, not ad hoc.
  • Request logging should include method, path (sanitized), status, latency, and request ID — never Authorization headers, cookie values, request bodies, or response bodies by default.
  • Exception messages frequently contain sensitive data; log the exception type and traceback separately from the message, and never echo exception details in API responses.
  • Audit logs have fundamentally different requirements from application logs — append-only, longer retention, stricter access control, and a different schema focused on actor/action/resource.