Time-Series at Scale: TimescaleDB, ClickHouse, Druid
The term "time-series database" covers a wider range of requirements than most teams realize when they start evaluating options. A metrics store for infrastructure monitoring has almost nothing in common with a real-time analytics platform for user behavior — they share a timestamp column and that's about it.
TimescaleDB, ClickHouse, and Druid are the three serious contenders for large-scale time-series workloads in 2026. They're built on different architectures and win in fundamentally different scenarios. Picking the wrong one means a painful migration six months later when your query patterns don't fit the engine.
Understanding the Problem Space
Before choosing a database, characterize your workload honestly, starting with its cardinality.
Cardinality is the dimension that differentiates these systems most sharply. High cardinality means many distinct combinations of dimension values — e.g., one metric per user-device-region combination across millions of users. This is where purpose-built systems diverge from each other dramatically.
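To make that concrete, a quick way to gauge series cardinality is to count distinct dimension combinations over a recent window. A minimal Postgres-style sketch, assuming a hypothetical raw_events table with user_id, device_id, and region columns:
-- How many distinct series would this workload create?
-- (table and column names are illustrative)
SELECT count(DISTINCT (user_id, device_id, region)) AS series_count
FROM raw_events
WHERE time > now() - INTERVAL '1 day';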
TimescaleDB: Postgres With Superpowers
TimescaleDB is Postgres with a partitioning extension. Your data lives in hypertables, which are automatically partitioned into chunks by time. You get the full Postgres feature set — JOINs, foreign keys, full-text search, triggers, extensions — plus time-series-specific optimizations.
When TimescaleDB wins:
- You need SQL joins between time-series data and relational data (user profiles, device metadata, etc.)
- Your team already knows Postgres
- Your cardinality is moderate (thousands to low millions of series)
- You need row-level updates and deletes (operational time-series, not just analytics)
-- Create a hypertable partitioned by time
CREATE TABLE sensor_readings (
time TIMESTAMPTZ NOT NULL,
sensor_id TEXT NOT NULL,
location TEXT,
temperature DOUBLE PRECISION,
humidity DOUBLE PRECISION
);
SELECT create_hypertable('sensor_readings', 'time', chunk_time_interval => INTERVAL '1 day');
-- Add a space partition for high-cardinality sensor_id
SELECT add_dimension('sensor_readings', 'sensor_id', number_partitions => 4);
-- Compression policy: compress chunks older than 7 days
ALTER TABLE sensor_readings SET (
timescaledb.compress,
timescaledb.compress_segmentby = 'sensor_id',
timescaledb.compress_orderby = 'time DESC'
);
SELECT add_compression_policy('sensor_readings', INTERVAL '7 days');

Compression in TimescaleDB is genuinely impressive. Raw time-series data typically compresses 90–95% with the columnar compression applied to older chunks. A table that takes 500 GB uncompressed often fits in 30–50 GB compressed.
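Once the policy has compressed a few chunks, the actual ratio is easy to verify. A short sketch using TimescaleDB's hypertable_compression_stats informational function (available in TimescaleDB 2.x):
-- Compare on-disk size before and after columnar compression
SELECT
    pg_size_pretty(before_compression_total_bytes) AS before,
    pg_size_pretty(after_compression_total_bytes)  AS after
FROM hypertable_compression_stats('sensor_readings');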
Continuous aggregates are TimescaleDB's materialized view system for time-series:
-- Precompute hourly averages, refreshed automatically
CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
time_bucket('1 hour', time) AS bucket,
sensor_id,
avg(temperature) AS avg_temp,
max(temperature) AS max_temp,
min(temperature) AS min_temp,
count(*) AS reading_count
FROM sensor_readings
GROUP BY bucket, sensor_id;
-- Refresh policy: keep the last 7 days of aggregates fresh
SELECT add_continuous_aggregate_policy('sensor_hourly',
start_offset => INTERVAL '7 days',
end_offset => INTERVAL '1 hour',
schedule_interval => INTERVAL '1 hour'
);
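Dashboards then query the aggregate like any ordinary relation; depending on the TimescaleDB version and the materialized_only setting, recent rows that haven't been materialized yet can be folded in automatically (real-time aggregation). A minimal sketch, with an illustrative sensor id:
SELECT bucket, avg_temp, max_temp
FROM sensor_hourly
WHERE bucket > now() - INTERVAL '24 hours'
  AND sensor_id = 'sensor-42'
ORDER BY bucket;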
Where TimescaleDB struggles:
- Analytical queries over billions of rows, especially with many GROUP BY dimensions, are slower than they are in ClickHouse
- Very high cardinality (tens of millions of series) causes chunk management overhead
- Not designed for sub-second real-time ingestion at Kafka-scale rates
ClickHouse: Analytical Engine Built for Volume
ClickHouse is a columnar analytical database designed for high-throughput ingestion and fast aggregation queries over large datasets. It's not a general-purpose database — it doesn't support row-level updates efficiently, doesn't do JOINs well on large tables, and has limited transaction support. What it does is aggregate enormous datasets very fast.
When ClickHouse wins:
- You need aggregation queries over hundreds of billions of rows in seconds
- Your query patterns are primarily analytical (GROUP BY, COUNT, SUM, percentiles)
- Ingestion rates exceed what Postgres can handle (>100K events/s sustained)
- High-cardinality dimensions with many filter combinations
-- ClickHouse table with MergeTree engine
CREATE TABLE events
(
timestamp DateTime64(3),
event_type LowCardinality(String),
user_id UInt64,
session_id String,
country LowCardinality(String),
properties String -- JSON stored as String for flexibility
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(timestamp)
ORDER BY (event_type, user_id, timestamp) -- primary key = sort key
SETTINGS index_granularity = 8192;
-- TTL: automatically delete data older than 90 days
ALTER TABLE events
MODIFY TTL timestamp + INTERVAL 90 DAY DELETE;

The ORDER BY clause in ClickHouse is your index. Data is sorted by this key on disk, and range queries on these columns are fast because they require reading minimal data. The most-filtered columns should be first. This is the primary tuning lever — get it wrong at table creation and you'll need to recreate the table.
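One way to sanity-check a sort key before committing to it is ClickHouse's EXPLAIN with the indexes setting, which reports which index stages apply and how many parts and granules survive each filter. A sketch against the events table above (the literal user_id is illustrative):
-- A selective ORDER BY prefix should prune most granules
EXPLAIN indexes = 1
SELECT count()
FROM events
WHERE event_type = 'purchase' AND user_id = 12345;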
LowCardinality is one of ClickHouse's best features. Wrapping a String column with LowCardinality when it has fewer than ~10K distinct values enables dictionary encoding automatically, dramatically improving compression and query speed.
-- Typical analytical query: ClickHouse handles this at billions of rows/second
SELECT
toStartOfHour(timestamp) AS hour,
country,
event_type,
uniq(user_id) AS unique_users,
count() AS total_events,
quantile(0.95)(session_duration_ms) AS p95_duration  -- assumes an extra session_duration_ms column not shown in the DDL above
FROM events
WHERE timestamp >= now() - INTERVAL 30 DAY
AND event_type IN ('purchase', 'add_to_cart')
GROUP BY hour, country, event_type
ORDER BY hour DESC, unique_users DESC;

Ingestion patterns: ClickHouse performs best with batched inserts. Individual row inserts cause merge overhead. The recommended pattern is batches of 10K–100K rows:
from clickhouse_driver import Client
from datetime import datetime, timezone
import json

client = Client(host='clickhouse-host', port=9000, database='analytics')

def ingest_batch(events: list[dict]) -> None:
    """Batch insert into ClickHouse — never insert row by row."""
    rows = [
        (
            datetime.fromtimestamp(e['timestamp'] / 1000, tz=timezone.utc),
            e['event_type'],
            e['user_id'],
            e['session_id'],
            e['country'],
            json.dumps(e.get('properties', {})),  # serialize properties as JSON text
        )
        for e in events
    ]
    client.execute(
        'INSERT INTO events (timestamp, event_type, user_id, session_id, country, properties) VALUES',
        rows
    )

# From a Kafka consumer: accumulate and flush in batches.
# `consumer` is assumed to be an already-configured Kafka consumer whose
# message values deserialize to event dicts (consumer setup omitted here).
batch = []
for message in consumer:
    batch.append(message.value)
    if len(batch) >= 50_000:
        ingest_batch(batch)
        batch.clear()

# Flush any partial batch if the consumer loop ends
if batch:
    ingest_batch(batch)

Where ClickHouse struggles:
- JOINs on large tables are slow and memory-hungry. Denormalize aggressively.
- Exactly-once ingestion requires careful deduplication with ReplicatedReplacingMergeTree (see the sketch after this list)
- No ACID transactions — concurrent updates are not safe
- Operational complexity: cluster management, ZooKeeper (for Replicated tables), shard balancing
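A minimal single-node sketch of that deduplication idea, assuming an illustrative events_dedup table keyed by event_id; the Replicated variant layers replication on top of the same mechanism:
CREATE TABLE events_dedup
(
    event_id    UUID,
    timestamp   DateTime64(3),
    payload     String,
    ingested_at DateTime DEFAULT now()
)
ENGINE = ReplacingMergeTree(ingested_at)  -- keep the newest row per sort key at merge time
ORDER BY (event_id);

-- FINAL applies the deduplication at read time (at a cost),
-- so re-delivered events are not counted twice.
SELECT count() FROM events_dedup FINAL;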
Druid: Real-Time Slice-and-Dice
Apache Druid is built for one specific workload: interactive, sub-second slice-and-dice queries on high-cardinality time-series data, with real-time ingestion latency in the seconds range. It's the database behind most real-time dashboards at scale — ad tech, user analytics, operational monitoring.
When Druid wins:
- You need query results in under 1 second on billion-row datasets
- Real-time ingestion with query visibility within seconds of ingestion
- High-cardinality dimensions with arbitrary filter combinations
- Dashboard-style workloads with many concurrent users
Druid's architecture is unique: it separates ingestion, storage, and query into distinct node types, and uses pre-aggregation at ingestion time (roll-up) to reduce storage dramatically.
{
"type": "kafka",
"spec": {
"dataSchema": {
"dataSource": "user_events",
"timestampSpec": {
"column": "timestamp",
"format": "iso"
},
"dimensionsSpec": {
"dimensions": [
"event_type",
"user_id",
"country",
{"type": "long", "name": "session_id"}
]
},
"metricsSpec": [
{"type": "count", "name": "count"},
{"type": "longSum", "name": "revenue_cents", "fieldName": "revenue_cents"},
{"type": "HLLSketchBuild", "name": "unique_users", "fieldName": "user_id"}
],
"granularitySpec": {
"type": "uniform",
"segmentGranularity": "HOUR",
"queryGranularity": "MINUTE",
"rollup": true
}
},
"ioConfig": {
"type": "kafka",
"consumerProperties": {"bootstrap.servers": "kafka:9092"},
"topic": "user-events",
"taskDuration": "PT1H"
}
}
}

The rollup: true setting is Druid's key differentiator. Events that share a timestamp (truncated to the query granularity) and the same dimension values are combined at ingestion time, with their metrics pre-aggregated. A dataset with 1 billion raw events might store 50 million pre-aggregated rows. Queries are fast because there's less data to scan.
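Queries then aggregate the pre-computed columns rather than raw events. A sketch in Druid SQL, assuming the ingestion spec above and the DataSketches extension: note the SUM over the count metric instead of COUNT(*), and the sketch-aware distinct count.
SELECT
  TIME_FLOOR(__time, 'PT1H') AS hour,
  country,
  SUM("count") AS total_events,
  SUM(revenue_cents) AS revenue_cents,
  APPROX_COUNT_DISTINCT_DS_HLL(unique_users) AS unique_users
FROM user_events
WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' DAY
GROUP BY 1, 2
ORDER BY hour DESC;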
The tradeoff: you can't recover individual raw events from a rolled-up Druid segment. If your use case requires row-level access, Druid is the wrong choice.
Where Druid struggles:
- Operational complexity is high. Druid has 5+ node types (Coordinator, Overlord, Historical, MiddleManager, Broker) plus deep storage (S3/GCS) and ZooKeeper.
- SQL support (via Druid's Calcite layer) is improving but still lacks many SQL features that ClickHouse and TimescaleDB support.
- Schema changes require re-ingestion. Adding a new dimension to historical data means reprocessing.
- Not suitable for operational queries that need row-level precision.
Comparison at a Glance
| Dimension | TimescaleDB | ClickHouse | Druid |
|---|---|---|---|
| SQL compatibility | Full (Postgres) | Good (ANSI subset) | Limited (Calcite) |
| Real-time ingestion latency | Seconds | Seconds | Seconds |
| Query speed at 1B+ rows | Moderate | Excellent | Excellent |
| Cardinality limit | Moderate | High | Very High |
| JOINs | Excellent | Poor | Poor |
| Ops complexity | Low | Medium | High |
| Pre-aggregation / rollup | Via cont. aggregates | Manual | Native |
| Row-level updates | Yes | No (append-only) | No |
Ingestion Architecture Patterns
At scale, all three databases are fed from a stream, usually Kafka, rather than written to directly by application code.
ClickHouse has a native Kafka engine that reads directly from Kafka topics without a separate consumer:
-- ClickHouse Kafka engine table + materialized view pattern
CREATE TABLE events_kafka_source
(
    data String  -- each Kafka message lands as one raw JSON string
)
ENGINE = Kafka
SETTINGS
    kafka_broker_list = 'kafka:9092',
    kafka_topic_list = 'user-events',
    kafka_group_name = 'clickhouse-consumer',
    kafka_format = 'JSONAsString';  -- one String column per message, parsed by the MV below
-- Materialized view writes from Kafka source to the actual table
CREATE MATERIALIZED VIEW events_mv TO events AS
SELECT
toDateTime64(JSONExtractFloat(data, 'timestamp') / 1000, 3) AS timestamp,
JSONExtractString(data, 'event_type') AS event_type,
JSONExtractUInt(data, 'user_id') AS user_id,
JSONExtractString(data, 'country') AS country
FROM events_kafka_source;

Key Takeaways
- TimescaleDB is the right choice when you need SQL JOINs between time-series and relational data, row-level updates, or a familiar Postgres operational model — it's not the fastest for pure analytics but it's the most versatile.
- ClickHouse wins on pure analytical throughput: if your primary workload is large GROUP BY aggregations over historical data and you can denormalize, ClickHouse will be 10–100x faster than TimescaleDB at scale.
- Druid is the correct choice for interactive, sub-second dashboards on high-cardinality data with real-time ingestion — its rollup architecture is what makes it feasible, but it also means you lose row-level fidelity.
- Cardinality is the key differentiator: TimescaleDB handles thousands of series well, ClickHouse handles millions, and Druid is designed for the workloads where cardinality itself is the challenge.
- ClickHouse's ORDER BY clause is its primary index — wrong choices at table creation require full table recreation; get this right before going to production.
- Operational complexity scales with capability: TimescaleDB (low), ClickHouse (medium), Druid (high) — factor the ops burden into your selection criteria, not just benchmark numbers.