Series · 10 parts · ~73 min total

Building Production RAG

  1.

    Problem Framing and Dataset Honesty

    Before you write a single line of RAG code, you need to be ruthlessly honest about what problem you're solving and whether your data can actually solve it.

    7 min

    Jan 15, 2025

  2.

    Chunking Strategies That Actually Move Recall

    The chunking decision you make in hour one will haunt your recall numbers for months — here's what the tradeoffs actually look like in production.

    7 min

    Jan 22, 2025

  3.

    Embedding Choice and Dimensionality

    The embedding model you pick shapes every downstream tradeoff in your RAG pipeline — here's how to choose without regretting it six months later.

    8 min

    Jan 29, 2025

  4.

    Hybrid Search: BM25 + Vector

    Pure vector search leaves precision on the table for exact-match queries — here's how to combine lexical and semantic retrieval without making your pipeline a mess.

    8 min

    Feb 5, 2025

  5.

    Re-ranking Architectures

    First-stage retrieval gets you candidates — re-ranking is what turns a decent recall number into an answer users actually trust.

    6 min

    Feb 12, 2025

  6.

    Caching, Batching, and Cost Control

    A RAG system that works but costs $40K/month to run isn't a product — here's the concrete cost math and the levers that actually move it.

    8 min

    Feb 19, 2025

  7.

    Evaluation Harness from Scratch

    A RAG pipeline without an evaluation harness is a system you can only improve by accident — here's how to build the infrastructure that makes intentional progress possible.

    7 min

    Feb 26, 2025

  8.

    Observability for Retrieval

    If your RAG system fails silently and you don't know until a user screenshots the bad answer, you don't have observability — here's what to actually instrument.

    7 min

    Mar 5, 2025

  9.

    Multi-Tenant Isolation

    Letting multiple customers share a RAG pipeline is an engineering win until one tenant's data leaks into another's answer — here's how to prevent that.

    7 min

    Mar 12, 2025

  10.

    Failure Modes and Runbooks

    Every RAG system that runs in production will fail in ways you didn't anticipate — here are the incidents that actually happen and how to resolve them at 2am.

    8 min

    Mar 19, 2025