
The CI Pipeline That Finishes in 3 Minutes

Ravinder · 8 min read
Engineering · CI/CD · DevOps · Performance

Twenty-five minutes. That was our median CI time before we fixed it. Not the worst in the industry — I've seen 45-minute pipelines run without irony — but long enough that engineers had learned to open Slack while waiting, and long enough that a second failing run meant the better part of an hour burned. The feedback loop was broken in the exact way that makes people stop caring about test failures.

We got it to under 3 minutes. Not by removing tests, not by skipping things that mattered. By building the pipeline the way you'd build a production system: with a dependency graph, a caching strategy, and a cost model. This post is a practitioner's walkthrough of every technique we used, with the decisions that actually drove the numbers.

Why CI Is Slow: The Real Diagnosis

Most slow pipelines share the same three sins: they reinstall dependencies from scratch on every run, they run everything serially, and they run everything regardless of what changed. Fix all three and 3 minutes is achievable for most mid-size codebases.

flowchart LR
  subgraph Before
    A1[Install deps\n8 min] --> B1[Build\n5 min] --> C1[Lint\n3 min] --> D1[Unit tests\n6 min] --> E1[Integration\n4 min]
  end
  subgraph After
    A2[Restore dep cache\n30s] --> P1
    subgraph P1[Parallel]
      direction TB
      B2[Build\n45s]
      C2[Lint\n40s]
    end
    P1 --> P2
    subgraph P2[Sharded tests]
      direction TB
      D2[Unit shard 1\n50s]
      E2[Unit shard 2\n50s]
      F2[Integration\n50s]
    end
  end

The before/after is stark. The same work, restructured, runs in roughly one-eighth of the time. Let me show you the mechanics of each layer.

Layer 1: Dependency Caching Done Right

The most common caching mistake is keying the cache on the exact lockfile hash with no fallback. Under that scheme, any dependency change, even a patch version bump in a transitive dep, busts the cache entirely and forces a full reinstall. You want a graceful fallback for near-misses, not just an exact match.

Here's the pattern we use in GitHub Actions for a Node.js monorepo:

- name: Cache node_modules
  uses: actions/cache@v4
  with:
    path: |
      ~/.npm
      node_modules
      **/node_modules
    key: ${{ runner.os }}-npm-${{ hashFiles('**/package-lock.json') }}
    restore-keys: |
      ${{ runner.os }}-npm-

The restore-keys fallback is the critical part that most guides skip. When the exact lockfile hash misses, CI restores the closest prior cache before running npm ci. npm ci still recreates node_modules, but with a warm ~/.npm store it pulls packages from the local cache instead of the network, so a typical dependency update that touches 3 packages now takes about 20 seconds instead of 4 minutes.
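To make the lookup order concrete, here is a toy shell model of how the cache action resolves a key. It illustrates the exact-match-then-prefix behavior only; the key names are invented and the real action also picks the newest matching cache per branch scope.

```shell
# Toy model of the cache lookup: try the exact key first, then fall back
# to the newest previously saved cache whose key starts with a restore prefix.
available="linux-npm-aaa111 linux-npm-bbb222"  # saved caches, newest first
want="linux-npm-ccc333"                        # exact key for this lockfile
restore_prefix="linux-npm-"

hit=""
for k in $available; do
  if [ "$k" = "$want" ]; then hit="$k"; break; fi
done
if [ -z "$hit" ]; then
  # Exact key missed: take the newest cache sharing the restore prefix.
  for k in $available; do
    case "$k" in "$restore_prefix"*) hit="$k"; break;; esac
  done
fi
echo "restored: ${hit:-nothing}"
```

Here the exact key misses, so the run restores linux-npm-aaa111, and the subsequent install only has to cover what changed since that cache was saved.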

For Python with pip:

- name: Cache pip
  uses: actions/cache@v4
  with:
    path: ~/.cache/pip
    key: ${{ runner.os }}-pip-${{ hashFiles('requirements*.txt') }}
    restore-keys: |
      ${{ runner.os }}-pip-
 
- name: Install dependencies
  run: pip install --prefer-binary -r requirements.txt

The --prefer-binary flag tells pip to prefer pre-built wheels over source builds. Combined with the cache, a fully warm run takes under 15 seconds.

Layer 2: Build Caching and Incremental Compilation

Dependency caching saves you on install. Build caching saves you on compilation. These are different problems.

For TypeScript, the tsconfig incremental flag persists .tsbuildinfo files that let the compiler skip unchanged modules:

{
  "compilerOptions": {
    "incremental": true,
    "tsBuildInfoFile": ".buildcache/tsconfig.tsbuildinfo",
    "composite": true
  }
}

Cache the .buildcache directory with a key that covers your source files:

- name: Cache TypeScript build
  uses: actions/cache@v4
  with:
    path: .buildcache
    key: ${{ runner.os }}-tsc-${{ hashFiles('src/**/*.ts', 'tsconfig*.json') }}
    restore-keys: |
      ${{ runner.os }}-tsc-

For a 60-KLOC codebase, this drops a cold TypeScript build from 90 seconds to under 15 on a warm cache hit with partial changes.

Docker layer caching deserves its own mention. If your CI builds images, use --cache-from pointing at your registry:

- name: Build Docker image
  run: |
    docker build --cache-from $IMAGE:cache \
      --build-arg BUILDKIT_INLINE_CACHE=1 \
      -t $IMAGE:$SHA .
    docker push $IMAGE:$SHA
    docker tag $IMAGE:$SHA $IMAGE:cache
    docker push $IMAGE:cache

The cache tag is always the last successful build. Each run only rebuilds layers whose inputs changed.
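Layer caching only pays off when the Dockerfile orders its inputs from least to most frequently changed. A minimal sketch for a Node.js service (base image and flags are illustrative, not this post's actual Dockerfile):

```dockerfile
FROM node:20-slim
WORKDIR /app
# Copy only the manifests first: this layer and the install below are
# reused from cache as long as the lockfile is unchanged.
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
# Source changes invalidate only the layers from here down.
COPY . .
RUN npm run build
```

If the source tree is copied before the install step, every commit invalidates the install layer and the cache tag buys you nothing.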

Layer 3: Parallelism — Actual Parallelism, Not Fake Parallelism

The needs keyword in GitHub Actions defines a dependency DAG between jobs. Most pipelines use it to build a chain (lint → build → test) even when lint and build have no dependency on each other.

jobs:
  deps:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        with:
          path: node_modules
          key: ${{ runner.os }}-npm-${{ hashFiles('package-lock.json') }}
      - run: npm ci
 
  lint:
    needs: deps
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        # same cache key as deps
      - run: npm run lint
 
  typecheck:
    needs: deps
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        # same cache key as deps
      - run: npm run typecheck
 
  test:
    needs: deps
    strategy:
      matrix:
        shard: [1, 2, 3]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/cache@v4
        # same cache key as deps
      - run: npm test -- --shard=${{ matrix.shard }}/3

Lint, typecheck, and all test shards run concurrently once the dep cache is warmed. The critical path is now: dep restore (30s) + slowest shard (50s) = 80 seconds of wall time.
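That arithmetic is worth making explicit: once the DAG is flat, wall time is the serial prefix plus a max, not a sum. A quick sketch using the job timings above (the typecheck duration is an assumption):

```shell
# Wall time for the flattened DAG: the deps job runs first, then every
# other job runs concurrently, so only the slowest dependent job counts.
DEP=30                   # dep cache restore
JOBS="40 45 50 50 50"    # lint, typecheck (assumed 45s), three test shards
SLOWEST=0
for t in $JOBS; do
  if [ "$t" -gt "$SLOWEST" ]; then SLOWEST=$t; fi
done
WALL=$((DEP + SLOWEST))
echo "critical path: ${WALL}s"   # 30 + 50 = 80
```

Adding another parallel job is free as long as it stays under the current max; only the slowest branch moves the number.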

Layer 4: Test Sharding

Sharding splits your test suite across N parallel runners. The trick is splitting intelligently — by timing, not by file count. Most test runners support this natively.

Jest:

# Runner 1 of 3 (--shard requires Jest 28+)
jest --shard=1/3 --ci --forceExit
 
# Runner 2 of 3
jest --shard=2/3 --ci --forceExit

Pytest with pytest-split:

# First, store timing data (needs the pytest-split plugin)
pytest --store-durations --durations-path=.test-durations.json
 
# Then shard by duration
pytest --splits=3 --group=1 --durations-path=.test-durations.json

The duration-aware split is important. Naive file-count splits produce wildly uneven shards if you have a few slow tests and many fast ones. Duration-aware splits target equal wall-clock time per shard, which is what you actually care about.

For a suite that was 6 minutes single-threaded, 3 shards with duration splitting brings it to about 2:10. Add a fourth shard and you're around 1:40, with diminishing returns after that because test setup overhead dominates.
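The greedy idea behind duration-aware splitting fits in a few lines. This toy balancer (invented test names and timings, not pytest-split's actual implementation) sorts tests longest-first and assigns each to the least-loaded of three shards:

```shell
# Invented timing data: one slow test dominates many fast ones.
cat > durations.txt <<'EOF'
test_checkout 90
test_search 10
test_auth 10
test_cart 10
test_profile 10
test_billing 10
EOF

# Greedy split: longest test first, into whichever of the 3 shards
# currently has the least total time; then report each shard's load.
sort -k2,2 -rn durations.txt | awk '
{
  min = 1
  for (s = 2; s <= 3; s++) if (load[s] < load[min]) min = s
  load[min] += $2
}
END { for (s = 1; s <= 3; s++) print "shard " s ": " load[s] "s" }
' > shards.txt
cat shards.txt
```

The slow 90-second test gets a shard to itself, which is the best any split can do. A naive two-files-per-shard split could pair it with another test and push that shard past 100 seconds.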

Layer 5: Change-Based Skip Logic

This is the most leverage, and the most dangerous if done carelessly. The principle: if your change only touches the docs directory, don't run the test suite. If it only touches one microservice, only test that service.

GitHub Actions path filters:

on:
  pull_request:
    paths:
      - 'src/**'
      - 'tests/**'
      - 'package*.json'
 
jobs:
  detect-changes:
    runs-on: ubuntu-latest
    outputs:
      backend: ${{ steps.filter.outputs.backend }}
      frontend: ${{ steps.filter.outputs.frontend }}
    steps:
      - uses: actions/checkout@v4
      - uses: dorny/paths-filter@v3
        id: filter
        with:
          filters: |
            backend:
              - 'services/**'
              - 'packages/shared/**'
            frontend:
              - 'apps/web/**'
              - 'packages/ui/**'
 
  test-backend:
    needs: detect-changes
    if: needs.detect-changes.outputs.backend == 'true'
    # ...
 
  test-frontend:
    needs: detect-changes
    if: needs.detect-changes.outputs.frontend == 'true'
    # ...

The risk: your change graph must be correct. If packages/shared affects both backend and frontend, it must be in both filter lists. An incorrect skip is worse than no skip — it gives false confidence. Audit your filters quarterly.

For monorepos with Nx or Turborepo, the tooling handles this automatically:

# Nx: only run affected tests
npx nx affected --target=test --base=origin/main
 
# Turborepo: pipeline-aware incremental builds
npx turbo run test --filter=[HEAD^1]

The Performance Budget

Treat your pipeline time as a budget with an enforced ceiling. Add a step that fails the pipeline if it exceeds your threshold:

- name: Check pipeline duration
  if: always()
  run: |
    # pipeline-start is an earlier step that ran: echo "time=$(date +%s)" >> "$GITHUB_OUTPUT"
    START="${{ steps.pipeline-start.outputs.time }}"
    NOW=$(date +%s)
    ELAPSED=$((NOW - START))
    BUDGET=180  # 3 minutes
    if [ "$ELAPSED" -gt "$BUDGET" ]; then
      echo "Pipeline exceeded ${BUDGET}s budget (took ${ELAPSED}s)"
      exit 1
    fi

This sounds aggressive, but without a hard ceiling, pipelines drift. Every new test suite gets added, every new check gets bolted on, and 3 minutes becomes 5 becomes 8 before anyone notices. A budget creates accountability.

What We Skipped and Why

Some obvious suggestions we deliberately did not implement:

Skipping tests on main-only branches: Never. Main always gets full coverage. The skip logic only applies to feature branch PRs.

Running only changed-file unit tests: The dependency graph between files is too fragile. A utility function change can break tests in unexpected places. File-level skipping requires perfect import analysis — we use module-level skipping instead.

Removing integration tests: They catch things unit tests structurally cannot. We made them faster (parallel, ephemeral databases via Testcontainers) rather than removing them.

Key Takeaways

  • Cache dependencies at three layers: package manager cache, build artifacts, and Docker layers. Key each cache on the precise inputs that invalidate it, and use restore-keys fallbacks for partial hits.
  • Model your pipeline as a DAG and run every independent job in parallel. Lint, typecheck, and test shards that have no true dependencies on each other should never run serially.
  • Shard test suites by duration, not file count. Even three shards can cut test wall-clock time by 60-70% when balanced correctly.
  • Change-based skip logic is high leverage but must be maintained. Incorrect skip rules produce false confidence; audit your path filters as the codebase evolves.
  • Enforce a pipeline time budget as a failing check. Without a hard ceiling, pipeline time only moves in one direction.
  • Never sacrifice meaningful signal for speed. Fast CI that hides bugs is worse than slow CI that catches them.