
Progressive Delivery: Canary, Blue/Green, Ring Deployments

Ravinder · 9 min read

"Deploy with confidence" is the sales pitch for every CD tool ever made. What that actually means in practice is: reduce the blast radius of a bad deployment to something your team can recover from in minutes, not hours. Progressive delivery is the set of patterns that makes this possible. Canary, blue/green, and ring deployments are the three dominant models — each solves a different problem, and choosing the wrong one adds complexity without reducing risk.

This post compares the three models against a concrete set of criteria: traffic control granularity, infrastructure cost, rollback speed, and the kind of bugs each model catches. Then it covers tooling (Argo Rollouts and Spinnaker), automated rollback configuration, and the promotion metrics that actually matter.

The Three Models, Side by Side

graph TD
  subgraph Canary
    A1[Load Balancer] -->|95% traffic| B1[v1 Pods]
    A1 -->|5% traffic| C1[v2 Pods canary]
  end
  subgraph Blue-Green
    A2[Load Balancer] -->|100% traffic| B2[Blue - v1 Active]
    C2[Green - v2 Idle]
    B2 -.->|switch| A2
    C2 -.->|becomes active| A2
  end
  subgraph Ring
    A3[Internal Ring] -->|v2| B3[Employees]
    D3[Beta Ring] -->|v2| C3[Opted-in users]
    E3[Production Ring] -->|v1 → v2| F3[All users]
  end

Canary deployment routes a percentage of live traffic to the new version. You start at 1-5%, monitor, and increment. The new version is running alongside the old. Users are split — some get v1, some get v2, and they do not choose.

Blue/green deployment maintains two identical environments. The "blue" environment is live. You deploy the new version to "green," run smoke tests, then switch the load balancer to point at green. The switch is atomic. If something goes wrong, you flip back.
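
On Kubernetes, the atomic switch is often nothing more than repointing a Service's label selector. A minimal sketch, assuming the pods carry a version label (the label scheme here is an assumption, not something a stock setup gives you for free):

# Green (v2) is already deployed and smoke-tested.
# Flip the Service selector in one atomic patch; rollback is the same
# patch with "version": "v1".
kubectl patch service payment-service \
  -p '{"spec": {"selector": {"app": "payment-service", "version": "v2"}}}'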

Ring deployment (also called ring-based rollout or flight rollout) progressively expands the audience of the new version across concentric rings — from internal employees to beta users to a geographic region to the full user base. Each ring is a discrete audience segment, not a traffic percentage.

When to Use Each Model

| Criterion | Canary | Blue/Green | Ring |
| --- | --- | --- | --- |
| Rollback speed | Seconds (shift traffic back) | Seconds (flip LB) | Minutes (depends on ring size) |
| Infrastructure cost | Low (extra pods only) | 2x (full duplicate env) | Low to medium |
| Catches data migration bugs | Poor | Poor | Poor (use dark launches for this) |
| Catches performance regressions | Excellent | Moderate | Excellent |
| Catches user-segment-specific bugs | Moderate | No | Excellent |
| Suitable for stateful services | Moderate | Good | Good |
| Complexity to implement | Medium | Low | High |

Choose canary when:

  • You want fine-grained traffic control and can instrument the comparison between v1 and v2 populations.
  • Your service is stateless or you can tolerate a user hitting both versions within a session.
  • You want to automate promotion based on real traffic metrics.

Choose blue/green when:

  • You need atomic, instant rollback — and you are willing to pay 2x infrastructure cost for the standby environment.
  • Your application takes a long time to warm up (caches, JVM JIT) and you want to warm green before switching.
  • Your deployment pipeline already validates green in isolation before any live traffic hits it.

Choose ring deployments when:

  • You have distinct audience segments that make sense as risk boundaries (employees, beta users, geography).
  • You want to test with consenting users who can report issues before a general rollout.
  • You have a product that has strong regional or segment-specific behavior.

Ring deployments are not a substitute for canary in most infrastructure contexts — they are a product and audience control layer on top of deployment infrastructure.
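
On the infrastructure side, a ring usually reduces to deterministic audience bucketing in front of your deploy target or feature flag. A minimal sketch in Go; every name here (ringOf, rolloutPercent, the membership flags) is hypothetical, not any particular product's API:

package rings

import "hash/fnv"

type Ring int

const (
	Internal   Ring = iota // employees
	Beta                   // opted-in users
	Production             // everyone else
)

// ringOf assigns a user to a ring and reports whether they should
// receive v2. Employees and beta opt-ins are matched by explicit
// membership; the production ring is expanded via a stable hash bucket,
// so raising rolloutPercent only ever adds users -- nobody flips back
// and forth between versions as the ring grows.
func ringOf(userID string, isEmployee, optedIn bool, rolloutPercent uint32) (Ring, bool) {
	if isEmployee {
		return Internal, true
	}
	if optedIn {
		return Beta, true
	}
	h := fnv.New32a()
	h.Write([]byte(userID))
	return Production, h.Sum32()%100 < rolloutPercent
}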

Argo Rollouts: Canary Configuration

Argo Rollouts extends Kubernetes with Rollout resources that replace Deployment objects. It manages the canary traffic split natively via integration with ingress controllers or service meshes.

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payment-service
spec:
  replicas: 10
  strategy:
    canary:
      # Pause at each step for analysis
      steps:
        - setWeight: 5     # 5% of traffic to canary
        - pause: { duration: 5m }
        - analysis:
            templates:
              - templateName: error-rate-check
        - setWeight: 25
        - pause: { duration: 10m }
        - analysis:
            templates:
              - templateName: error-rate-check
              - templateName: latency-check
        - setWeight: 50
        - pause: { duration: 15m }
        - setWeight: 100
      canaryService: payment-service-canary
      stableService: payment-service-stable
      trafficRouting:
        nginx:
          stableIngress: payment-service-ingress

The analysis steps are the automation glue. An AnalysisTemplate queries your metrics backend (Prometheus, Datadog, etc.) and promotes or aborts based on the result:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: error-rate-check
spec:
  args:
    - name: version        # value supplied by the Rollout's analysis step (not shown)
  metrics:
    - name: error-rate
      successCondition: result[0] <= 0.02   # abort if error rate > 2%
      failureLimit: 1
      interval: 1m
      count: 5
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            sum(rate(http_requests_total{
              job="payment-service",
              version="{{args.version}}",
              status=~"5.."
            }[2m]))
            /
            sum(rate(http_requests_total{
              job="payment-service",
              version="{{args.version}}"
            }[2m]))
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: latency-check
spec:
  args:
    - name: version        # value supplied by the Rollout's analysis step (not shown)
  metrics:
    - name: p99-latency
      successCondition: result[0] <= 0.5   # abort if p99 > 500ms
      failureLimit: 1
      interval: 1m
      count: 5
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            histogram_quantile(0.99,
              sum(rate(http_request_duration_seconds_bucket{
                job="payment-service",
                version="{{args.version}}"
              }[5m])) by (le)
            )

When the analysis step fails, Argo Rollouts automatically shifts 100% of traffic back to the stable version and marks the rollout as degraded. No manual intervention required.
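
In practice you watch and drive all of this with the kubectl argo rollouts plugin:

# Watch the rollout walk through its canary steps live
kubectl argo rollouts get rollout payment-service --watch

# Abort manually at any point; traffic shifts back to stable
kubectl argo rollouts abort payment-service

# After pushing a fix, retry the rollout from the first step
kubectl argo rollouts retry rollout payment-service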

Blue/Green With Argo Rollouts

The same Rollout CRD supports blue/green with a different strategy block:

apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payment-service
spec:
  replicas: 10
  strategy:
    blueGreen:
      activeService: payment-service-active
      previewService: payment-service-preview
      autoPromotionEnabled: false     # require manual gate or analysis
      scaleDownDelaySeconds: 300      # keep blue alive 5 minutes post-switch for rollback
      prePromotionAnalysis:
        templates:
          - templateName: smoke-test
      postPromotionAnalysis:
        templates:
          - templateName: error-rate-check

The prePromotionAnalysis runs against the preview environment (green) before any live traffic is switched. The postPromotionAnalysis runs after the switch and triggers automatic rollback if it fails within the window defined by scaleDownDelaySeconds.
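
The smoke-test template referenced above can be anything that yields a pass/fail; one option is the job metric provider, which runs a Kubernetes Job and fails the analysis on a non-zero exit code. A sketch, with the test image and endpoint as hypothetical stand-ins:

apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: smoke-test
spec:
  metrics:
    - name: smoke-test
      count: 1
      failureLimit: 0                 # any failed run blocks promotion
      provider:
        job:
          spec:
            backoffLimit: 0
            template:
              spec:
                restartPolicy: Never
                containers:
                  - name: smoke
                    image: curlimages/curl:8.5.0   # hypothetical test image
                    args:
                      - "--fail"
                      - "http://payment-service-preview/healthz"   # hypothetical endpoint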

sequenceDiagram
  participant CD as CD Pipeline
  participant Argo as Argo Rollouts
  participant Green as Green Env
  participant LB as Load Balancer
  participant Blue as Blue Env Active
  CD->>Argo: Apply new Rollout spec
  Argo->>Green: Deploy v2 to preview service
  Argo->>Green: Run prePromotionAnalysis (smoke tests)
  Green-->>Argo: Analysis passed
  Argo->>LB: Switch active service → Green
  Argo->>Argo: Run postPromotionAnalysis
  Note over Blue: Blue kept alive 5 minutes
  Argo-->>CD: Promotion complete
  alt postPromotionAnalysis fails
    Argo->>LB: Revert active service → Blue
    Argo-->>CD: Rollback triggered
  end

Spinnaker: Multi-Cloud Progressive Delivery

Spinnaker models deployments as pipelines with stages. For canary, it uses Kayenta — an automated canary analysis tool that compares the canary and baseline metric populations statistically.

// Spinnaker pipeline stage: Canary Analysis
{
  "type": "kayentaCanary",
  "canaryConfig": {
    "canaryConfigId": "payment-service-canary-config",
    "lifetimeDuration": "PT1H",
    "scoreThresholds": {
      "pass": 85,
      "marginal": 70
    },
    "scopes": [
      {
        "controlScope": {
          "scope": "payment-service-baseline",
          "region": "us-east-1"
        },
        "experimentScope": {
          "scope": "payment-service-canary",
          "region": "us-east-1"
        }
      }
    ]
  }
}

Kayenta scores the canary against the baseline across all configured metrics and produces a 0-100 score. Below marginal triggers an automatic rollback. Between marginal and pass triggers a manual review gate. Above pass auto-promotes.
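
The canaryConfigId references a separate Kayenta canary config, which is where the individual metrics, their failure direction, and group weights live. A trimmed sketch of its shape (the per-metric query blocks vary by metrics provider and are abridged here):

// Kayenta canary config (trimmed sketch)
{
  "name": "payment-service-canary-config",
  "metrics": [
    {
      "name": "error-rate",
      "groups": ["errors"],
      "analysisConfigurations": {
        "canary": { "direction": "increase", "critical": true }  // a failing critical metric fails the whole run
      },
      "query": { "type": "prometheus" }
    },
    {
      "name": "p99-latency",
      "groups": ["latency"],
      "analysisConfigurations": {
        "canary": { "direction": "increase" }
      },
      "query": { "type": "prometheus" }
    }
  ],
  "classifier": {
    "groupWeights": { "errors": 60, "latency": 40 }   // weights sum to 100
  }
}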

Spinnaker's advantage over Argo Rollouts is multi-cloud and multi-cluster support. If you are deploying to EC2 and Kubernetes simultaneously, or across multiple cloud providers, Spinnaker's abstraction layer pays off. If you are Kubernetes-only, Argo Rollouts is lighter and easier to operate.

Metrics That Gate Promotion

Not all metrics should gate promotion. Using too many metrics makes your analysis flaky — individual metric fluctuations cause false rollbacks. Using too few means you miss regressions.

The right set:

Primary gate (any failure = abort):
  1. Error rate > baseline + threshold (e.g., > 2% absolute or > 20% relative increase)
  2. p99 latency > baseline + threshold (e.g., > 500ms or > 30% relative increase)
 
Secondary gate (multiple failures = abort):
  3. p50 latency regression
  4. Downstream error rate (upstream dependency errors caused by this service)
 
Never gate on:
  - Business metrics (conversion, revenue) — too much natural variance
  - Infrastructure metrics (CPU, memory) in isolation — pod counts scale with the traffic split, so a canary taking 50% of traffic on 50% of the pods looks identical to stable
  - Metrics with < 100 events in the analysis window — statistically meaningless

For relative thresholds, the comparison should be canary vs. the stable version receiving equivalent traffic at the same time, not historical averages. This controls for time-of-day effects.
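
In PromQL, that relative gate is a ratio of two rates over the same window. The stable and canary label values below are assumptions about how your pods are labeled:

# Canary error rate divided by stable error rate, same 5-minute window.
# Gate on e.g. result[0] <= 1.2 (at most a 20% relative increase).
(
  sum(rate(http_requests_total{job="payment-service", version="canary", status=~"5.."}[5m]))
    /
  sum(rate(http_requests_total{job="payment-service", version="canary"}[5m]))
)
/
(
  sum(rate(http_requests_total{job="payment-service", version="stable", status=~"5.."}[5m]))
    /
  sum(rate(http_requests_total{job="payment-service", version="stable"}[5m]))
)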

Automated Rollback Without Thrashing

Automated rollback is powerful but can thrash — roll back, redeploy, roll back again — if the analysis window is too short or the thresholds are too tight. Guard against this:

  1. Minimum sample window. Never run analysis on fewer than 5 minutes of data. Set count * interval >= 5m.
  2. Failure tolerance. Set failureLimit: 1 or failureLimit: 2 to allow for transient spikes. Do not trigger rollback on a single anomalous data point (guards 1 and 2 appear in the sketch after this list).
  3. Circuit breaker on repeated failures. If the same version fails canary analysis three times, block automatic redeployment and require human review.
  4. Rollback notification. Every automated rollback should page the on-call engineer. Rollbacks should never be silent.
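
Guards 1 and 2 map directly onto AnalysisTemplate fields. A minimal sketch of the metric block, with an illustrative threshold:

spec:
  metrics:
    - name: error-rate
      interval: 1m            # one measurement per minute...
      count: 6                # ...for six minutes: count * interval >= 5m (guard 1)
      failureLimit: 2         # tolerate transient bad points (guard 2)
      successCondition: result[0] <= 0.02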

Key Takeaways

  • Canary deployment is the right default for most stateless services — it gives granular traffic control and enables automated promotion and rollback against real user traffic.
  • Blue/green is the right choice when you need atomic, instant rollback and can absorb the 2x infrastructure cost; it is simpler to reason about but does not catch per-user-segment bugs.
  • Ring deployments are an audience management pattern, not a traffic management pattern — use them when your audience segments are meaningful risk boundaries.
  • Argo Rollouts AnalysisTemplate objects bridge the gap between deployment orchestration and your metrics backend; configure them with relative thresholds to control for time-of-day effects.
  • Gate canary promotion on error rate and p99 latency relative to the stable version receiving equivalent traffic at the same time — not historical averages.
  • Every automated rollback must generate a notification; silent rollbacks create the false belief that deployments succeeded when they did not.