Routing & Resilience

Fallback chains

Pass models: [primary, fallback1, fallback2, ...] on any chat-completion request and Meridian Blue walks the list strictly in order, stopping at the first 200 OK. Retryable failures on a chain entry exhaust that entry's retry budget before the router advances to the next entry.

Why fallback chains

Upstream providers fail. They go down, they degrade, they hit rate limits, they reject requests. A fallback chain is the deterministic, declarative answer to "what should happen instead?" — chosen by the caller, never silently substituted by the gateway.

Defining a chain

Pass models (an ordered string array) on the request. The router never substitutes models that aren't in this list.

JSON

{
  "models": [
    "gpt-4o",           // primary
    "claude-sonnet-4-5-20241022", // fallback 1
    "llama-3.3-70b-versatile" // fallback 2
  ],
  "messages": [...]
}

Both the singular model and the array models are accepted. When both are present, models wins. If only model is supplied it is normalised to a one-entry chain (single-shot — no retries; see Smart retries → Single-shot).
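The precedence rule above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not the gateway's actual code: the function name normalise_chain is hypothetical, and treating an empty models array as "not supplied" is an assumption the text does not confirm.

```python
from typing import Any


def normalise_chain(body: dict[str, Any]) -> list[str]:
    """Return the ordered fallback chain for a request body.

    Rule taken from the text: `models` wins when both fields are present,
    and a lone `model` becomes a one-entry (single-shot) chain.
    """
    models = body.get("models")
    if models:
        # Assumption: an empty array is treated like a missing field.
        return list(models)
    model = body.get("model")
    if model:
        return [model]
    raise ValueError("request must include 'model' or 'models'")
```

For example, a body carrying both "model": "gpt-4o" and "models": ["gpt-4o", "gpt-4o-mini"] resolves to the two-entry chain, and the singular field is ignored.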

How the chain is walked

For each chain entry in order:
1. Pick a healthy mapping (circuit-breaker filter + weighted random).
2. Try up to MAX_RETRIES_PER_PROVIDER + 1 attempts with exponential backoff.
3. On a non-retryable error or after retries are spent, advance to the next entry. The exception is a hard 4xx, which halts the chain (see When the chain stops).
The first 200 OK wins — return the response with the full attempt history attached.
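The loop above might look roughly like the following Python sketch. Several things here are assumptions rather than documented behaviour: the value of MAX_RETRIES_PER_PROVIDER, the backoff base, the set of retryable status codes, and the names walk_chain, Attempt and Result. Mapping selection (step 1) is left out, and the send callable stands in for a real provider call. The sketch treats a non-retryable 4xx as halting the chain, in line with "When the chain stops".

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional

MAX_RETRIES_PER_PROVIDER = 2            # assumed value; set by gateway config
BASE_DELAY_S = 0.2                      # assumed backoff base, in seconds
RETRYABLE = {429, 500, 502, 503, 504}   # assumed set of retryable statuses


@dataclass
class Attempt:
    model: str
    attempt: int
    status: int


@dataclass
class Result:
    status: int
    model: Optional[str]
    attempts: List[Attempt] = field(default_factory=list)


def walk_chain(chain: List[str],
               send: Callable[[str], int],
               sleep: Callable[[float], None] = time.sleep) -> Result:
    """Walk the chain in order and stop at the first 200."""
    attempts: List[Attempt] = []
    for model in chain:
        for n in range(MAX_RETRIES_PER_PROVIDER + 1):
            status = send(model)
            attempts.append(Attempt(model, n, status))
            if status == 200:
                return Result(200, model, attempts)
            if status not in RETRYABLE:
                if 400 <= status < 500:
                    # Hard 4xx: halt the whole chain and return as-is.
                    return Result(status, model, attempts)
                break  # other non-retryable error: advance to next entry
            if n < MAX_RETRIES_PER_PROVIDER:
                sleep(BASE_DELAY_S * 2 ** n)  # exponential backoff
    # Every entry tried and failed.
    return Result(502, None, attempts)
```

Passing sleep as a parameter keeps the sketch easy to test: a no-op can be injected so that retries do not actually wait.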

When the chain stops

Success — first 200 OK. Response includes billing.isFallback and the X-Meridian-Fallback* headers describing what happened.
Hard 4xx (not retryable) — any non-retryable client error halts the chain immediately and is returned as-is.
Chain exhausted — every entry tried and failed → 502 all_providers_failed with provider_attempts in the body.
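On the caller's side, the three outcomes can be told apart with a small helper such as the Python sketch below. The field locations are assumptions: it supposes billing.isFallback sits in the JSON body of a successful response and that provider_attempts is a top-level key of the 502 body. The helper name classify_outcome is hypothetical, and the fallback headers are matched only by the documented X-Meridian-Fallback prefix, since their individual names are not listed here.

```python
from typing import Any, Mapping


def classify_outcome(status: int,
                     headers: Mapping[str, str],
                     body: Mapping[str, Any]) -> dict[str, Any]:
    """Map a gateway response onto one of the three chain outcomes."""
    if status == 200:
        return {
            "outcome": "success",
            # Assumed location of the flag: body["billing"]["isFallback"].
            "used_fallback": bool(body.get("billing", {}).get("isFallback", False)),
            # Collect every header that starts with the documented prefix.
            "fallback_headers": {
                k: v for k, v in headers.items()
                if k.lower().startswith("x-meridian-fallback")
            },
        }
    if status == 502 and "provider_attempts" in body:
        # Assumed shape: provider_attempts is a top-level list.
        return {"outcome": "chain_exhausted",
                "attempts": list(body["provider_attempts"])}
    # Anything else is treated as a halted chain (for example a hard 4xx).
    return {"outcome": "halted", "status": status}
```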

Observability

Every fallback that succeeds fires the fallback.triggered webhook, which contains the chain that was walked and the winning provider. Every fully exhausted chain fires providers.exhausted. See Webhooks. The same activity is also exposed as Prometheus counters: meridian_fallbacks_total and meridian_provider_attempts_total{status="failed"}.

Common patterns

Pattern	Why
`[free, mid-tier, premium]`	Cost-optimised cascade: pay top-shelf prices only when the cheaper providers are unavailable.
`[provider-A, provider-B]` (same model class)	Vendor diversification for resilience.
`[same-model, same-model]`	Force a single-model request to be retried instead of failing once.
`[gpt-4o, gpt-4o-mini]`	Quality cascade within a vendor — fall back to the smaller variant on overload.
