Fallback chains
Pass models: [primary, fallback1, fallback2, ...] on any chat-completion request and Meridian Blue walks the list strictly in order, stopping at the first 200 OK. A failing chain entry uses up its own retry budget before the walk advances to the next entry.
Why fallback chains
Upstream providers fail. They go down, they degrade, they hit rate limits, they reject requests. A fallback chain is the deterministic, declarative answer to "what should happen instead?" — chosen by the caller, never silently substituted by the gateway.
Defining a chain
Pass models (an ordered string array) on the request. The router never substitutes models that aren't in this list.
{
  "models": [
    "gpt-4o",                      // primary
    "claude-sonnet-4-5-20241022",  // fallback 1
    "llama-3.3-70b-versatile"      // fallback 2
  ],
  "messages": [...]
}
Both the singular model and the array models are accepted. When both are present, models wins. If only model is supplied it is normalised to a one-entry chain (single-shot — no retries; see Smart retries → Single-shot).
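The precedence rule above can be sketched in a few lines of Python. This is an illustrative sketch, not the gateway's actual code: the function name normalise_chain is hypothetical, and rejecting an empty or missing chain with ValueError is an assumption, since the behaviour for that case is not described here.

```python
from typing import Any


def normalise_chain(body: dict[str, Any]) -> list[str]:
    """Return the ordered fallback chain for a request body.

    `models` takes precedence over `model`; a lone `model` becomes a
    one-entry chain.
    """
    models = body.get("models")
    if models is not None:
        if not models:
            # Assumption: an empty array is treated as a client error.
            raise ValueError("'models' must contain at least one entry")
        return list(models)
    model = body.get("model")
    if model is not None:
        return [model]
    # Assumption: a request with neither field is rejected.
    raise ValueError("request must include 'model' or 'models'")
```

For example, normalise_chain({"model": "gpt-4o", "models": ["a", "b"]}) returns ["a", "b"], because the array wins when both fields are present.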
How the chain is walked
- For each chain entry in order:
  - Pick a healthy mapping (circuit-breaker filter + weighted random).
  - Try up to MAX_RETRIES_PER_PROVIDER + 1 attempts with exponential backoff.
  - On a non-retryable error or after retries are spent, advance to the next entry.
- The first 200 OK wins — return the response with the full attempt history attached.
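The "pick a healthy mapping" step can be pictured as a filter followed by a weighted draw. The sketch below is one plausible reading, not the router's implementation: the ProviderMapping type, its fields, and the function name are all hypothetical, and a circuit breaker is reduced to a single boolean.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ProviderMapping:
    provider: str        # hypothetical upstream name
    weight: float        # relative share of traffic
    circuit_open: bool   # True when the circuit breaker has tripped


def pick_healthy_mapping(
    mappings: list[ProviderMapping],
    rng: Optional[random.Random] = None,
) -> Optional[ProviderMapping]:
    """Drop mappings with an open circuit, then pick one by weight."""
    rng = rng or random.Random()
    healthy = [m for m in mappings if not m.circuit_open and m.weight > 0]
    if not healthy:
        return None  # nothing usable for this chain entry
    weights = [m.weight for m in healthy]
    return rng.choices(healthy, weights=weights, k=1)[0]
```

Passing a seeded random.Random makes the draw reproducible in tests. What the router does when no mapping is healthy is not stated here; returning None simply leaves that decision to the caller.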
When the chain stops
- Success: the first 200 OK. The response includes billing.isFallback and the X-Meridian-Fallback* headers describing what happened.
- Hard 4xx (not retryable): any non-retryable client error halts the chain immediately and is returned as-is.
- Chain exhausted: every entry was tried and failed → 502 all_providers_failed with provider_attempts in the body.
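Putting the walk and the three stop conditions together, a simplified model of the loop might look like the following. It is a sketch under stated assumptions: the set of retryable status codes, the backoff base, the Outcome type, and the call callback are all hypothetical, and it reads "hard 4xx halts, other non-retryable failures advance" as the intended split. Single-shot handling for one-entry chains is not modelled; pass max_retries=0 to approximate it.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

BASE_DELAY_S = 0.25                          # assumed backoff base
RETRYABLE = {408, 429, 500, 502, 503, 504}   # assumed classification


@dataclass
class Outcome:
    status: int
    model: Optional[str]
    attempts: list                 # (model, status) for every attempt made
    error: Optional[str] = None


def walk_chain(
    models: list[str],
    call: Callable[[str], int],    # hypothetical: returns an HTTP status
    max_retries: int = 2,          # stands in for MAX_RETRIES_PER_PROVIDER
    sleep: Callable[[float], None] = time.sleep,
) -> Outcome:
    attempts = []
    for model in models:
        for attempt in range(max_retries + 1):
            status = call(model)
            attempts.append((model, status))
            if status == 200:
                return Outcome(200, model, attempts)       # first 200 wins
            if 400 <= status < 500 and status not in RETRYABLE:
                return Outcome(status, model, attempts)    # hard 4xx halts
            if status not in RETRYABLE:
                break                                      # advance to next entry
            if attempt < max_retries:
                sleep(BASE_DELAY_S * 2 ** attempt)         # exponential backoff
    return Outcome(502, None, attempts, error="all_providers_failed")
```

Injecting sleep keeps the sketch testable without real delays, and the attempts list plays the role of the attempt history returned with the response.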
Observability
Every fallback that succeeded fires the fallback.triggered webhook (containing the chain that was walked + the winning provider). Every fully-exhausted chain fires providers.exhausted. See Webhooks. Also exposed as Prometheus counters: meridian_fallbacks_total and meridian_provider_attempts_total{status="failed"}.
Common patterns
| Pattern | Why |
|---|---|
| [free, mid-tier, premium] | Cost-optimised cascade: pay top-shelf prices only when the cheaper providers are unavailable. |
| [provider-A, provider-B] (same model class) | Vendor diversification for resilience. |
| [same-model, same-model] | Force a single-model request to be retried instead of failing once. |
| [gpt-4o, gpt-4o-mini] | Quality cascade within a vendor: fall back to the smaller variant on overload. |