Routing & Resilience

Fallback chains

Pass models: [primary, fallback1, fallback2, ...] on any chat-completion request and Meridian Blue walks the list strictly in order, stopping at the first 200 OK. Retryable failures on a chain entry use up that entry's retry budget before the router advances to the next entry.

Why fallback chains

Upstream providers fail. They go down, they degrade, they hit rate limits, they reject requests. A fallback chain is the deterministic, declarative answer to "what should happen instead?" — chosen by the caller, never silently substituted by the gateway.

Defining a chain

Pass models (an ordered string array) on the request. The router never substitutes models that aren't in this list.

The first entry is the primary; each later entry is a fallback, tried in the order listed.

JSON
{
  "models": [
    "gpt-4o",
    "claude-sonnet-4-5-20241022",
    "llama-3.3-70b-versatile"
  ],
  "messages": [...]
}

Both the singular model and the array models are accepted. When both are present, models wins. If only model is supplied, it is normalised to a one-entry chain that runs single-shot, with no retries (see Smart retries → Single-shot).
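The precedence rule above can be sketched as a small normalisation step. This is a minimal Python sketch, not Meridian Blue's implementation: it assumes the request body is a plain dict, and the function name normalise_chain, its (chain, single_shot) return value, and the rejection of an empty models array are all hypothetical.

```python
from typing import Any


def normalise_chain(body: dict[str, Any]) -> tuple[list[str], bool]:
    """Return (chain, single_shot) for a chat-completion request body.

    Hypothetical helper that mirrors only the documented precedence rules.
    """
    if "models" in body:
        models = body["models"]
        # `models` wins whenever it is present, even if `model` is also set.
        # Rejecting an empty or non-string array is an assumption of this sketch.
        if not isinstance(models, list) or not models or not all(isinstance(m, str) for m in models):
            raise ValueError("models must be a non-empty array of strings")
        return list(models), False
    model = body.get("model")
    if isinstance(model, str) and model:
        # A lone `model` becomes a one-entry chain that runs single-shot (no retries).
        return [model], True
    raise ValueError("request must include `model` or `models`")
```

For example, a body carrying both "model": "gpt-4o" and "models": ["gpt-4o", "gpt-4o-mini"] normalises to the two-entry chain with retries enabled, while a body with only "model": "gpt-4o" becomes (["gpt-4o"], True).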

How the chain is walked

  1. For each chain entry in order:
    1. Pick a healthy provider mapping for the model: filter out mappings whose circuit breaker is open, then choose among the rest by weighted random selection.
    2. Try up to MAX_RETRIES_PER_PROVIDER + 1 attempts with exponential backoff.
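    3. On a non-retryable provider error, or once the retry budget is spent, advance to the next entry. A non-retryable client error (hard 4xx) stops the whole chain instead; see When the chain stops.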
  2. The first 200 OK wins — return the response with the full attempt history attached.
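The walk above can be sketched as a loop over chain entries with an inner retry loop, including the hard-4xx stop described under When the chain stops. This is a minimal Python sketch under several assumptions: the docs do not say which statuses are retryable, so the RETRYABLE set (429 and common 5xx) is a guess; max_retries stands in for MAX_RETRIES_PER_PROVIDER, whose real value is not given; base_delay is hypothetical; skipping an entry with no healthy mapping is assumed; and Mapping, Attempt, pick_mapping, and walk_chain are illustrative names, not Meridian Blue APIs.

```python
import random
import time
from dataclasses import dataclass
from typing import Callable, Optional

# Assumption: the docs do not list which statuses count as retryable.
RETRYABLE = {429, 500, 502, 503, 504}


@dataclass
class Mapping:
    provider: str
    weight: float
    breaker_open: bool = False  # True once this mapping's circuit breaker has tripped


@dataclass
class Attempt:
    model: str
    provider: str
    status: int


def pick_mapping(mappings: list[Mapping], rng: random.Random) -> Optional[Mapping]:
    # Circuit-breaker filter, then a weighted random choice among the survivors.
    healthy = [m for m in mappings if not m.breaker_open]
    if not healthy:
        return None
    return rng.choices(healthy, weights=[m.weight for m in healthy], k=1)[0]


def walk_chain(
    chain: list[str],
    mappings: dict[str, list[Mapping]],
    send: Callable[[str, Mapping], int],  # performs one upstream call, returns its status
    max_retries: int = 2,  # stands in for MAX_RETRIES_PER_PROVIDER (value assumed)
    base_delay: float = 0.25,  # hypothetical backoff base, in seconds
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
) -> tuple[int, list[Attempt]]:
    """Return (final_status, attempt_history) for one request."""
    rng = rng or random.Random()
    history: list[Attempt] = []
    for model in chain:
        mapping = pick_mapping(mappings.get(model, []), rng)
        if mapping is None:
            continue  # assumption: an entry with no healthy mapping is skipped
        for attempt in range(max_retries + 1):
            status = send(model, mapping)
            history.append(Attempt(model, mapping.provider, status))
            if status == 200:
                return 200, history  # first 200 OK wins, history attached
            if 400 <= status < 500 and status not in RETRYABLE:
                return status, history  # hard 4xx: halt the whole chain
            if status not in RETRYABLE:
                break  # non-retryable provider error: advance to the next entry
            if attempt < max_retries:
                sleep(base_delay * 2 ** attempt)  # exponential backoff
    return 502, history  # chain exhausted (all_providers_failed)
```

Injecting send, sleep, and rng keeps the sketch deterministic for testing; a real router would make network calls and would likely add jitter to the backoff.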

When the chain stops

  • Success: the first 200 OK. The response includes billing.isFallback and the X-Meridian-Fallback* headers describing what happened.
  • Hard 4xx (not retryable): any non-retryable client error halts the chain immediately and is returned as-is.
  • Chain exhausted: every entry was tried and failed, so the gateway returns 502 all_providers_failed with provider_attempts in the body.
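On the caller's side, the three stop conditions can be told apart from the status code and body. This is a minimal Python sketch that assumes the decoded JSON body exposes billing.isFallback as a nested billing object and provider_attempts as a top-level key, as the field paths above suggest; the function name classify_outcome and its return labels are hypothetical.

```python
from typing import Any


def classify_outcome(status: int, body: dict[str, Any]) -> str:
    """Map a gateway response onto the documented stop conditions.

    Assumes `billing.isFallback` and `provider_attempts` sit where the
    docs' field paths suggest; both locations are assumptions.
    """
    if status == 200:
        billing = body.get("billing") or {}
        # Success; isFallback tells you whether a fallback entry answered.
        return "fallback_success" if billing.get("isFallback") else "primary_success"
    if status == 502 and "provider_attempts" in body:
        return "chain_exhausted"  # every entry tried and failed
    if 400 <= status < 500:
        return "hard_client_error"  # returned as-is; the chain stopped here
    return "other_error"
```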

Observability

Every request that succeeds on a fallback entry fires the fallback.triggered webhook; its payload contains the chain that was walked and the winning provider. Every fully exhausted chain fires providers.exhausted (see Webhooks). Fallback activity is also exposed as Prometheus counters: meridian_fallbacks_total and meridian_provider_attempts_total{status="failed"}.
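For a quick check without a full Prometheus server (in a smoke test, say), you can sum these counters straight from the text exposition format. This is a minimal Python sketch: it assumes you have already fetched the exposition text from the gateway's metrics endpoint (not documented here), it does not unescape label values, and sum_counter is a hypothetical helper. Label names other than status are not documented, so the parser accepts any label set.

```python
import re

# One sample line in the Prometheus text format: name{labels} value [timestamp]
SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def sum_counter(text: str, name: str, **labels: str) -> float:
    """Sum every sample of `name` whose labels include all of `labels`."""
    total = 0.0
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE.match(line)
        if not m or m.group(1) != name:
            continue
        found = dict(LABEL.findall(m.group(2) or ""))
        if all(found.get(k) == v for k, v in labels.items()):
            total += float(m.group(3))
    return total
```

For example, sum_counter(text, "meridian_provider_attempts_total", status="failed") adds up failed attempts across every other label combination.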

Common patterns

| Pattern | Why |
| --- | --- |
| [free, mid-tier, premium] | Cost-optimised cascade: pay top-shelf prices only when the cheaper providers are unavailable. |
| [provider-A, provider-B] (same model class) | Vendor diversification for resilience. |
| [same-model, same-model] | Force a single-model request to be retried instead of failing once. |
| [gpt-4o, gpt-4o-mini] | Quality cascade within a vendor: fall back to the smaller variant on overload. |
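When choosing a pattern, it helps to know the worst-case upstream load it can generate. Each chain entry gets up to MAX_RETRIES_PER_PROVIDER + 1 attempts, while a model-only request is single-shot. This minimal Python sketch computes that bound and, assuming a backoff delay of base_delay * 2**k before retry k + 1 with no jitter, the worst-case time spent sleeping. The function names and the backoff formula are hypothetical, and real runs stop earlier on success, a hard 4xx, or a non-retryable error.

```python
def max_attempts(chain_len: int, max_retries: int, single_shot: bool = False) -> int:
    """Upper bound on upstream attempts for one request."""
    if single_shot:
        return 1  # a request that only sets `model` gets exactly one attempt
    return chain_len * (max_retries + 1)


def max_backoff_seconds(chain_len: int, max_retries: int, base_delay: float) -> float:
    # Per entry: base * (1 + 2 + ... + 2**(max_retries - 1)) = base * (2**max_retries - 1).
    return chain_len * base_delay * (2 ** max_retries - 1)
```

With a hypothetical MAX_RETRIES_PER_PROVIDER of 2, a model-only request makes one attempt, while [same-model, same-model] can make up to six, which is why that pattern turns a single failure into a retried request.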