Routing & Resilience

Routing overview

Routing is how a request becomes a response. In Meridian Blue, the caller declares an ordered models chain; the router walks that chain in strict order, retries retryable failures inside each entry, and only advances to the next entry when the current one is exhausted.

Routing pipeline

Every chat-completion request walks the same pipeline. Earlier steps are fail-closed: a 4xx at any step halts processing and returns immediately.

  1. Auth — Validate the API key, fetch the owning user, and check spend caps. See Authentication.
  2. Rate limit — Per-key sliding-window check (Redis-backed when configured). 429 on excess.
  3. Daily limit — Free-tier multimodal cap (5/day default). 429 on excess.
  4. Model chain validation — Resolve every entry in models[] to an active mapping; reject unknown / inactive / capability-mismatched models with 400/409/422.
  5. Tier gates — Check the user's tier features (maxModelChainLength, providerWhitelist, modelAccess).
  6. Risk classification — Run Article 5 / Annex III detection on the prompt + LLM-as-a-judge validation. Block prohibited rules; flag high-risk for review.
  7. Walk the chain — Try entries in order with retries (see below).
  8. Bill — Debit the credit cost; record a Conversation row with token counts, cost, and provider attempt history.
  9. Audit — Append entries to the audit log and (when policy demands) the forensic vault.
  10. Respond — Return the OpenAI-shaped body plus the Meridian extensions and headers.

Strict chain walk

The chain is walked in the order the caller supplied it. Meridian Blue never substitutes a model the caller didn't list.

  1. Single-model chains run single-shot. If the chain has exactly one entry, the router does not retry the same provider on retryable failures — the caller asked for one model and gets one attempt. Fail → return 502.
  2. Multi-model chains retry per entry. Each entry gets up to MAX_RETRIES_PER_PROVIDER + 1 attempts with exponential backoff (base 500ms, max 4 s, jittered) before advancing to the next entry.
  3. 4xx halts the chain. Non-retryable client errors (400, 422, …) are returned immediately. Retryable errors (429, 5xx, network) eat into the retry budget.
  4. Circuit breaker. Per-mapping failure counters. A mapping that's currently broken is filtered out of selection until the cool-down expires.

Per-model mappings

A model name in the catalogue can resolve to multiple mappings — for example two regions of the same Azure OpenAI deployment, or the same model behind two different keys. When that happens, Meridian Blue picks one by weighted random selection (each mapping has a config.weight; default 1) and the circuit breaker filters out unhealthy ones first. See Load balancing.

Observability

Routing decisions are exported as Prometheus metrics (meridian_requests_total, meridian_fallbacks_total, meridian_request_latency_ms, meridian_provider_attempts_total) and as structured Conversation log rows. See Observability.

Failure modes

  • Unknown model400 model_not_found.
  • Lifecycle mismatch409 when the requested model is in maintenance or deprecated state.
  • Capability mismatch422 when the prompt requires a capability the model lacks (e.g. images on a text-only model). Note that single-modality strips happen automatically; 422 is reserved for hard mismatches.
  • Chain exhausted502 all_providers_failed with the full provider_attempts array in the body.
  • Tier gate403 chain_length_exceeded if the chain is longer than the user's tier permits.