Routing overview
Routing is how a request becomes a response. In Meridian Blue, the caller declares an ordered models chain; the router walks that chain in strict order, retries retryable failures inside each entry, and only advances to the next entry when the current one is exhausted.
Routing pipeline
Every chat-completion request walks the same pipeline. Earlier steps are fail-closed: a 4xx at any step halts processing and returns immediately.
- Auth — Validate the API key, fetch the owning user, and check spend caps. See Authentication.
- Rate limit — Per-key sliding-window check (Redis-backed when configured). 429 on excess.
- Daily limit — Free-tier multimodal cap (5/day default). 429 on excess.
- Model chain validation — Resolve every entry in models[] to an active mapping; reject unknown, inactive, or capability-mismatched models with 400, 409, or 422 respectively.
- Tier gates — Check the user's tier features (maxModelChainLength, providerWhitelist, modelAccess).
- Risk classification — Run Article 5 / Annex III detection on the prompt, plus LLM-as-a-judge validation. Block requests that match a prohibited rule; flag high-risk requests for review.
- Walk the chain — Try entries in order with retries (see below).
- Bill — Debit the credit cost; record a Conversation row with token counts, cost, and provider attempt history.
- Audit — Append entries to the audit log and (when policy demands) the forensic vault.
- Respond — Return the OpenAI-shaped body plus the Meridian extensions and headers.
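The fail-closed behaviour of the early steps can be pictured as an ordered list of checks that stops at the first client error. The following is a minimal Python sketch, not Meridian Blue's implementation. StageError, run_pipeline, the request fields, the 429 error code, and the chain-length limit of 3 are hypothetical. Only the stage order, the 5/day default, and the 403 chain_length_exceeded code come from the text above.

```python
class StageError(Exception):
    """A 4xx raised by a pre-routing stage (hypothetical type)."""

    def __init__(self, status: int, code: str) -> None:
        super().__init__(f"{status} {code}")
        self.status = status
        self.code = code


def run_pipeline(request: dict, stages: list) -> dict:
    """Run (name, check) pairs in order; the first StageError halts processing."""
    completed = []
    for name, check in stages:
        try:
            check(request)
        except StageError as err:
            # Fail-closed: no later stage (chain walk, billing, audit) runs.
            return {"status": err.status, "error": err.code, "completed": completed}
        completed.append(name)
    return {"status": 200, "completed": completed}


DAILY_MULTIMODAL_CAP = 5  # free-tier default from the text
MAX_MODEL_CHAIN_LENGTH = 3  # hypothetical tier value for maxModelChainLength


def daily_limit(request: dict) -> None:
    # Only free-tier multimodal requests count against the daily cap.
    if (
        request["tier"] == "free"
        and request["multimodal"]
        and request["multimodal_today"] >= DAILY_MULTIMODAL_CAP
    ):
        raise StageError(429, "daily_limit")  # hypothetical error code


def tier_gates(request: dict) -> None:
    if len(request["models"]) > MAX_MODEL_CHAIN_LENGTH:
        raise StageError(403, "chain_length_exceeded")


STAGES = [("daily_limit", daily_limit), ("tier_gates", tier_gates)]
```

A real pipeline would register all of the steps listed above in the same way; the sketch keeps two of them to show that a failure in one step prevents every later step from running.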
Strict chain walk
The chain is walked in the order the caller supplied it. Meridian Blue never substitutes a model the caller didn't list.
- Single-model chains run single-shot. If the chain has exactly one entry, the router does not retry that provider on retryable failures: the caller asked for one model and gets one attempt. If that attempt fails, the router returns 502.
- Multi-model chains retry per entry. Each entry gets up to MAX_RETRIES_PER_PROVIDER + 1 attempts with exponential backoff (base 500 ms, max 4 s, jittered) before the router advances to the next entry.
- 4xx halts the chain. Non-retryable client errors (400, 422, …) are returned immediately, without trying later entries. 429 is the exception: like 5xx and network failures, it is retryable and consumes the entry's retry budget.
- Circuit breaker. Failures are counted per mapping. A mapping whose breaker is currently open (tripped) is filtered out of selection until its cool-down expires.
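The walk rules above can be sketched as a pair of nested loops: entries in caller order on the outside, attempts per entry on the inside. This is a minimal Python illustration under stated assumptions. ProviderError, walk_chain, and the shape of the response bodies are hypothetical. MAX_RETRIES_PER_PROVIDER = 2 is an assumed value. "Full jitter" (a uniform draw between 0 and the capped exponential delay) is an assumed jitter scheme, because the text only says "jittered".

```python
import random
import time

BASE_DELAY_S = 0.5  # base 500 ms, from the text
MAX_DELAY_S = 4.0  # 4 s cap, from the text
MAX_RETRIES_PER_PROVIDER = 2  # assumed value; the real setting is not given here


class ProviderError(Exception):
    """Hypothetical provider failure; status=None stands for a network error."""

    def __init__(self, status=None):
        super().__init__(f"provider error: {status}")
        self.status = status


def is_retryable(err):
    # 429, any 5xx, and network errors are retryable; other 4xx are not.
    return err.status is None or err.status == 429 or err.status >= 500


def backoff_delay(attempt):
    # Exponential growth capped at 4 s; "full jitter" is an assumed scheme.
    return random.uniform(0, min(MAX_DELAY_S, BASE_DELAY_S * 2 ** attempt))


def walk_chain(chain, call, sleep=time.sleep):
    """Try each model in caller order and return (status, body)."""
    attempts = []
    # Single-model chains are single-shot; multi-model chains retry per entry.
    per_entry = 1 if len(chain) == 1 else MAX_RETRIES_PER_PROVIDER + 1
    for model in chain:
        for attempt in range(per_entry):
            try:
                result = call(model)
            except ProviderError as err:
                attempts.append({"model": model, "status": err.status})
                if not is_retryable(err):
                    # A non-retryable client error halts the whole chain.
                    return err.status, {"provider_attempts": attempts}
                if attempt + 1 < per_entry:
                    sleep(backoff_delay(attempt))
            else:
                attempts.append({"model": model, "status": 200})
                return 200, {"model": model, "result": result, "provider_attempts": attempts}
    # Every entry used up its attempts.
    return 502, {"error": "all_providers_failed", "provider_attempts": attempts}
```

The sleep parameter lets tests run the walk without real waits. It also shows that backoff only happens between attempts on the same entry, never after the final attempt.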
Per-model mappings
A model name in the catalogue can resolve to multiple mappings, for example two regions of the same Azure OpenAI deployment, or the same model behind two different keys. When that happens, the circuit breaker first filters out unhealthy mappings, and Meridian Blue then picks one of the remaining mappings by weighted random selection (each mapping has a config.weight; default 1). See Load balancing.
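The selection order (breaker filter first, weighted draw second) can be sketched with Python's random.choices. This is an illustration under stated assumptions, not Meridian Blue's code. The Mapping fields other than weight, the trip threshold of 5 consecutive failures, and the 30-second cool-down are all hypothetical.

```python
import random
from dataclasses import dataclass

FAILURE_THRESHOLD = 5  # assumed: consecutive failures that trip the breaker
COOL_DOWN_S = 30.0  # assumed cool-down length in seconds


@dataclass
class Mapping:
    """Hypothetical view of one deployment behind a catalogue model name."""
    name: str
    weight: float = 1.0  # mirrors config.weight (default 1)
    failures: int = 0
    open_until: float = 0.0  # breaker is open while now < open_until


def record_failure(m: Mapping, now: float) -> None:
    m.failures += 1
    if m.failures >= FAILURE_THRESHOLD:
        m.open_until = now + COOL_DOWN_S  # trip the breaker
        m.failures = 0


def record_success(m: Mapping) -> None:
    m.failures = 0  # only consecutive failures count in this sketch


def pick_mapping(mappings, now, rng=random):
    """Drop mappings with an open breaker, then choose one by weight."""
    healthy = [m for m in mappings if now >= m.open_until]
    if not healthy:
        return None
    return rng.choices(healthy, weights=[m.weight for m in healthy], k=1)[0]
```

Many breaker designs add a half-open state that admits a single probe request after the cool-down. This sketch simply re-admits the mapping once the cool-down has passed.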
Observability
Routing decisions are exported as Prometheus metrics (meridian_requests_total, meridian_fallbacks_total, meridian_request_latency_ms, meridian_provider_attempts_total) and as structured Conversation log rows. See Observability.
Failure modes
- Unknown model — 400 model_not_found.
- Lifecycle mismatch — 409 when the requested model is in maintenance or deprecated state.
- Capability mismatch — 422 when the prompt requires a capability the model lacks (e.g. images on a text-only model). Note that single-modality strips happen automatically; 422 is reserved for hard mismatches.
- Chain exhausted — 502 all_providers_failed with the full provider_attempts array in the body.
- Tier gate — 403 chain_length_exceeded if the chain is longer than the user's tier permits.
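The three per-entry validation failures can be expressed as one ordered check per chain entry. The following is a minimal Python sketch. The catalogue shape and classify_entry are hypothetical. An error code of None marks the cases where this section gives a status but no code. Automatic single-modality stripping is not modelled, so every missing capability is treated as a hard mismatch here.

```python
from typing import Optional, Tuple

# Lifecycle states that trigger 409, per the text.
BLOCKED_STATES = {"maintenance", "deprecated"}


def classify_entry(
    name: str, catalogue: dict, required: set
) -> Optional[Tuple[int, Optional[str]]]:
    """Return None if the entry is routable, else (HTTP status, error code).

    catalogue is a hypothetical dict: model name -> {"state": str, "capabilities": set}.
    """
    entry = catalogue.get(name)
    if entry is None:
        return 400, "model_not_found"
    if entry["state"] in BLOCKED_STATES:
        return 409, None  # code not named in this section
    if required - entry["capabilities"]:
        # Hard mismatch: the model lacks a capability the prompt needs.
        return 422, None  # code not named in this section
    return None
```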