Routing & Resilience

Load balancing

When a model name resolves to several upstream mappings (different keys, or different regions of the same provider), Meridian Blue picks one by weighted random selection. A per-mapping circuit breaker takes failing mappings out of the pool until they recover.

Weighted random selection

Each model mapping carries a config.weight (default 1, minimum 0). When a request resolves to multiple healthy mappings for the same model name, the router picks one with probability proportional to its weight.

Distribution is verified to be within ±2 percentage points of the configured weights over 10,000 samples (the spec's tolerance). A weight of 0 removes a mapping from selection entirely while keeping its row in the catalogue, which is useful for graceful retirement.
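
As a rough illustration of weight-proportional selection, here is a minimal sketch. It is not Meridian Blue's actual router code: the `Mapping` shape, `weightOf`, and `pickWeighted` are hypothetical names, and only `config.weight` (default 1, minimum 0) comes from the description above.

```typescript
// Hypothetical mapping shape; only `config.weight` is taken from the docs.
interface Mapping {
  provider: string;
  config: { weight?: number };
}

// Weight defaults to 1 and is clamped at the documented minimum of 0.
function weightOf(m: Mapping): number {
  return Math.max(0, m.config.weight ?? 1);
}

// Picks one mapping with probability proportional to its weight.
// `rand` is injectable so the selection can be exercised deterministically.
function pickWeighted(
  mappings: Mapping[],
  rand: () => number = Math.random,
): Mapping | undefined {
  const total = mappings.reduce((sum, m) => sum + weightOf(m), 0);
  if (total <= 0) return undefined; // empty list, or every weight is 0

  let r = rand() * total; // r falls in [0, total)
  for (const m of mappings) {
    const w = weightOf(m);
    if (w === 0) continue; // weight 0 never receives traffic
    if (r < w) return m;
    r -= w;
  }
  // Guard against floating-point drift: return the last mapping with weight > 0.
  return [...mappings].reverse().find((m) => weightOf(m) > 0);
}
```

Calling `pickWeighted(mappings)` with no second argument uses `Math.random`. Passing a fixed `rand` makes it straightforward to check a distribution against a tolerance in a test.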

Circuit breaker

Each mapping has a per-process failure counter. When consecutive 5xx / network failures exceed the threshold, the breaker opens and the mapping is removed from selection until the cool-down expires (default 10 seconds, the spec's explicit bound). After the cool-down the next request is a trial; success closes the breaker, failure re-opens it.

4xx errors are not counted against the breaker: a malformed client request shouldn't poison an upstream's reputation. Only 5xx responses and network errors increment the counter.
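
The breaker behaviour described above can be sketched as a small state machine. This is an illustrative sketch under stated assumptions, not the shipped implementation: the class name, method names, and the injectable clock are hypothetical. The threshold is a required argument because the text gives no default, and a 4xx response is assumed to leave the counter untouched rather than reset it.

```typescript
interface BreakerState {
  consecutiveFailures: number;
  openedAt: number | null; // ms timestamp when the breaker last opened; null = closed
}

class CircuitBreaker {
  private readonly states = new Map<string, BreakerState>();
  private readonly threshold: number;
  private readonly coolDownMs: number;
  private readonly now: () => number;

  // coolDownMs defaults to the documented 10 seconds; `now` is injectable for tests.
  constructor(threshold: number, coolDownMs: number = 10000, now: () => number = Date.now) {
    this.threshold = threshold;
    this.coolDownMs = coolDownMs;
    this.now = now;
  }

  // True when the breaker is closed, or when the cool-down has elapsed (trial request).
  isAvailable(mappingId: string): boolean {
    const s = this.states.get(mappingId);
    if (s === undefined || s.openedAt === null) return true;
    return this.now() - s.openedAt >= this.coolDownMs;
  }

  // Records an upstream outcome. `status` is the HTTP status, or null for a network error.
  record(mappingId: string, status: number | null): void {
    // Assumption: 4xx responses are ignored entirely (neither counted nor a reset).
    if (status !== null && status >= 400 && status < 500) return;

    let s = this.states.get(mappingId);
    if (s === undefined) {
      s = { consecutiveFailures: 0, openedAt: null };
      this.states.set(mappingId, s);
    }

    const failed = status === null || status >= 500;
    if (!failed) {
      // Success closes the breaker and clears the consecutive-failure count.
      s.consecutiveFailures = 0;
      s.openedAt = null;
      return;
    }

    s.consecutiveFailures += 1;
    // Open once failures exceed the threshold, or re-open after a failed trial.
    if (s.openedAt !== null || s.consecutiveFailures > this.threshold) {
      s.openedAt = this.now();
    }
  }
}
```

One simplification to note: once the cool-down has elapsed, this sketch lets every request through until an outcome is recorded, so concurrent requests may all act as trials. A stricter version would admit exactly one.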

Health filtering

For every chain entry, the router runs circuitBreaker.filterHealthy(mappings) first. If at least one mapping is healthy, the unhealthy ones are excluded from selection. If every mapping is unhealthy, the router falls back to weighted selection across the full set so a brief blip can't permanently strand traffic.
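
The filter-then-fall-back rule can be sketched as follows. Only the name `filterHealthy` appears in the text above; its signature here, the `MappingRef` type, and the `selectionPool` helper are assumptions made for illustration.

```typescript
// Hypothetical minimal mapping reference for this sketch.
interface MappingRef {
  id: string;
}

// Stand-in for circuitBreaker.filterHealthy: keeps mappings whose breaker allows traffic.
function filterHealthy(
  mappings: MappingRef[],
  isHealthy: (m: MappingRef) => boolean,
): MappingRef[] {
  return mappings.filter(isHealthy);
}

// Candidate pool for one chain entry: the healthy subset if it is non-empty,
// otherwise the full set, so traffic is never left without a candidate.
function selectionPool(
  mappings: MappingRef[],
  isHealthy: (m: MappingRef) => boolean,
): MappingRef[] {
  const healthy = filterHealthy(mappings, isHealthy);
  return healthy.length > 0 ? healthy : mappings;
}
```

Weighted selection then runs over whatever `selectionPool` returns, so the fallback path still respects the configured weights.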

Configuration

Weights are set per mapping at create time (admin UI on the dashboard) or via the management API:

```json
POST /api/v1/models
{
  "modelName": "gpt-4o",
  "provider": "azure-openai-eastus",
  "providerModel": "gpt-4o",
  "config": {
    "endpoint": "https://eastus.api.cognitive.microsoft.com/...",
    "apiKey": "...",
    "weight": 3
  }
}
```

Three regions of the same model with weights 3 / 2 / 1 would receive roughly 50% / 33% / 17% of traffic.
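
The percentages follow from dividing each weight by the sum of all weights. A small helper makes that arithmetic explicit; `expectedShares` is a hypothetical name used only for this example.

```typescript
// Expected share of traffic per mapping: weight divided by the sum of weights.
function expectedShares(weights: number[]): number[] {
  const total = weights.reduce((sum, w) => sum + w, 0);
  if (total <= 0) return weights.map(() => 0); // no selectable mapping
  return weights.map((w) => w / total);
}

const shares = expectedShares([3, 2, 1]).map((s) => `${(s * 100).toFixed(1)}%`);
console.log(shares.join(" / ")); // 50.0% / 33.3% / 16.7%
```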

Single-instance breaker. The circuit breaker is in-process. Multi-pod deployments share weights but not the failure counter, so each replica opens and closes its breakers independently. A Redis-backed counter is on the roadmap so that breaker state is consistent across replicas.