Load balancing
When a model name resolves to several upstream mappings (different keys, or different regions of the same provider), Meridian Blue picks one by weighted random selection. A per-mapping circuit breaker takes failing mappings out of the pool until they recover.
Weighted random selection
Each model mapping carries a config.weight (default 1, minimum 0). When a request resolves to multiple healthy mappings for the same model name, the router picks one with probability proportional to its weight.
The observed distribution is verified to stay within ±2 percentage points of the configured weights over 10,000 samples (the spec's tolerance). A weight of 0 removes a mapping from selection entirely while keeping its row in the catalogue, which is useful for graceful retirement.
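The selection rule above can be sketched as follows. This is a minimal illustration, not Meridian Blue's actual router code: the `Mapping` shape, `effectiveWeight`, and `pickWeighted` are hypothetical names, and only `config.weight` with its default of 1 and minimum of 0 comes from the text.

```typescript
// Hypothetical mapping shape; only `config.weight` is taken from the docs.
interface Mapping {
  provider: string;
  config: { weight?: number };
}

// Weight defaults to 1 and is clamped at the documented minimum of 0.
function effectiveWeight(m: Mapping): number {
  return Math.max(0, m.config.weight ?? 1);
}

// Picks one mapping with probability proportional to its weight.
// `rand` is injectable so the choice can be made deterministic in tests.
function pickWeighted(
  mappings: Mapping[],
  rand: () => number = Math.random,
): Mapping | undefined {
  // Weight-0 mappings stay in the catalogue but never enter the draw.
  const candidates = mappings.filter((m) => effectiveWeight(m) > 0);
  const total = candidates.reduce((sum, m) => sum + effectiveWeight(m), 0);
  if (total === 0) return undefined; // nothing selectable

  // Walk the cumulative weights until the random cursor is used up.
  let cursor = rand() * total;
  for (const m of candidates) {
    cursor -= effectiveWeight(m);
    if (cursor < 0) return m;
  }
  return candidates[candidates.length - 1]; // guards against floating-point drift
}
```

Passing a fixed `rand` (for example `() => 0.5`) makes the draw reproducible, which is one way a distribution check over many samples could be tested.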
Circuit breaker
Each mapping has a per-process failure counter. When consecutive 5xx responses or network failures exceed the threshold, the breaker opens and the mapping is removed from selection until the cool-down expires (default 10 seconds, the spec's explicit bound). After the cool-down, the next request is a trial: success closes the breaker, failure re-opens it.
4xx errors are not counted against the breaker, because a malformed client request shouldn't poison an upstream's reputation. Only 5xx responses and network errors increment the counter.
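The breaker lifecycle described above can be sketched as a small state machine. This is an illustration under stated assumptions, not the product's implementation: the class name `MappingBreaker`, its methods, and the threshold default of 5 are hypothetical (the text does not give a threshold value), and treating a 4xx as a complete no-op that neither counts nor resets the counter is also an assumption. Only the 10-second cool-down default comes from the text.

```typescript
// Hypothetical per-mapping breaker; one instance per mapping, per process.
class MappingBreaker {
  private consecutiveFailures = 0;
  private openedAt: number | null = null; // null means the breaker is closed
  private readonly threshold: number;
  private readonly coolDownMs: number;

  // threshold = 5 is an assumed value; 10_000 ms is the documented default.
  constructor(threshold: number = 5, coolDownMs: number = 10_000) {
    this.threshold = threshold;
    this.coolDownMs = coolDownMs;
  }

  // Selectable when closed, or when the cool-down has expired (trial request).
  isAvailable(now: number): boolean {
    return this.openedAt === null || now - this.openedAt >= this.coolDownMs;
  }

  // `status === null` models a network failure with no HTTP response.
  record(status: number | null, now: number): void {
    const isFailure = status === null || status >= 500;
    const isClientError = status !== null && status >= 400 && status < 500;
    if (isClientError) return; // assumption: 4xx is ignored entirely

    if (isFailure) {
      this.consecutiveFailures += 1;
      const wasOpen = this.openedAt !== null; // a failed trial re-opens
      if (wasOpen || this.consecutiveFailures > this.threshold) {
        this.openedAt = now; // open (or re-open) and restart the cool-down
      }
    } else {
      // Any success closes the breaker and clears the streak.
      this.consecutiveFailures = 0;
      this.openedAt = null;
    }
  }
}
```

Timestamps are passed in rather than read from the clock so the open, trial, and re-open transitions can be exercised without waiting for a real cool-down.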
Health filtering
For every chain entry, the router runs circuitBreaker.filterHealthy(mappings) first. If at least one mapping is healthy, the unhealthy ones are excluded from selection. If every mapping is unhealthy, the router falls back to weighted selection across the full set so a brief blip can't permanently strand traffic.
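A minimal sketch of the filter-then-fall-back rule follows. The `filterHealthy` name is taken from the text, but its signature here is assumed, and `selectionPool` is a hypothetical helper introduced only for illustration.

```typescript
// Assumed signature for the breaker's health filter; the name is from the docs.
interface CircuitBreaker<M> {
  filterHealthy(mappings: M[]): M[];
}

// Returns the set of mappings that weighted selection should draw from.
function selectionPool<M>(breaker: CircuitBreaker<M>, mappings: M[]): M[] {
  const healthy = breaker.filterHealthy(mappings);
  // If nothing is healthy, fall back to the full set instead of returning
  // an empty pool, so a brief outage cannot strand traffic.
  return healthy.length > 0 ? healthy : mappings;
}
```

The fallback trades a few likely-failing requests for the guarantee that selection always has at least one candidate whenever any mapping exists.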
Configuration
Weights are set per mapping at create time (admin UI on the dashboard) or via the management API:
POST /api/v1/models
{
  "modelName": "gpt-4o",
  "provider": "azure-openai-eastus",
  "providerModel": "gpt-4o",
  "config": {
    "endpoint": "https://eastus.api.cognitive.microsoft.com/...",
    "apiKey": "...",
    "weight": 3
  }
}
Three regions of the same model with weights 3 / 2 / 1 would receive roughly 50% / 33% / 17% of traffic.
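The percentages follow from dividing each weight by the sum of all weights, for example 3 / (3 + 2 + 1) = 0.5. A small helper makes the arithmetic explicit; `expectedShares` is a hypothetical name used only for this illustration and is not part of the management API.

```typescript
// Expected traffic share per mapping: weight divided by the total weight.
function expectedShares(weights: number[]): number[] {
  const total = weights.reduce((sum, w) => sum + w, 0);
  // With no positive weight there is nothing to distribute.
  return weights.map((w) => (total === 0 ? 0 : w / total));
}

// Weights 3 / 2 / 1 give shares of 1/2, 1/3 and 1/6.
const shares = expectedShares([3, 2, 1]);
console.log(shares.map((s) => `${Math.round(s * 100)}%`).join(" / "));
```

This can be handy for checking a planned set of weights before creating the mappings.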