Chat completions
POST /v1/chat/completions is the primary inference endpoint. It accepts the standard OpenAI request shape plus a few Meridian-specific extras for fallback control, governance, and context-window management.
Request body
All standard OpenAI fields are supported and forwarded to the upstream provider unchanged: model, messages, temperature, top_p, max_tokens, stream, tools, tool_choice, response_format, seed, stop, logit_bias, logprobs, n, presence_penalty, frequency_penalty, user.
{
  "models": [
    "gpt-4o",
    "claude-sonnet-4-5-20241022"
  ],
  "messages": [
    { "role": "system", "content": "You are a customer support agent." },
    { "role": "user", "content": "Where is my order?" }
  ],
  "temperature": 0.2,
  "max_tokens": 512,
  // Meridian-specific (all optional)
  "auto_truncate": true,
  "purpose": "customer_support",
  "user_consent_id": "consent_abc123",
  "end_user_id": "customer_4471"
}
You may pass model (singular string) or models (ordered array). When both are present, models wins. models is the preferred form because it lets you declare a fallback chain in one request.
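The precedence rule can be sketched in a few lines of Python. This is only an illustration of the behavior described here, not Meridian Blue's actual implementation; the function name `resolve_model_chain` is hypothetical, and treating an empty `models` array as absent is an assumption.

```python
from typing import Any


def resolve_model_chain(body: dict[str, Any]) -> list[str]:
    """Return the ordered list of models to try for a request body.

    Hypothetical helper: mirrors the documented rule that `models`
    wins when both `model` and `models` are present.
    """
    models = body.get("models")
    if models:  # assumption: an empty list is treated like a missing field
        return list(models)
    model = body.get("model")
    if model:
        return [model]
    raise ValueError("request must include `model` or `models`")


# Both fields present: the array takes precedence.
chain = resolve_model_chain({"model": "gpt-4o-mini", "models": ["gpt-4o", "gpt-4o-mini"]})
print(chain)
```

A client that only ever sends `model` keeps working, because the single string is handled as a one-element chain.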
Meridian-specific fields
These fields are stripped before the request is forwarded to the upstream provider — they only steer Meridian Blue's internal logic.
| Field | Type | Description |
|---|---|---|
| models | string[] | Ordered fallback chain. Index 0 is tried first; later entries are tried on retryable failure. See Fallback chains. |
| auto_truncate | boolean | Drop the oldest non-system messages when the prompt exceeds the model's context window. A truncation receipt is returned in the truncation field of the response. |
| purpose | string | Free-form purpose label. Stored in the conversation log; high-risk-purpose patterns trigger Article 5 / Annex III detection. |
| user_consent_id | string | Reference to a stored GDPR consent record. Required when Article 50 / Annex III gating is enabled at the tenant level. |
| end_user_id | string | Pseudonymous end-user identifier. Used for data-lineage linking and right-to-erasure exports. |
| deployer_context | object | Free-form context (org / team / use case) attached to the audit log entry. |
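To make the stripping and truncation behavior concrete, here is a minimal Python sketch. It is an approximation under stated assumptions, not the gateway's real code: the helper names `split_request` and `truncate_oldest` are hypothetical, the token counter is supplied by the caller, and the real truncation logic may protect recent messages in ways this sketch does not.

```python
from typing import Any, Callable

# Field names taken from the table above.
MERIDIAN_FIELDS = {
    "models", "auto_truncate", "purpose",
    "user_consent_id", "end_user_id", "deployer_context",
}

Message = dict[str, Any]


def split_request(body: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate the upstream payload from the Meridian-specific extras.

    A real gateway would also set `model` on the upstream payload for
    each attempt in the fallback chain; that step is omitted here.
    """
    upstream = {k: v for k, v in body.items() if k not in MERIDIAN_FIELDS}
    extras = {k: v for k, v in body.items() if k in MERIDIAN_FIELDS}
    return upstream, extras


def truncate_oldest(
    messages: list[Message],
    max_tokens: int,
    count_tokens: Callable[[Message], int],
) -> tuple[list[Message], list[Message]]:
    """Drop the oldest non-system messages until the prompt fits.

    Returns (kept, dropped). `count_tokens` is a caller-supplied,
    hypothetical tokenizer hook.
    """
    kept = list(messages)
    dropped: list[Message] = []
    while sum(count_tokens(m) for m in kept) > max_tokens:
        # Index of the oldest message that is not a system message.
        idx = next((i for i, m in enumerate(kept) if m["role"] != "system"), None)
        if idx is None:
            break  # only system messages remain; nothing else to drop
        dropped.append(kept.pop(idx))
    return kept, dropped
```

The `dropped` list plays the role of the truncation receipt: it records exactly which messages were removed so the caller can inspect them.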
Response shape
Meridian Blue returns the standard OpenAI response shape unchanged, with several extension fields added at the top level. Existing OpenAI client code reading choices[0].message.content works without changes.
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1744119282,
  "model": "gpt-4o-2024-08-06",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "..." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 42, "completion_tokens": 120, "total_tokens": 162 },
  // Meridian extensions (top-level, alongside choices/usage)
  "billing": {
    "cost": 0.0162,
    "balanceAfter": 983.84,
    "isFallback": false,
    "latencyMs": 847
  },
  "risk_classification": {
    "level": "limited",
    "reason": "general_business_use",
    "requires_human_review": false
  },
  "explainability": {
    "model_selection_reason": "Primary model returned 200 on first attempt.",
    "human_readable": "Routed to gpt-4o (limited risk). No fallback used."
  }
}
See Response envelope for the full reference, and Headers for the X-Meridian-* headers carrying the same routing info at the HTTP level.
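As a rough illustration of reading the extension fields alongside the standard ones, the Python sketch below parses a sample response shaped like the one above. The `summarize` helper is hypothetical, and the sample values are copied from this page rather than from a live call. The extension fields are read with `.get()` so the same code tolerates a plain OpenAI response that lacks them.

```python
import json
from typing import Any

# Sample response trimmed from the example above (comments removed so it is valid JSON).
SAMPLE = """
{
  "id": "chatcmpl-...",
  "model": "gpt-4o-2024-08-06",
  "choices": [{"index": 0,
               "message": {"role": "assistant", "content": "..."},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 42, "completion_tokens": 120, "total_tokens": 162},
  "billing": {"cost": 0.0162, "balanceAfter": 983.84, "isFallback": false, "latencyMs": 847},
  "risk_classification": {"level": "limited", "reason": "general_business_use",
                          "requires_human_review": false}
}
"""


def summarize(response: dict[str, Any]) -> dict[str, Any]:
    """Pull out the fields a caller typically needs (hypothetical helper)."""
    billing = response.get("billing", {})
    risk = response.get("risk_classification", {})
    return {
        # Standard OpenAI path, unchanged by the extensions.
        "content": response["choices"][0]["message"]["content"],
        "cost": billing.get("cost"),
        "is_fallback": billing.get("isFallback"),
        "risk_level": risk.get("level"),
        "needs_review": risk.get("requires_human_review"),
    }


print(summarize(json.loads(SAMPLE)))
```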
Streaming
Set "stream": true for Server-Sent Events. Meridian Blue forwards SSE chunks as they arrive from the upstream provider, rewriting the model field on each chunk to your requested model name. The terminal data: [DONE] sentinel is preserved.
data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant","content":""}}]}
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hello"}}]}
data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"!"}}]}
data: {"id":"chatcmpl-...","choices":[{"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":2,"total_tokens":14}}
data: [DONE]
Token usage arrives in the final pre-[DONE] chunk when the upstream provider supports the OpenAI stream_options.include_usage extension. When it doesn't, Meridian Blue estimates completion tokens from the accumulated content (~4 chars per token).
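The sketch below shows one way a client might consume such a stream in Python and arrive at a completion-token count. It assumes the SSE lines are already available as an iterable of strings; the function name `consume_stream` is hypothetical. The fallback mirrors the roughly-4-characters-per-token heuristic described above, and rounding up is an assumption, since the exact rounding is not specified here.

```python
import json
import math
from typing import Iterable


def consume_stream(lines: Iterable[str]) -> tuple[str, int, bool]:
    """Accumulate streamed content and determine completion tokens.

    Returns (text, completion_tokens, estimated). `estimated` is True
    when no usage object was seen and the ~4 chars/token heuristic was used.
    """
    parts: list[str] = []
    usage = None
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # terminal sentinel
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            parts.append(choice.get("delta", {}).get("content") or "")
    text = "".join(parts)
    if usage is not None:
        return text, usage["completion_tokens"], False
    # Assumption: round up so a short non-empty reply never counts as 0 tokens.
    return text, math.ceil(len(text) / 4), True
```

Returning the `estimated` flag lets callers distinguish provider-reported usage from the heuristic when reconciling costs.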
Fallback during streaming only works before the first byte is written to the response — once the SSE response is committed, a mid-stream provider failure cannot be recovered.
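The "before the first byte" constraint can be modeled with a small Python generator. This is a simplified sketch, not the router's real logic: `UpstreamError` and the `open_stream` callable are hypothetical stand-ins for a retryable provider failure and a per-model streaming call.

```python
from typing import Callable, Iterable, Iterator


class UpstreamError(Exception):
    """Hypothetical retryable upstream failure."""


def stream_with_fallback(
    models: list[str],
    open_stream: Callable[[str], Iterable[str]],
) -> Iterator[str]:
    """Yield chunks from the first model that starts streaming.

    Fallback is only attempted while nothing has been sent to the
    caller. Once a chunk has been yielded the response is committed,
    and a later failure is re-raised instead of retried.
    """
    last_error = None
    for model in models:
        committed = False
        try:
            for chunk in open_stream(model):
                committed = True  # first chunk is about to reach the caller
                yield chunk
            return  # stream finished cleanly
        except UpstreamError as exc:
            if committed:
                raise  # mid-stream failure: cannot be recovered
            last_error = exc  # failed before first chunk: try the next model
    raise last_error if last_error else UpstreamError("no models configured")
```

A client that needs stronger guarantees would have to buffer the full response or retry the whole request itself after a mid-stream error.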
Tool calls
Tool calls work exactly as they do in the OpenAI API: pass tools and tool_choice in the request and receive tool_calls in the response. Meridian Blue forwards the schema verbatim to providers that natively support tools (OpenAI, Anthropic, Mistral, Groq, Gemini).
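For readers new to the OpenAI tool-call shape, the following Python sketch shows how a client might execute the `tool_calls` from an assistant message and build the follow-up `tool` messages. The `run_tool_calls` helper and the `lookup_order` tool are hypothetical; only the message structure (`id`, `function.name`, `function.arguments` as a JSON string, and the `tool_call_id` on the reply) follows the standard OpenAI format.

```python
import json
from typing import Any, Callable


def run_tool_calls(
    message: dict[str, Any],
    registry: dict[str, Callable[..., Any]],
) -> list[dict[str, Any]]:
    """Execute each requested tool and return `tool` role messages."""
    results = []
    for call in message.get("tool_calls") or []:
        fn = registry[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not a dict.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # links the result to the request
            "content": json.dumps(fn(**args)),
        })
    return results


# Hypothetical tool used only for this example.
def lookup_order(order_id: str) -> dict[str, str]:
    return {"order_id": order_id, "status": "shipped"}


assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_1",
        "type": "function",
        "function": {"name": "lookup_order", "arguments": "{\"order_id\": \"A17\"}"},
    }],
}
tool_messages = run_tool_calls(assistant_message, {"lookup_order": lookup_order})
```

The returned messages are appended to the conversation after the assistant message and sent in the next request.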
High-risk requests
If the router classifies your request as high-risk under Annex III (creditworthiness, recruitment, medical diagnosis, biometric ID, etc.), three things happen:
- The key must have allowHighRisk: true; otherwise the request is rejected with 403.
- If the deployer policy requires it, purpose and user_consent_id must be present on the request body.
- The response is held in the human-review queue depending on the policy; see Human oversight.
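The checks above can be read as a simple decision sequence. The Python sketch below is an illustration only: the function `gate_high_risk`, the policy keys `require_purpose_and_consent` and `hold_for_human_review`, and the returned labels are all hypothetical, and the status code for a missing `purpose` or `user_consent_id` is deliberately left unspecified because it is not stated here.

```python
from typing import Any


def gate_high_risk(
    body: dict[str, Any],
    key: dict[str, Any],
    policy: dict[str, Any],
) -> str:
    """Decide what happens to a request already classified as high-risk.

    Returns a label rather than an HTTP response; only the 403 for a
    key without allowHighRisk comes from the documentation.
    """
    if not key.get("allowHighRisk", False):
        return "reject_403"
    if policy.get("require_purpose_and_consent"):  # hypothetical policy key
        missing = [f for f in ("purpose", "user_consent_id") if not body.get(f)]
        if missing:
            return "reject_missing:" + ",".join(missing)
    if policy.get("hold_for_human_review"):  # hypothetical policy key
        return "hold_for_review"
    return "allow"
```

Ordering the key check first means a key without high-risk permission is rejected regardless of what the request body contains.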
The full classification — including the triggered rules and the LLM-judge confidence — is returned in risk_classification on every response. See Risk classification.