API Reference

Chat completions

POST /v1/chat/completions is the primary inference endpoint. It accepts the standard OpenAI request shape plus a few Meridian-specific extras for fallback control, governance, and context-window management.

Request body

All standard OpenAI fields are supported and forwarded to the upstream provider unchanged: model, messages, temperature, top_p, max_tokens, stream, tools, tool_choice, response_format, seed, stop, logit_bias, logprobs, n, presence_penalty, frequency_penalty, user.

JSON
{
  "models": [
    "gpt-4o",
    "claude-sonnet-4-5-20241022"
  ],
  "messages": [
    { "role": "system", "content": "You are a customer support agent." },
    { "role": "user", "content": "Where is my order?" }
  ],
  "temperature": 0.2,
  "max_tokens": 512,

  // Meridian-specific (all optional)
  "auto_truncate": true,
  "purpose": "customer_support",
  "user_consent_id": "consent_abc123",
  "end_user_id": "customer_4471"
}

You may pass model (singular string) or models (ordered array). When both are present, models wins. models is the preferred form because it lets you declare a fallback chain in one request.
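The precedence rule can be written as a small helper. This is a minimal sketch, not Meridian Blue's implementation. resolve_model_chain is a hypothetical name. Rejecting a request that has neither field, or an empty models array, is an assumption and not documented behavior.

```python
from typing import Any


def resolve_model_chain(body: dict[str, Any]) -> list[str]:
    """Return the ordered list of models to try for a request body (hypothetical helper)."""
    if "models" in body:
        chain = list(body["models"])  # models takes precedence when both are present
    elif "model" in body:
        chain = [body["model"]]       # singular form becomes a one-entry chain
    else:
        chain = []
    if not chain:
        # Assumption: a request with no model at all is rejected.
        raise ValueError("request must include 'model' or a non-empty 'models'")
    return chain
```

Usage: resolve_model_chain({"model": "gpt-4o", "models": ["claude-sonnet-4-5-20241022", "gpt-4o"]}) returns the models array and ignores model.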

Meridian-specific fields

These fields are stripped before the request is forwarded to the upstream provider — they only steer Meridian Blue's internal logic.

Field | Type | Description
models | string[] | Ordered fallback chain. Index 0 is tried first; later entries are tried in order on retryable failures. See Fallback chains.
auto_truncate | boolean | Drop the oldest non-system messages when the prompt exceeds the model's context window. A truncation receipt is returned in the truncation field of the response.
purpose | string | Free-form purpose label, stored in the conversation log. Purposes that match high-risk patterns trigger Article 5 / Annex III detection.
user_consent_id | string | Reference to a stored GDPR consent record. Required when Article 50 / Annex III gating is enabled at the tenant level.
end_user_id | string | Pseudonymous end-user identifier, used for data-lineage linking and right-to-erasure exports.
deployer_context | object | Free-form context (org, team, use case) attached to the audit log entry.
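To make the stripping and auto_truncate behavior concrete, here is a minimal sketch of how a gateway could build the payload for a single fallback attempt. The following are all assumptions and do not come from Meridian Blue: the helper names (prepare_upstream, approx_tokens), the rough 4-characters-per-token count standing in for a real tokenizer, and the receipt shape ({"dropped_messages": n}). The real truncation field may carry different data.

```python
import copy
from typing import Any, Callable

# Meridian-specific fields from the table above; never forwarded upstream.
MERIDIAN_FIELDS = {
    "models", "auto_truncate", "purpose",
    "user_consent_id", "end_user_id", "deployer_context",
}


def approx_tokens(message: dict[str, Any]) -> int:
    """Assumed heuristic: about 4 characters per token (not a real tokenizer)."""
    return max(1, len(str(message.get("content") or "")) // 4)


def prepare_upstream(
    body: dict[str, Any],
    model: str,
    context_window: int,
    count: Callable[[dict[str, Any]], int] = approx_tokens,
) -> tuple[dict[str, Any], dict[str, int]]:
    """Hypothetical: build the payload sent upstream for one fallback attempt."""
    # Deep-copy only the standard fields so the caller's body is left untouched.
    payload = {k: copy.deepcopy(v) for k, v in body.items() if k not in MERIDIAN_FIELDS}
    payload["model"] = model  # the chain entry currently being tried
    dropped = 0
    if body.get("auto_truncate"):
        messages = payload.get("messages", [])
        while sum(count(m) for m in messages) > context_window:
            # Locate the oldest non-system message, if any remain.
            idx = next((i for i, m in enumerate(messages) if m.get("role") != "system"), None)
            if idx is None:
                break  # only system messages left; stop rather than drop them
            messages.pop(idx)
            dropped += 1
    return payload, {"dropped_messages": dropped}
```

Because the Meridian fields are filtered out before the model is set, the upstream provider only ever sees the standard OpenAI fields plus the single model being tried.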

Response shape

Meridian Blue returns the standard OpenAI response shape with its existing fields unchanged and adds several extension fields at the top level. Existing OpenAI client code that reads choices[0].message.content works without changes.

JSON
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "created": 1744119282,
  "model": "gpt-4o-2024-08-06",
  "choices": [{
    "index": 0,
    "message": { "role": "assistant", "content": "..." },
    "finish_reason": "stop"
  }],
  "usage": { "prompt_tokens": 42, "completion_tokens": 120, "total_tokens": 162 },

  // Meridian extensions (top-level, alongside choices/usage)
  "billing": {
    "cost": 0.0162,
    "balanceAfter": 983.84,
    "isFallback": false,
    "latencyMs": 847
  },
  "risk_classification": {
    "level": "limited",
    "reason": "general_business_use",
    "requires_human_review": false
  },
  "explainability": {
    "model_selection_reason": "Primary model returned 200 on first attempt.",
    "human_readable": "Routed to gpt-4o (limited risk). No fallback used."
  }
}

See Response envelope for the full reference, and Headers for the X-Meridian-* headers carrying the same routing info at the HTTP level.
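The extensions sit alongside choices and usage, so a client can read them from the decoded JSON body with plain dictionary lookups. This is a minimal sketch. summarize_response and the keys of the dict it returns are illustrative. The .get() defaults guard against an extension being absent.

```python
from typing import Any


def summarize_response(resp: dict[str, Any]) -> dict[str, Any]:
    """Pull the answer plus a few Meridian extension fields from a decoded response."""
    content = resp["choices"][0]["message"]["content"]  # standard OpenAI path
    billing = resp.get("billing", {})                    # Meridian extension
    risk = resp.get("risk_classification", {})          # Meridian extension
    return {
        "content": content,
        "model": resp.get("model"),
        "cost": billing.get("cost"),
        "used_fallback": billing.get("isFallback", False),
        "risk_level": risk.get("level"),
        "needs_review": risk.get("requires_human_review", False),
    }
```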

Streaming

Set "stream": true for Server-Sent Events. Meridian Blue forwards SSE chunks as they arrive from the upstream provider, rewriting the model field on each chunk to your requested model name. The terminal data: [DONE] sentinel is preserved.

SSE
data: {"id":"chatcmpl-...","choices":[{"delta":{"role":"assistant","content":""}}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"Hello"}}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{"content":"!"}}]}

data: {"id":"chatcmpl-...","choices":[{"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":2,"total_tokens":14}}

data: [DONE]

Token usage arrives in the final chunk before [DONE] when the upstream provider supports the OpenAI stream_options.include_usage option. Otherwise, Meridian Blue estimates completion tokens from the accumulated content at roughly 4 characters per token.
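A client consuming the stream only needs to read data: lines, stop at [DONE], and keep the usage object if one arrives. The sketch below does that. For illustration only, it falls back to the same ~4 characters-per-token estimate when no usage chunk is present. consume_sse and the estimated flag are hypothetical, rounding up is an assumption, and only the choice with index 0 is followed.

```python
import json
import math
from typing import Any, Iterable, Optional


def consume_sse(lines: Iterable[str]) -> tuple[str, dict[str, Any]]:
    """Join streamed content for choice 0 and return (text, usage)."""
    parts = []
    usage: Optional[dict[str, Any]] = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators and non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # terminal sentinel
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) == 0:
                parts.append(choice.get("delta", {}).get("content") or "")
        if chunk.get("usage"):
            usage = chunk["usage"]  # present when include_usage is supported
    text = "".join(parts)
    if usage is None:
        # No usage chunk: rough estimate at ~4 characters per token
        # (assumption: rounded up; the gateway's own rounding is not documented).
        usage = {"completion_tokens": math.ceil(len(text) / 4), "estimated": True}
    return text, usage
```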

During streaming, fallback works only until the first byte is written to the response. Once the SSE response is committed, a mid-stream provider failure cannot be recovered.
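The first-byte rule can be pictured as a generator that may switch models only while it has not yielded anything. This is a hypothetical sketch, not Meridian Blue's code. stream_with_fallback, open_stream, and ProviderError are illustrative names, and every ProviderError is treated as retryable.

```python
from typing import Callable, Iterator, Optional


class ProviderError(Exception):
    """Hypothetical retryable upstream failure."""


def stream_with_fallback(
    chain: list[str],
    open_stream: Callable[[str], Iterator[str]],
) -> Iterator[str]:
    """Yield chunks from the first model in chain that streams successfully."""
    last_error: Optional[Exception] = None
    for model in chain:
        committed = False  # becomes True once a chunk has reached the client
        try:
            for chunk in open_stream(model):
                committed = True
                yield chunk
            return  # stream finished normally
        except ProviderError as exc:
            if committed:
                raise  # bytes already sent: a mid-stream failure cannot fall back
            last_error = exc  # nothing sent yet: try the next model
    raise ProviderError("every model in the chain failed") from last_error
```

The committed flag is the whole design: it turns "has the response been committed?" into a single boolean checked at the moment of failure.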

Tool calls

Tool calling works exactly as on the OpenAI API: pass tools and tool_choice in the request and read tool_calls from the response message. Meridian Blue forwards the tool schema verbatim to providers that natively support tools (OpenAI, Anthropic, Mistral, Groq, Gemini).
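For reference, here is a request fragment in the standard OpenAI tool format and a helper that reads tool_calls back out. The get_order_status tool and the extract_tool_calls helper are hypothetical. In the OpenAI response shape, the arguments field is a JSON-encoded string, so it is parsed with json.loads.

```python
import json
from typing import Any

# Request fragment in the standard OpenAI tool format (tool name and schema are hypothetical).
request = {
    "models": ["gpt-4o"],
    "messages": [{"role": "user", "content": "Where is order 4471?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the shipping status of an order.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    }],
    "tool_choice": "auto",
}


def extract_tool_calls(resp: dict[str, Any]) -> list[tuple[str, str, dict[str, Any]]]:
    """Return (call id, function name, parsed arguments) for each tool call."""
    message = resp["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        # arguments arrives as a JSON string, not an object.
        calls.append((call["id"], fn["name"], json.loads(fn["arguments"])))
    return calls
```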

High-risk requests

If the router classifies your request as high-risk under Annex III (creditworthiness, recruitment, medical diagnosis, biometric ID, etc.), the following three rules apply:

  • The key must have allowHighRisk: true; otherwise the request is rejected with 403.
  • If the deployer policy requires it, purpose and user_consent_id must be present on the request body.
  • Depending on the deployer policy, the response may be held in the human-review queue; see Human oversight.
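The three rules can be summarized as a decision function. This is a hypothetical sketch. The policy keys (require_purpose_and_consent, hold_for_review) and the returned strings are invented for illustration. This section does not specify the status code returned when purpose or consent is missing.

```python
from typing import Any


def gate_high_risk(key: dict[str, Any], policy: dict[str, Any], body: dict[str, Any]) -> str:
    """Hypothetical decision for a request already classified as Annex III high-risk."""
    # 1. The key itself must be allowed to make high-risk calls.
    if not key.get("allowHighRisk", False):
        return "reject: 403"
    # 2. Some deployer policies require purpose and consent on the request body.
    if policy.get("require_purpose_and_consent", False):
        missing = [f for f in ("purpose", "user_consent_id") if not body.get(f)]
        if missing:
            return "reject: missing " + ", ".join(missing)
    # 3. Some policies hold the response for human review.
    if policy.get("hold_for_review", False):
        return "hold: human review"
    return "allow"
```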

The full classification — including the triggered rules and the LLM-judge confidence — is returned in risk_classification on every response. See Risk classification.