API Reference

Response envelope

Every successful Meridian Blue response carries the standard OpenAI shape plus a small set of extension fields at the top level. They report what the request cost, which risk tier it was classified into, why the router picked the model it did, and (when applicable) what the end user must be told.

Envelope shape

The extension fields sit at the top level of the response, alongside choices and usage. Existing OpenAI client code never sees them and never breaks because of them.

JSON
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [...],
  "usage": {...},

  "billing": { "cost": 0.0162, "balanceAfter": 983.84, "isFallback": false, "latencyMs": 847 },
  "risk_classification": { "level": "limited", "reason": "general_business_use", "requires_human_review": false },
  "explainability": { "model_selection_reason": "...", "human_readable": "..." },
  "user_notice": { "disclosure": "...", "appeal_instructions": "...", "enforcement_token": "eyJhbGciOiJIUzI1NiJ9..." },
  "liability_attestation": "eyJhbGciOiJIUzI1NiJ9...",
  "truncation": { "original_tokens": 12000, "truncated_tokens": 8000, "strategy": "middle_out" },
  "free_quota_reminder": { "remaining": 7, "daily_limit": 200, "message": "You have 7 free requests remaining today (out of 200). Use a paid model to continue your experience." }
}

Which fields appear when:

  • billing, risk_classification, explainability, and liability_attestation appear on every successful response.
  • user_notice (and the enforcement_token nested inside it) appears when the request was classified as limited or high-risk under Article 50.
  • truncation appears when auto_truncate: true was set and the prompt exceeded the model's context window.
  • free_quota_reminder appears on the trailing 20 free responses (181→200/day) and on any post-exhaustion request, never otherwise.
  • dry_run appears only on responses to X-Meridian-Dry-Run: true requests.
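The presence rules above can be sketched as a small helper that separates the standard OpenAI keys from the extension keys and reports any always-present field that is missing. This is a minimal sketch operating on an already-parsed response body; the function names split_envelope and missing_required are illustrative and not part of any Meridian Blue SDK.

```python
from typing import Any

# Extension fields the reference says accompany every successful response.
ALWAYS_PRESENT = ("billing", "risk_classification", "explainability", "liability_attestation")
# Extension fields that appear only under the conditions listed above.
CONDITIONAL = ("user_notice", "truncation", "free_quota_reminder", "dry_run")


def split_envelope(body: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (standard_fields, extension_fields) for a parsed response body."""
    known = set(ALWAYS_PRESENT) | set(CONDITIONAL)
    extensions = {k: v for k, v in body.items() if k in known}
    standard = {k: v for k, v in body.items() if k not in known}
    return standard, extensions


def missing_required(body: dict[str, Any]) -> list[str]:
    """List the always-present extension fields that are absent from the body."""
    return [name for name in ALWAYS_PRESENT if name not in body]


# Hypothetical sample body with made-up values.
sample = {
    "id": "chatcmpl-demo",
    "object": "chat.completion",
    "choices": [],
    "usage": {},
    "billing": {"cost": 0.0162, "balanceAfter": 983.84, "isFallback": False, "latencyMs": 847},
    "risk_classification": {"level": "minimal", "reason": "general_business_use"},
    "explainability": {"model_selection_reason": "demo", "human_readable": "demo"},
    "liability_attestation": "header.payload.signature",
}
standard, extensions = split_envelope(sample)
```

Code that only needs the OpenAI shape can keep working on the standard part, while the extensions dictionary is handed to logging or compliance code.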

billing

| Field | Type | Description |
| --- | --- | --- |
| cost | number | Cost charged to the account for this request, in credits. |
| balanceAfter | number | Remaining credit balance after the debit. |
| isFallback | boolean | true when a non-primary entry in the models chain ultimately served the response. |
| latencyMs | number | End-to-end latency including all retried attempts. |

risk_classification

| Field | Type | Description |
| --- | --- | --- |
| level | enum | minimal, limited, high, or prohibited (the last only appears in error envelopes). |
| reason | string | Short label for the dominant rule that produced the classification. |
| triggered_rules | string[] | All Article 5 / Annex III rules that fired (e.g. article_5_1_a_manipulation, annex_iii_creditworthiness). |
| requires_human_review | boolean | true when the response was placed in the human-review queue. |
| auto_restricted | boolean | true when the router auto-applied a stricter routing constraint (e.g. ZDR-only providers) for this request. |
| llm_judge_confidence | number 0–1 | Confidence from the secondary LLM-as-a-judge that validates the rule-based classifier. |
| confidence | number 0–1 | Combined confidence across rule + judge. |
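One way a client might act on risk_classification is sketched below. The handling labels and the policy itself (hold anything queued for review, show a disclosure for limited and high) are assumptions made for illustration, not behavior mandated by Meridian Blue.

```python
from typing import Any


def handling_for(risk: dict[str, Any]) -> str:
    """Map a risk_classification object to a hypothetical client-side action."""
    level = risk.get("level")
    if level == "prohibited":
        # Per the table above, this level only appears in error envelopes.
        return "blocked"
    if risk.get("requires_human_review", False):
        return "hold_for_review"
    if level in ("limited", "high"):
        # These are the tiers for which user_notice is documented to appear.
        return "show_disclosure"
    return "deliver"


action = handling_for(
    {"level": "limited", "reason": "general_business_use", "requires_human_review": False}
)
```

With the risk_classification object from the sample envelope, action is "show_disclosure".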

explainability

Article 13: a plain-language explanation of the routing decision, generated alongside every response.

| Field | Type | Description |
| --- | --- | --- |
| model_selection_reason | string | Why this provider/model was picked for this request. |
| risk_reasoning | string | Why the request landed in the chosen risk tier. |
| human_readable | string | One-paragraph plain-language explanation suitable for showing to an end user or auditor. |

user_notice

Article 50: appears when the request is classified as limited or high risk under the EU AI Act. Surface the disclosure string to the end user before they consume the response.

| Field | Type | Description |
| --- | --- | --- |
| disclosure | string | The text the end user must see (e.g. "This response was generated by AI."). |
| appeal_instructions | string | How the end user can contest a high-risk decision (per Article 86). |
| enforcement_token | string (JWT) | HS256-signed JWT issued only on limited / high-risk responses. First-party SDKs verify the signature and refuse to render the response until the disclosure has been displayed and the token's jti has been acknowledged back to the audit chain. The TTL is 5 minutes; older tokens are rejected on acknowledgement. The signing key is rotatable via the ENFORCEMENT_SIGNING_KEY env var; the aud claim binds the token to the request id. |
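For clients that cannot use a first-party SDK, the checks described for enforcement_token can be approximated with the Python standard library. This is a sketch under several assumptions: your service holds the same HS256 key, the token carries an iat claim from which the 5-minute TTL can be measured (the reference names only jti and aud for this token), and the helper names are hypothetical. The sign_hs256 function exists only to produce a token for the example.

```python
import base64
import hashlib
import hmac
import json
import time


def b64url_decode(segment: str) -> bytes:
    """Decode an unpadded base64url JWT segment."""
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def b64url_encode(raw: bytes) -> str:
    """Encode bytes as an unpadded base64url JWT segment."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, key: bytes) -> str:
    """Build an HS256 token locally; used here only to create example input."""
    header = b64url_encode(json.dumps({"alg": "HS256"}).encode())
    payload = b64url_encode(json.dumps(claims).encode())
    mac = hmac.new(key, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url_encode(mac)}"


def verify_enforcement_token(token, key, request_id, now=None, ttl_seconds=300):
    """Check signature, audience, and age; return the claims or raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a compact JWT")
    header_b64, payload_b64, sig_b64 = parts
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected alg")
    signed = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(key, signed, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many bytes matched.
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(b64url_decode(payload_b64))
    if claims.get("aud") != request_id:
        raise ValueError("aud does not match the request id")
    issued_at = claims.get("iat")  # assumption: the token includes iat
    current = time.time() if now is None else now
    if not isinstance(issued_at, (int, float)) or current - issued_at > ttl_seconds:
        raise ValueError("token older than the TTL")
    return claims


demo_key = b"demo-key-not-a-real-secret"
demo_token = sign_hs256({"aud": "req_1", "jti": "abc", "iat": 1000}, demo_key)
demo_claims = verify_enforcement_token(demo_token, demo_key, "req_1", now=1100)
```

The returned jti is the value a client would then acknowledge; how the acknowledgement is submitted is not covered by this sketch.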

liability_attestation

Article 86 / shared responsibility — every successful response carries a JWT signed by Meridian Blue's control plane partitioning responsibility between provider, deployer, and Meridian. Verify it server-side when you persist the response so an audit trail can later prove who was responsible for what at the moment of generation.

The token is a top-level string (not nested in another object) so log scrapers can grep for it by name. Claims include iss, aud (the request id), jti, iat, exp, plus Meridian-namespaced claims describing the routing decision and the deployer's signed policy version that approved it.
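When persisting a response, it can help to store the raw token next to a few decoded claims so the audit trail is searchable. The sketch below only decodes the payload segment and does not verify the signature; real verification needs key material from Meridian Blue, which this page does not describe. The function names and the shape of the audit row are hypothetical.

```python
import base64
import json


def attestation_claims(token: str) -> dict:
    """Decode the JWT payload segment. This does NOT verify the signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a compact JWT")
    payload = parts[1]
    return json.loads(base64.urlsafe_b64decode(payload + "=" * (-len(payload) % 4)))


def audit_row(body: dict) -> dict:
    """Build a hypothetical audit record from a parsed response body."""
    token = body["liability_attestation"]
    claims = attestation_claims(token)
    return {
        "response_id": body["id"],
        "attestation": token,  # keep the raw token so it can be verified later
        "iss": claims.get("iss"),
        "aud": claims.get("aud"),
        "jti": claims.get("jti"),
        "iat": claims.get("iat"),
        "exp": claims.get("exp"),
    }
```

Keeping the unmodified token in the record matters more than the decoded claims, since only the original string can be checked against the signature afterwards.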

truncation

Appears only when auto_truncate: true was set on the request and the prompt exceeded the chosen model's context window. The receipt is also surfaced as the X-Meridian-Truncated header.

| Field | Type | Description |
| --- | --- | --- |
| original_tokens | number | Token count before truncation. |
| truncated_tokens | number | Token count after truncation (≤ model context window). |
| strategy | string | Strategy applied, typically middle_out (drop oldest non-system messages first). |
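A client that wants to warn users about lost context can derive the amount dropped from the two token counts. This is a minimal sketch; truncation_summary is an illustrative name, not an SDK function.

```python
from typing import Any, Optional


def truncation_summary(body: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return how much of the prompt was dropped, or None if nothing was truncated."""
    receipt = body.get("truncation")
    if receipt is None:
        return None
    dropped = receipt["original_tokens"] - receipt["truncated_tokens"]
    return {
        "dropped_tokens": dropped,
        "dropped_fraction": dropped / receipt["original_tokens"],
        "strategy": receipt["strategy"],
    }


summary = truncation_summary(
    {"truncation": {"original_tokens": 12000, "truncated_tokens": 8000, "strategy": "middle_out"}}
)
```

For the values in the sample envelope, 4000 of the 12000 prompt tokens were dropped, which is one third of the original prompt.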

free_quota_reminder

Surfaces an in-band countdown when the user is in the trailing 20 of their daily free request quota (responses 181 through 200 with the spec's 200/day default). Helps client UIs nudge the user toward a paid plan before the hard wall hits. Absent on every other response.

| Field | Type | Description |
| --- | --- | --- |
| remaining | number | Free requests remaining after this response is counted. Hits 0 on the 200th of the day. |
| daily_limit | number | Daily free quota cap for the user's tier (200 by default). |
| message | string | Plain-text upsell ready to surface in your UI. Tail copy varies: users with credit balance see "Use a paid model to continue your experience"; users without credits see "Add funds to your account to use paid models once your daily free requests are exhausted." No emojis. |
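A UI layer might turn the reminder into banner state along these lines. The function name and the returned keys are hypothetical; the arithmetic relies only on remaining being counted after the current response, as the table states.

```python
from typing import Any, Optional


def quota_banner(body: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return banner state for the UI, or None when no reminder is present."""
    reminder = body.get("free_quota_reminder")
    if reminder is None:
        return None
    return {
        "text": reminder["message"],
        # remaining is counted after this response, so the difference is the
        # number of free requests used so far today, including this one.
        "used_today": reminder["daily_limit"] - reminder["remaining"],
        "exhausted": reminder["remaining"] == 0,
    }


banner = quota_banner(
    {"free_quota_reminder": {"remaining": 7, "daily_limit": 200, "message": "demo message"}}
)
```

With remaining at 7 and a daily limit of 200, the response being rendered is the 193rd free request of the day.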

dry_run

Boolean (always true when present). Set on responses to requests that carried the X-Meridian-Dry-Run: true header — those return the routing decision, risk_classification, and explainability envelope but do not call the upstream provider, do not bill the caller, and emit an empty choices: []. Use this for CI checks ("would this prompt make it through the policy?") and dashboard explainability previews.
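The CI use case could look roughly like the following check on a parsed dry-run response. The pass criteria here (an allow-list of levels and no human review) are an assumption chosen for illustration; a real pipeline would set its own policy. Sending the request is left out because the endpoint is not part of this page; only the header shown in DRY_RUN_HEADERS comes from the reference.

```python
from typing import Any, Iterable

# Header that marks a request as a dry run, as documented above.
DRY_RUN_HEADERS = {"X-Meridian-Dry-Run": "true"}


def dry_run_passes(body: dict[str, Any], allowed_levels: Iterable[str] = ("minimal", "limited")) -> bool:
    """Return True if a dry-run response satisfies a hypothetical CI policy."""
    if body.get("dry_run") is not True:
        raise ValueError("not a dry-run response")
    risk = body["risk_classification"]
    if risk.get("requires_human_review", False):
        return False
    return risk["level"] in set(allowed_levels)


ok = dry_run_passes(
    {
        "dry_run": True,
        "choices": [],
        "risk_classification": {"level": "limited", "requires_human_review": False},
    }
)
```

Raising on a missing dry_run flag guards against accidentally running the check on a billed, real response.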

Client handling

The extension fields are additive — you don't have to consume any of them. Most teams pull two values into their own logs:

  • billing.cost for cross-checking against the dashboard usage export.
  • billing.isFallback to track per-tenant upstream provider reliability.
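The two values above can be accumulated per tenant with a small in-memory tracker. This is a sketch only: the UsageLog class and the tenant identifier are hypothetical, and a production system would write to its own log store instead.

```python
from collections import defaultdict
from typing import Any


class UsageLog:
    """Accumulate billing.cost and billing.isFallback per tenant."""

    def __init__(self) -> None:
        self.cost: dict[str, float] = defaultdict(float)
        self.requests: dict[str, int] = defaultdict(int)
        self.fallbacks: dict[str, int] = defaultdict(int)

    def record(self, tenant: str, body: dict[str, Any]) -> None:
        """Record one parsed response body for the given tenant."""
        billing = body["billing"]
        self.cost[tenant] += billing["cost"]
        self.requests[tenant] += 1
        if billing["isFallback"]:
            self.fallbacks[tenant] += 1

    def fallback_rate(self, tenant: str) -> float:
        """Share of this tenant's responses served by a non-primary model."""
        total = self.requests.get(tenant, 0)
        return self.fallbacks.get(tenant, 0) / total if total else 0.0


log = UsageLog()
log.record("tenant-a", {"billing": {"cost": 0.0162, "isFallback": False}})
log.record("tenant-a", {"billing": {"cost": 0.0100, "isFallback": True}})
```

The summed cost per tenant is what would be compared against the dashboard usage export, and fallback_rate gives the reliability signal.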

If you need the same routing info at the HTTP level (so a proxy or log scraper doesn't have to parse the body), every field above also appears as an X-Meridian-* response header — see Headers.
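A proxy that only looks at headers could collect the Meridian ones with a prefix filter like the one below. Only the X-Meridian- prefix comes from this page; the header value used in the example is made up, and the exact header names are listed in Headers.

```python
from typing import Mapping


def meridian_headers(headers: Mapping[str, str]) -> dict[str, str]:
    """Return the X-Meridian-* headers, keyed by lower-cased name."""
    prefix = "x-meridian-"
    # HTTP header names are case-insensitive, so compare in lower case.
    return {name.lower(): value for name, value in headers.items() if name.lower().startswith(prefix)}


picked = meridian_headers(
    {
        "Content-Type": "application/json",
        "X-Meridian-Truncated": "example-value",  # hypothetical value
    }
)
```

Here picked contains a single entry keyed by "x-meridian-truncated".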