Response envelope
Every successful Meridian Blue response carries the standard OpenAI shape plus a small set of extension fields at the top level. They tell you what it cost, what risk tier it was classified into, why the router picked the model it did, and (when applicable) what the user must be told.
Envelope shape
The extension fields sit at the top level of the response, alongside choices and usage. Existing OpenAI client code that reads only the standard fields ignores them and keeps working unchanged.
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [...],
  "usage": {...},
  "billing": { "cost": 0.0162, "balanceAfter": 983.84, "isFallback": false, "latencyMs": 847 },
  "risk_classification": { "level": "limited", "reason": "general_business_use", "requires_human_review": false },
  "explainability": { "model_selection_reason": "...", "human_readable": "..." },
  "user_notice": { "disclosure": "...", "appeal_instructions": "...", "enforcement_token": "eyJhbGciOiJIUzI1NiJ9..." },
  "liability_attestation": "eyJhbGciOiJIUzI1NiJ9...",
  "truncation": { "original_tokens": 12000, "truncated_tokens": 8000, "strategy": "middle_out" },
  "free_quota_reminder": { "remaining": 7, "daily_limit": 200, "message": "You have 7 free requests remaining today (out of 200). Use a paid model to continue your experience." }
}
billing, risk_classification, explainability, and liability_attestation appear on every successful response. The remaining fields are conditional:
- user_notice (and the enforcement_token nested inside it) appears when the request was classified as limited or high-risk under Article 50.
- truncation appears when auto_truncate: true was set and the prompt exceeded the model's context window.
- free_quota_reminder appears on the trailing 20 free responses of the day (181 through 200) and on any post-exhaustion request, never otherwise.
- dry_run appears only on responses to X-Meridian-Dry-Run: true requests.
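The presence rules above can be checked mechanically before a response is handed to the rest of an application. The following is a minimal sketch, assuming the response body has already been parsed into a Python dict; the function name summarize_extensions and the sample values are illustrative and not part of any Meridian Blue SDK.

```python
from typing import Any, Dict, List

# Fields the text above lists as present on every successful response.
ALWAYS_PRESENT = ("billing", "risk_classification", "explainability", "liability_attestation")
# Fields that appear only under the conditions described above.
CONDITIONAL = ("user_notice", "truncation", "free_quota_reminder", "dry_run")


def summarize_extensions(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Report missing always-present fields and the conditional fields that showed up."""
    missing = [name for name in ALWAYS_PRESENT if name not in response]
    present = [name for name in CONDITIONAL if name in response]
    return {"missing_required": missing, "conditional_present": present}


# Hypothetical parsed body, deliberately missing liability_attestation.
example = {
    "id": "chatcmpl-example",
    "choices": [],
    "billing": {"cost": 0.0162, "balanceAfter": 983.84, "isFallback": False, "latencyMs": 847},
    "risk_classification": {"level": "limited", "reason": "general_business_use", "requires_human_review": False},
    "explainability": {"model_selection_reason": "example", "human_readable": "example"},
    "user_notice": {"disclosure": "This response was generated by AI."},
}
print(summarize_extensions(example))
```

A non-empty missing_required list on a successful response is worth logging, since the text above says those four fields are always there.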
billing
| Field | Type | Description |
|---|---|---|
| cost | number | Cost charged to the account for this request, in credits. |
| balanceAfter | number | Remaining credit balance after the debit. |
| isFallback | boolean | true when a non-primary entry in the models chain ultimately served the response. |
| latencyMs | number | End-to-end latency including all retried attempts. |
risk_classification
| Field | Type | Description |
|---|---|---|
| level | enum | minimal, limited, high, or prohibited (the last only appears in error envelopes). |
| reason | string | Short label for the dominant rule that produced the classification. |
| triggered_rules | string[] | All Article 5 / Annex III rules that fired (e.g. article_5_1_a_manipulation, annex_iii_creditworthiness). |
| requires_human_review | boolean | The response was queued in the human-review queue. |
| auto_restricted | boolean | The router auto-applied a stricter routing constraint (e.g. ZDR-only providers) for this request. |
| llm_judge_confidence | number 0–1 | Confidence from the secondary LLM-as-a-judge that validates the rule-based classifier. |
| confidence | number 0–1 | Combined confidence across rule + judge. |
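One way a client might act on these fields is to map the classification to a next step before rendering. This is a sketch only: the action labels (hold_for_review, show_disclosure, render) and the choice to hold responses flagged for human review are assumptions for illustration, not behavior Meridian Blue prescribes.

```python
from typing import Any, Dict


def next_step(risk: Dict[str, Any]) -> str:
    """Map a risk_classification object to a hypothetical client-side action."""
    if risk.get("requires_human_review"):
        # Assumption: the application waits for the human-review queue.
        return "hold_for_review"
    if risk["level"] in ("limited", "high"):
        # These tiers come with a user_notice whose disclosure must be shown.
        return "show_disclosure"
    return "render"


print(next_step({"level": "limited", "reason": "general_business_use", "requires_human_review": False}))
```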
explainability
Article 13 — provides a plain-language explanation of the routing decision. Generated alongside every response.
| Field | Type | Description |
|---|---|---|
| model_selection_reason | string | Why this provider/model was picked for this request. |
| risk_reasoning | string | Why the request landed in the chosen risk tier. |
| human_readable | string | One-paragraph plain-language explanation suitable for showing to an end user or auditor. |
user_notice
Article 50 — appears when the request is classified as limited or high risk under the EU AI Act. Surface the disclosure string to the end user before they consume the response.
| Field | Type | Description |
|---|---|---|
| disclosure | string | The text the end user must see (e.g. "This response was generated by AI."). |
| appeal_instructions | string | How the end user can contest a high-risk decision (per Article 86). |
| enforcement_token | string (JWT) | HS256-signed JWT issued only on limited / high-risk responses. First-party SDKs verify the signature and refuse to render the response until the disclosure has been displayed and the token's jti has been ack'd back to the audit chain. TTL 5 minutes; older tokens are rejected on ack. The signing key is rotatable via the ENFORCEMENT_SIGNING_KEY env var; the aud claim binds the token to the request id. |
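For readers who are not using a first-party SDK, the checks described for enforcement_token (HS256 signature, aud bound to the request id, 5-minute TTL) can be sketched with the Python standard library alone. Several things here are assumptions: that the verifier holds the HS256 key, that the token carries an iat claim from which age is measured, and the function names themselves. sign_demo_token exists only so the sketch can be run locally; it is not how Meridian Blue issues tokens.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments omit base64 padding; restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_demo_token(claims: Dict[str, Any], key: bytes) -> str:
    """Build an HS256 token for local testing only."""
    header = _b64url_encode(json.dumps({"alg": "HS256"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(key, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def verify_enforcement_token(token: str, key: bytes, request_id: str,
                             now: Optional[float] = None, max_age_s: int = 300) -> Dict[str, Any]:
    """Return the claims if every check passes; raise ValueError otherwise."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how much of the signature matched.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("aud") != request_id:
        raise ValueError("token is bound to a different request")
    issued_at = claims.get("iat")  # assumption: age is measured from an iat claim
    current = time.time() if now is None else now
    if not isinstance(issued_at, (int, float)) or current - issued_at > max_age_s:
        raise ValueError("token is older than the allowed TTL")
    return claims


demo_key = b"demo-key"
demo = sign_demo_token({"aud": "chatcmpl-1", "jti": "abc", "iat": 1000}, demo_key)
print(verify_enforcement_token(demo, demo_key, "chatcmpl-1", now=1100.0)["jti"])
```

The sketch stops at local verification. Acknowledging the jti back to the audit chain is left out because the mechanism for that is not described in this section.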
liability_attestation
Article 86 / shared responsibility — every successful response carries a JWT signed by Meridian Blue's control plane partitioning responsibility between provider, deployer, and Meridian. Verify it server-side when you persist the response so an audit trail can later prove who was responsible for what at the moment of generation.
The token is a top-level string (not nested in another object) so log scrapers can grep for it by name. Claims include iss, aud (the request id), jti, iat, exp, plus Meridian-namespaced claims describing the routing decision and the deployer's signed policy version that approved it.
truncation
Appears only when auto_truncate: true was set on the request and the prompt exceeded the chosen model's context window. The receipt is also surfaced as the X-Meridian-Truncated header.
| Field | Type | Description |
|---|---|---|
| original_tokens | number | Token count before truncation. |
| truncated_tokens | number | Token count after truncation (≤ model context window). |
| strategy | string | Strategy applied, typically middle_out (drop oldest non-system messages first). |
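The receipt is enough to work out how much of the prompt was dropped, which can be useful in logs or a debugging UI. A small sketch, using the values from the envelope example above; the function name truncation_summary is illustrative.

```python
from typing import Any, Dict


def truncation_summary(truncation: Dict[str, Any]) -> str:
    """Describe how many prompt tokens the truncation receipt says were dropped."""
    original = truncation["original_tokens"]
    kept = truncation["truncated_tokens"]
    dropped = original - kept
    share = dropped / original if original else 0.0
    return f"{truncation['strategy']}: dropped {dropped} of {original} tokens ({share:.0%})"


print(truncation_summary({"original_tokens": 12000, "truncated_tokens": 8000, "strategy": "middle_out"}))
# prints "middle_out: dropped 4000 of 12000 tokens (33%)"
```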
free_quota_reminder
Surfaces an in-band countdown when the user is in the trailing 20 of their daily free request quota (responses 181 through 200 with the spec's 200/day default), and on any request made after the quota is exhausted. Helps client UIs nudge the user toward a paid plan before the hard wall hits. Absent on every other response.
| Field | Type | Description |
|---|---|---|
| remaining | number | Free requests remaining after this response is counted. Reaches 0 on the 200th request of the day. |
| daily_limit | number | Daily free quota cap for the user's tier (200 by default). |
| message | string | Plain-text upsell ready to surface in your UI. Tail copy varies: users with credit balance see "Use a paid model to continue your experience"; users without credits see "Add funds to your account to use paid models once your daily free requests are exhausted." No emojis. |
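Because the field is absent on most responses, client code should treat it as optional. A minimal sketch of turning it into banner text, assuming a parsed response dict; quota_banner and the bracketed counter format are illustrative choices, not part of the API.

```python
from typing import Any, Dict, Optional


def quota_banner(response: Dict[str, Any]) -> Optional[str]:
    """Return banner text when free_quota_reminder is present, else None."""
    reminder = response.get("free_quota_reminder")
    if reminder is None:
        return None
    return f"[{reminder['remaining']}/{reminder['daily_limit']}] {reminder['message']}"


print(quota_banner({"free_quota_reminder": {"remaining": 7, "daily_limit": 200, "message": "Example upsell text."}}))
```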
dry_run
Boolean (always true when present). Set on responses to requests that carried the X-Meridian-Dry-Run: true header — those return the routing decision, risk_classification, and explainability envelope but do not call the upstream provider, do not bill the caller, and emit an empty choices: []. Use this for CI checks ("would this prompt make it through the policy?") and dashboard explainability previews.
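A CI check built on dry-run responses might look like the sketch below. It assumes the response has already been fetched and parsed; the pass criteria (an allow-list of risk levels and no human review) are a hypothetical team policy, and dry_run_passes is not an SDK function.

```python
from typing import Any, Dict, Sequence


def dry_run_passes(response: Dict[str, Any],
                   allowed_levels: Sequence[str] = ("minimal", "limited")) -> bool:
    """Decide whether a dry-run response satisfies a hypothetical CI policy."""
    if response.get("dry_run") is not True:
        raise ValueError("expected a dry-run response")
    if response.get("choices"):
        # Dry runs do not call the upstream provider, so choices should be empty.
        raise ValueError("dry-run responses should carry an empty choices list")
    risk = response["risk_classification"]
    return risk["level"] in allowed_levels and not risk.get("requires_human_review", False)


sample = {
    "dry_run": True,
    "choices": [],
    "risk_classification": {"level": "high", "reason": "example", "requires_human_review": True},
}
print(dry_run_passes(sample))
```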
Client handling
The extension fields are additive — you don't have to consume any of them. Most teams pull two values into their own logs:
- billing.cost for cross-checking against the dashboard usage export.
- billing.isFallback to track per-tenant upstream provider reliability.
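Pulling those two values into a structured log line could be as simple as the following sketch. The tenant identifier and the output key names are assumptions about the caller's own logging schema.

```python
import json
from typing import Any, Dict


def billing_log_line(response: Dict[str, Any], tenant_id: str) -> str:
    """Serialize the two billing values most teams track into one JSON log line."""
    billing = response["billing"]
    record = {
        "request_id": response["id"],
        "tenant": tenant_id,  # hypothetical caller-side identifier
        "cost": billing["cost"],
        "is_fallback": billing["isFallback"],
    }
    return json.dumps(record, sort_keys=True)


print(billing_log_line(
    {"id": "chatcmpl-example", "billing": {"cost": 0.0162, "balanceAfter": 983.84, "isFallback": False, "latencyMs": 847}},
    tenant_id="tenant-a",
))
```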
If you need the same routing info at the HTTP level (so a proxy or log scraper doesn't have to parse the body), every field above also appears as an X-Meridian-* response header — see Headers.