Response envelope

Envelope shape

The extension fields sit at the top level of the response, alongside choices and usage. Existing OpenAI client code never sees them and never breaks because of them.

JSON

{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "choices": [...],
  "usage": {...},

  "billing": { "cost": 0.0162, "balanceAfter": 983.84, "isFallback": false, "latencyMs": 847 },
  "risk_classification": { "level": "limited", "reason": "general_business_use", "requires_human_review": false },
  "explainability": { "model_selection_reason": "...", "human_readable": "..." },
  "user_notice": { "disclosure": "...", "appeal_instructions": "...", "enforcement_token": "eyJhbGciOiJIUzI1NiJ9..." },
  "liability_attestation": "eyJhbGciOiJIUzI1NiJ9...",
  "truncation": { "original_tokens": 12000, "truncated_tokens": 8000, "strategy": "middle_out" },
  "free_quota_reminder": { "remaining": 7, "daily_limit": 200, "message": "You have 7 free requests remaining today (out of 200). Use a paid model to continue your experience." }
}

billing, risk_classification, explainability, and liability_attestation appear on every successful response. user_notice (and the enforcement_token nested inside it) appears when the request was classified as limited or high-risk under Article 50. truncation appears when auto_truncate: true was set and the prompt exceeded the model's context window. free_quota_reminder appears on the trailing 20 free responses (181→200/day) and on any post-exhaustion request, never otherwise. dry_run appears only on responses to X-Meridian-Dry-Run: true requests.

`billing`

Field	Type	Description
`cost`	number	Cost charged to the account for this request, in credits.
`balanceAfter`	number	Remaining credit balance after the debit.
`isFallback`	boolean	`true` when a non-primary entry in the `models` chain ultimately served the response.
`latencyMs`	number	End-to-end latency including all retried attempts.

`risk_classification`

Field	Type	Description
`level`	enum	`minimal`, `limited`, `high`, or `prohibited` (the last only appears in error envelopes).
`reason`	string	Short label for the dominant rule that produced the classification.
`triggered_rules`	string[]	All Article 5 / Annex III rules that fired (e.g. `article_5_1_a_manipulation`, `annex_iii_creditworthiness`).
`requires_human_review`	boolean	The response was queued in the human-review queue.
`auto_restricted`	boolean	The router auto-applied a stricter routing constraint (e.g. ZDR-only providers) for this request.
`llm_judge_confidence`	number 0–1	Confidence from the secondary LLM-as-a-judge that validates the rule-based classifier.
`confidence`	number 0–1	Combined confidence across rule + judge.

`explainability`

Article 13 — provides a plain-language explanation of the routing decision. Generated alongside every response.

Field	Type	Description
`model_selection_reason`	string	Why this provider/model was picked for this request.
`risk_reasoning`	string	Why the request landed in the chosen risk tier.
`human_readable`	string	One-paragraph plain-language explanation suitable for showing to an end user or auditor.

`user_notice`

Article 50 — appears when the request is classified as limited or high risk under the EU AI Act. Surface the disclosure string to the end user before they consume the response.

Field	Type	Description
`disclosure`	string	The text the end user must see (e.g. "This response was generated by AI.").
`appeal_instructions`	string	How the end user can contest a high-risk decision (per Article 86).
`enforcement_token`	string (JWT)	HS256-signed JWT issued only on limited / high-risk responses. First-party SDKs verify the signature and refuse to render the response until the disclosure has been displayed and the token's `jti` has been ack'd back to the audit chain. TTL 5 minutes — older tokens are rejected on ack. The signing key is rotatable via the `ENFORCEMENT_SIGNING_KEY` env var; the `aud` claim binds the token to the request id.

`liability_attestation`

Article 86 / shared responsibility — every successful response carries a JWT signed by Meridian Blue's control plane partitioning responsibility between provider, deployer, and Meridian. Verify it server-side when you persist the response so an audit trail can later prove who was responsible for what at the moment of generation.

The token is a top-level string (not nested in another object) so log scrapers can grep for it by name. Claims include iss, aud (the request id), jti, iat, exp, plus Meridian-namespaced claims describing the routing decision and the deployer's signed policy version that approved it.

`truncation`

Appears only when auto_truncate: true was set on the request and the prompt exceeded the chosen model's context window. The receipt is also surfaced as the X-Meridian-Truncated header.

Field	Type	Description
`original_tokens`	number	Token count before truncation.
`truncated_tokens`	number	Token count after truncation (≤ model context window).
`strategy`	string	Strategy applied — typically `middle_out` (drop oldest non-system messages first).

`free_quota_reminder`

Surfaces an in-band countdown when the user is in the trailing 20 of their daily free request quota (responses 181 through 200 with the spec's 200/day default). Helps client UIs nudge the user toward a paid plan before the hard wall hits. Absent on every other response.

Field	Type	Description
`remaining`	number	Free requests remaining after this response is counted. Hits `0` on the 200th of the day.
`daily_limit`	number	Daily free quota cap for the user's tier (200 by default).
`message`	string	Plain-text upsell ready to surface in your UI. Tail copy varies: users with credit balance see "Use a paid model to continue your experience"; users without credits see "Add funds to your account to use paid models once your daily free requests are exhausted." No emojis.

`dry_run`

Boolean (always true when present). Set on responses to requests that carried the X-Meridian-Dry-Run: true header — those return the routing decision, risk_classification, and explainability envelope but do not call the upstream provider, do not bill the caller, and emit an empty choices: []. Use this for CI checks ("would this prompt make it through the policy?") and dashboard explainability previews.

Client handling

The extension fields are additive — you don't have to consume any of them. Most teams pull two values into their own logs:

billing.cost for cross-checking against the dashboard usage export.
billing.isFallback to track per-tenant upstream provider reliability.

If you need the same routing info at the HTTP level (so a proxy or log scraper doesn't have to parse the body), every field above also appears as an X-Meridian-* response header — see Headers.

Cookie	Purpose	Duration
`_ga`	Distinguishes unique visitors	2 years
`_gid`	Distinguishes unique visitors	24 hours
`_ga_*`	Maintains session state	2 years

Envelope shape

billing

risk_classification

explainability

user_notice

liability_attestation

truncation

free_quota_reminder

dry_run