Caching
Prompt-cache and semantic-cache layers are on the roadmap. Today, Meridian Blue is a pure pass-through proxy — your request is forwarded to the upstream provider on every call.
What works today
Meridian Blue does not maintain a prompt or semantic cache yet. Every chat-completion request is forwarded to the upstream provider; nothing is served from a Meridian Blue-side cache.
Provider-side caching
Several upstream providers offer their own prompt-caching extensions (Anthropic cache_control, OpenAI prompt_caching, Gemini context cache). Those features are passthrough — Meridian Blue forwards your request body unchanged, so anything the provider supports natively keeps working.
Token-count and cost reporting honour the upstream provider's usage values, so cached tokens are billed at the provider's reduced rate (when the provider exposes it) without any extra wiring on the Meridian Blue side.
Gateway-level caching (roadmap)
- Prompt cache — Cache identical prompts (same model, same parameters) for short TTLs.
- Semantic cache — Embedding-based near-match cache.
- Per-tenant isolation — Cache keys salted with tenantId; consent and retention windows respected.
Caching at the gateway level requires deliberate per-tenant policy (retention, consent, residency) — designing that without breaking GDPR is the open work item. Until then, gateway-level caching is opt-in disabled.