Multimodal
Meridian Blue accepts the OpenAI multimodal content shape on every request and translates it to whatever the upstream provider expects (Anthropic image blocks, Gemini inlineData / fileData, etc.).
Vision input
Pass image URLs or base64 data URIs as image_url parts inside messages[].content, using the OpenAI multimodal shape. When a request is routed to Anthropic or Gemini, Meridian Blue automatically converts each image part to Anthropic's image source format or to Gemini's inlineData / fileData.
{
  "model": "claude-sonnet-4-5-20241022",
  "messages": [{
    "role": "user",
    "content": [
      { "type": "text", "text": "What's in this image?" },
      {
        "type": "image_url",
        "image_url": { "url": "https://example.com/photo.jpg" }
      }
    ]
  }]
}
Both URL and base64 (data:image/png;base64,...) sources are supported. The MIME type is detected from the URL extension or the data URI prefix.
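To send a local file rather than a public URL, the image first has to be base64-encoded into a data URI. The sketch below shows one way to build such a message in Python; the helper names `image_data_uri` and `vision_message` are hypothetical, and the MIME type is guessed from the file name with the standard `mimetypes` module.

```python
import base64
import mimetypes


def image_data_uri(data: bytes, filename: str) -> str:
    """Encode raw image bytes as a data URI such as data:image/png;base64,..."""
    mime, _ = mimetypes.guess_type(filename)
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot infer an image MIME type from {filename!r}")
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def vision_message(question: str, image_url: str) -> dict:
    """Build one user message in the OpenAI multimodal content shape."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


# Hypothetical usage with a local file:
# with open("photo.png", "rb") as f:
#     msg = vision_message("What's in this image?", image_data_uri(f.read(), "photo.png"))
```

The resulting message can be passed as `messages=[msg]` to `client.chat.completions.create(...)` on an OpenAI client whose `base_url` is `https://api.meridianblue.ai/v1`.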
Image generation
POST /v1/images/generations proxies image-generation requests with the standard OpenAI body shape. Billing matches the upstream provider's reported per-image cost (the model mapping's pricing.unit is set to per_image).
curl https://api.meridianblue.ai/v1/images/generations \
  -H "Authorization: Bearer $MERIDIAN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-image-1",
    "prompt": "A cartographer drafting a map of Europe",
    "n": 1,
    "size": "1024x1024"
  }'
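The same request can be made from the OpenAI Python SDK by pointing `base_url` at Meridian Blue. This is a sketch under two assumptions: the proxy passes the upstream response through unchanged, and `gpt-image-1` returns its image base64-encoded in `data[0].b64_json` (as it does on OpenAI). The helper name `generate_png` is hypothetical; the client is passed in so the function can be exercised without a network call.

```python
import base64


def generate_png(client, prompt: str, size: str = "1024x1024") -> bytes:
    """Request one image and return its decoded bytes.

    `client` is expected to be an openai.OpenAI instance whose base_url
    points at https://api.meridianblue.ai/v1.
    """
    result = client.images.generate(
        model="gpt-image-1",
        prompt=prompt,
        n=1,
        size=size,
    )
    # Assumption: the proxy forwards the upstream b64_json field as-is.
    return base64.b64decode(result.data[0].b64_json)


# Usage (performs a real, billed request):
# from openai import OpenAI
# client = OpenAI(base_url="https://api.meridianblue.ai/v1", api_key="...")
# with open("map.png", "wb") as out:
#     out.write(generate_png(client, "A cartographer drafting a map of Europe"))
```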
POST /v1/images/edits covers image editing and is available for the providers that expose a native image-edit endpoint.
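From Python, an edit goes through the SDK's `client.images.edit`, which uploads the source image as multipart form data. In this sketch the helper name `edit_image` is hypothetical, `gpt-image-1` is assumed to be routed to a provider with a native edit endpoint, and the edited image is assumed to come back base64-encoded in `b64_json`, as it does for `gpt-image-1` on OpenAI.

```python
import base64


def edit_image(client, image_path: str, prompt: str, model: str = "gpt-image-1") -> bytes:
    """Upload one image with an edit instruction and return the edited image bytes.

    `client` is expected to be an openai.OpenAI instance pointed at Meridian Blue.
    """
    with open(image_path, "rb") as image_file:
        result = client.images.edit(model=model, image=image_file, prompt=prompt)
    # Assumption: as with generation, the result is base64 in b64_json.
    return base64.b64decode(result.data[0].b64_json)
```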
Speech-to-text
POST /v1/audio/transcriptions accepts an audio file (multipart/form-data) and returns a transcript. Use any Whisper-compatible model in the catalogue.
from openai import OpenAI

client = OpenAI(base_url="https://api.meridianblue.ai/v1", api_key="...")

with open("meeting.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=f,
    )

print(transcript.text)
Audio inputs as part of a chat completion (input_audio content parts) are also supported when the routed model has the AUDIO capability.
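In the OpenAI content shape, an input_audio part carries the base64-encoded clip plus a format field. The sketch below builds such a message; `audio_part` and `audio_message` are hypothetical helper names. OpenAI documents "wav" and "mp3" as format values; which formats a given upstream accepts through Meridian Blue is not stated on this page, so the helper does not validate them.

```python
import base64


def audio_part(data: bytes, fmt: str) -> dict:
    """Wrap raw audio bytes as an OpenAI-style input_audio content part."""
    return {
        "type": "input_audio",
        "input_audio": {
            "data": base64.b64encode(data).decode("ascii"),
            "format": fmt,  # e.g. "wav" or "mp3"
        },
    }


def audio_message(question: str, data: bytes, fmt: str) -> dict:
    """One user message pairing a text prompt with an audio clip."""
    return {
        "role": "user",
        "content": [{"type": "text", "text": question}, audio_part(data, fmt)],
    }
```

The routed model must list AUDIO among its inputCapabilities; otherwise the part is handled as described under Capability mismatch.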
Text-to-speech
POST /v1/audio/speech generates spoken audio from text. Use any TTS model in the catalogue; the chosen model's outputCapabilities must include AUDIO.
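A minimal sketch of a speech request through the OpenAI Python SDK's `client.audio.speech.create`. The helper name `synthesize` is hypothetical, and the model and voice in the usage comment (`tts-1`, `alloy`) are OpenAI names used only as placeholders; substitute whatever TTS model and voice your catalogue lists.

```python
def synthesize(client, text: str, model: str, voice: str,
               response_format: str = "mp3") -> bytes:
    """Ask a TTS model for speech and return the audio bytes.

    `client` is expected to be an openai.OpenAI instance with base_url set
    to https://api.meridianblue.ai/v1; `model` must declare AUDIO in its
    outputCapabilities.
    """
    response = client.audio.speech.create(
        model=model,
        voice=voice,
        input=text,
        response_format=response_format,
    )
    return response.content  # raw audio bytes in the requested format


# Usage (performs a real request; model and voice are placeholders):
# from openai import OpenAI
# client = OpenAI(base_url="https://api.meridianblue.ai/v1", api_key="...")
# with open("hello.mp3", "wb") as out:
#     out.write(synthesize(client, "Hello from Meridian Blue", "tts-1", "alloy"))
```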
Capability mismatch
Each model declares inputCapabilities (what it accepts) and outputCapabilities (what it returns), each drawn from TEXT, IMAGE, AUDIO, FILES, and VIDEO. If you send an image to a text-only model, Meridian Blue strips the unsupported part and replaces it with a placeholder text note ([Image attached — not supported by this model]) so the upstream call still succeeds.
If, after stripping, every message contains only text, each message's content array is collapsed back to a plain string for maximum compatibility with text-only providers.
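The stripping rule can be illustrated with a short re-implementation. This is a sketch of the documented behaviour, not Meridian Blue's actual code. The mapping from OpenAI part types to capabilities, the placeholder wording for non-image parts, and joining collapsed text parts with newlines are all assumptions; only the image placeholder text comes from this page.

```python
# Assumed mapping from OpenAI content-part types to capability names.
PART_CAPABILITY = {
    "text": "TEXT",
    "image_url": "IMAGE",
    "input_audio": "AUDIO",
    "file": "FILES",
}
# Assumed labels; only the IMAGE placeholder is documented.
PLACEHOLDER_LABEL = {"IMAGE": "Image", "AUDIO": "Audio", "FILES": "File"}


def placeholder_part(capability: str) -> dict:
    label = PLACEHOLDER_LABEL.get(capability, capability.title())
    # "\u2014" is an em dash, matching the documented image placeholder.
    return {"type": "text", "text": f"[{label} attached \u2014 not supported by this model]"}


def strip_content(content, input_caps):
    """Replace parts the model cannot accept with placeholder text parts."""
    if isinstance(content, str):
        return content
    kept = []
    for part in content:
        capability = PART_CAPABILITY.get(part.get("type"))
        if capability is None or capability in input_caps:
            kept.append(part)  # supported (or unknown) parts pass through
        else:
            kept.append(placeholder_part(capability))
    return kept


def is_text_only(content) -> bool:
    return isinstance(content, str) or all(p.get("type") == "text" for p in content)


def strip_unsupported(messages, input_caps):
    """Return new messages with unsupported parts stripped; inputs are not mutated."""
    stripped = [{**m, "content": strip_content(m["content"], input_caps)} for m in messages]
    # Collapse to plain strings only when every message is text-only.
    if all(is_text_only(m["content"]) for m in stripped):
        for m in stripped:
            if not isinstance(m["content"], str):
                # Assumption: text parts are joined with newlines.
                m["content"] = "\n".join(p["text"] for p in m["content"])
    return stripped
```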
Native SDK passthrough
For multimodal payloads with vendor-specific shapes (Anthropic's image source blocks, Gemini's inlineData), use the native passthrough endpoints, which forward your SDK's request shape unchanged:
- Anthropic: POST /v1/messages with the standard Messages API shape.
- Gemini: POST /v1beta/models/{model}:generateContent with the standard Gen AI shape.
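When calling the passthrough routes without a vendor SDK, the request bodies look roughly as follows. The helper names are hypothetical. The Anthropic body follows the Messages API's base64 image source block (the Messages API requires max_tokens); the Gemini body uses inlineData, with the model named in the URL path rather than in the body. Authentication headers for these routes are not covered here.

```python
import base64


def anthropic_image_body(model: str, question: str, image: bytes,
                         media_type: str = "image/png", max_tokens: int = 1024) -> dict:
    """Body for POST /v1/messages using Anthropic's base64 image source block."""
    return {
        "model": model,
        "max_tokens": max_tokens,  # required by the Messages API
        "messages": [{
            "role": "user",
            "content": [
                {
                    "type": "image",
                    "source": {
                        "type": "base64",
                        "media_type": media_type,
                        "data": base64.b64encode(image).decode("ascii"),
                    },
                },
                {"type": "text", "text": question},
            ],
        }],
    }


def gemini_image_body(question: str, image: bytes, mime_type: str = "image/png") -> dict:
    """Body for POST /v1beta/models/{model}:generateContent (model goes in the path)."""
    return {
        "contents": [{
            "role": "user",
            "parts": [
                {"text": question},
                {"inlineData": {
                    "mimeType": mime_type,
                    "data": base64.b64encode(image).decode("ascii"),
                }},
            ],
        }],
    }
```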
The request body limit on every proxy and passthrough route is 50 MB, so payloads carrying high-resolution images or long audio clips fit comfortably.
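Base64 turns every 3 input bytes into 4 output characters (rounded up), so a raw file noticeably smaller than 50 MB can still push a JSON body over the limit. A rough pre-flight check is sketched below. It treats 50 MB as 50,000,000 bytes (the stricter reading, since MB versus MiB is not specified), and the `overhead` allowance for the rest of the JSON is an arbitrary assumption. Multipart uploads such as /v1/audio/transcriptions send the file as raw bytes, so this inflation does not apply to them.

```python
# Assumption: "50 MB" read as 50,000,000 bytes (the stricter of MB vs MiB).
BODY_LIMIT_BYTES = 50_000_000


def base64_length(raw_size: int) -> int:
    """Length of the padded base64 encoding of raw_size bytes: 4 * ceil(n / 3)."""
    return (raw_size + 2) // 3 * 4


def fits_body_limit(raw_sizes, overhead: int = 4096, limit: int = BODY_LIMIT_BYTES) -> bool:
    """Rough check that base64-encoded attachments plus JSON overhead fit the limit.

    `overhead` is a hypothetical allowance for prompt text, keys and punctuation.
    """
    total = sum(base64_length(size) for size in raw_sizes) + overhead
    return total <= limit
```

For example, a 40,000,000-byte image encodes to 53,333,336 characters and would not fit, even though the raw file is under 50 MB.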