Text-to-Speech
OpenRouter supports text-to-speech (TTS) via a dedicated /api/v1/audio/speech endpoint that is compatible with the OpenAI Audio Speech API. Send text and receive a raw audio byte stream in your chosen format.
You can find TTS models in several ways:
Use the output_modalities query parameter on the Models API to discover TTS models.
Visit the Models page and filter by output modalities to find models capable of speech synthesis. Look for models that list "speech" in their output modalities.
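As a rough sketch of the Models API lookup described above: the snippet below builds a query URL with output_modalities set to "speech" and reads model IDs from the response. The host https://openrouter.ai, the value "speech" for the query parameter, and the {"data": [{"id": ...}]} response shape are assumptions here, not details stated on this page.

```python
import json
import urllib.parse
import urllib.request

# Assumed Models API location; the page itself only names "the Models API".
MODELS_URL = "https://openrouter.ai/api/v1/models"


def tts_models_url(modality: str = "speech") -> str:
    """Return the Models API URL filtered by output modality."""
    query = urllib.parse.urlencode({"output_modalities": modality})
    return f"{MODELS_URL}?{query}"


def list_tts_model_ids() -> list[str]:
    """Fetch the filtered model list and return the model IDs.

    Assumes the response is JSON with a top-level "data" list whose
    entries each carry an "id" field.
    """
    with urllib.request.urlopen(tts_models_url()) as response:
        payload = json.load(response)
    return [model["id"] for model in payload.get("data", [])]
```

Calling list_tts_model_ids() performs a network request; tts_models_url() only builds the URL and can be inspected offline.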
Send a POST request to /api/v1/audio/speech with the text you want to synthesize. The response is a raw audio byte stream — not JSON — so you can pipe it directly to a file or audio player.
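A minimal sketch of such a request, using only the Python standard library. The field names (model, input, voice, response_format) follow the OpenAI Audio Speech API that this endpoint is described as compatible with. The host, the Bearer authorization header, the voice name "alloy", and the helper names are assumptions made for illustration.

```python
import json
import shutil
import urllib.request

# Assumed full URL for the endpoint path given in the text.
SPEECH_URL = "https://openrouter.ai/api/v1/audio/speech"


def build_speech_request(text, api_key,
                         model="openai/gpt-4o-mini-tts-2025-12-15",
                         voice="alloy", response_format="mp3"):
    """Build (but do not send) a POST request for the speech endpoint."""
    payload = {
        "model": model,
        "input": text,
        "voice": voice,  # "alloy" is an assumed voice name
        "response_format": response_format,
    }
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, api_key, basename="speech", response_format="mp3"):
    """Send the request and stream the raw audio bytes to disk.

    The file extension is derived from response_format so the two
    always match.
    """
    request = build_speech_request(text, api_key, response_format=response_format)
    out_path = f"{basename}.{response_format}"
    with urllib.request.urlopen(request) as response, open(out_path, "wb") as out_file:
        shutil.copyfileobj(response, out_file)  # body is audio bytes, not JSON
    return out_path
```

Because the body is audio rather than JSON, the sketch copies the response stream straight to a file instead of parsing it.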
You can pass provider-specific options using the provider parameter. Options are keyed by provider slug, and only the options for the matched provider are forwarded:
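To illustrate the forwarding rule, here is a small client-side model of it: options are grouped by provider slug, and only the group for the provider that ends up serving the request is kept. This is an illustration of the described behavior, not OpenRouter's implementation. The option names and the second provider slug are hypothetical, and the exact nesting of these options inside the request's provider field is not shown on this page.

```python
def options_for_matched_provider(options_by_slug: dict, matched_slug: str) -> dict:
    """Return only the options keyed under the matched provider slug.

    Options for every other provider are dropped; an unknown slug
    yields an empty dict.
    """
    return dict(options_by_slug.get(matched_slug, {}))


# Hypothetical option names, keyed by provider slug.
provider_options = {
    "openai": {"instructions": "Speak in a calm tone."},
    "example-provider": {"style": "narration"},
}

forwarded = options_for_matched_provider(provider_options, "openai")
print(forwarded)
```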
The TTS endpoint returns a raw audio byte stream, not JSON. The response includes the following headers:
TTS models are priced per character of input text. Pricing varies by model and provider. You can check the per-character cost for each model on the Models page or via the Models API.
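Per-character pricing makes a rough cost estimate simple arithmetic. The sketch below multiplies the input length by a per-character price. The price used is a made-up placeholder, and treating Python's len() as the billed character count is an assumption; check the actual rate on the Models page or via the Models API.

```python
def estimate_tts_cost(text: str, price_per_character: float) -> float:
    """Estimate the cost of synthesizing text at a per-character price."""
    return len(text) * price_per_character


# Placeholder price for illustration only, not a real rate.
HYPOTHETICAL_PRICE_PER_CHARACTER = 0.5

cost = estimate_tts_cost("hello", HYPOTHETICAL_PRICE_PER_CHARACTER)
print(cost)  # 5 characters * 0.5 -> 2.5
```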
The TTS endpoint is fully compatible with the OpenAI SDK. You can use the OpenAI client libraries by pointing them at OpenRouter’s base URL:
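One way this could look with the OpenAI Python SDK (v1 or later), offered as a sketch rather than the official sample. The base URL https://openrouter.ai/api/v1, the OPENROUTER_API_KEY environment variable, the voice "alloy", and the helper name are assumptions.

```python
import os

# Assumed OpenRouter base URL for OpenAI-compatible clients.
OPENROUTER_BASE_URL = "https://openrouter.ai/api/v1"


def synthesize_with_openai_sdk(text: str, out_path: str = "speech.mp3",
                               model: str = "openai/gpt-4o-mini-tts-2025-12-15",
                               voice: str = "alloy") -> str:
    """Synthesize text through OpenRouter using the OpenAI client library."""
    # Imported here so this module loads even when the SDK is not installed.
    from openai import OpenAI

    client = OpenAI(
        base_url=OPENROUTER_BASE_URL,
        api_key=os.environ["OPENROUTER_API_KEY"],  # assumed variable name
    )
    # Stream the audio bytes to disk instead of buffering them in memory.
    with client.audio.speech.with_streaming_response.create(
        model=model,
        voice=voice,
        input=text,
        response_format="mp3",
    ) as response:
        response.stream_to_file(out_path)
    return out_path
```

The only OpenRouter-specific parts are the base URL, the API key, and the provider-prefixed model ID; the call itself is the SDK's usual audio.speech interface.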
Use mp3 for storage and general playback. Use pcm for real-time streaming pipelines where latency matters.
The speed parameter is only supported by certain providers (e.g., OpenAI). It is silently ignored by providers that don’t support it.
Empty or corrupted audio file?
Check that response_format matches how you’re saving the file (e.g., don’t save pcm output with a .mp3 extension).
Model not found?
Use the full model ID, including the provider prefix (e.g., openai/gpt-4o-mini-tts-2025-12-15, not gpt-4o-mini-tts).
Voice not available?