If you’d like to be a model provider and sell inference on OpenRouter, fill out our form to get started.
To be eligible to provide inference on OpenRouter you must have the following:
You must implement an endpoint that returns a list of all models on your platform that OpenRouter should serve. Below is an example of the response format:
The id field should be the exact model identifier that OpenRouter will use when calling your API.
The pricing fields are in string format to avoid floating point precision issues, and must be in USD.
Valid quantization values are: int4, int8, fp4, fp6, fp8, fp16, bf16, fp32.
Valid sampling parameters are: temperature, top_p, top_k, min_p, top_a, frequency_penalty, presence_penalty, repetition_penalty, stop, seed, max_tokens, logit_bias.
Valid features are: tools, json_mode, structured_outputs, logprobs, web_search, reasoning.
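Before publishing the endpoint, it can help to check each model entry against the allowed values listed above. The following is a minimal sketch, not an official validator: the field names `quantization`, `supported_sampling_parameters`, and `supported_features`, and the helper `validate_model_entry`, are assumptions made for illustration.

```python
# Allowed values, copied from the lists above.
VALID_QUANTIZATIONS = {"int4", "int8", "fp4", "fp6", "fp8", "fp16", "bf16", "fp32"}
VALID_SAMPLING_PARAMETERS = {
    "temperature", "top_p", "top_k", "min_p", "top_a", "frequency_penalty",
    "presence_penalty", "repetition_penalty", "stop", "seed", "max_tokens",
    "logit_bias",
}
VALID_FEATURES = {
    "tools", "json_mode", "structured_outputs", "logprobs", "web_search", "reasoning",
}


def validate_model_entry(entry: dict) -> list:
    """Return a list of problems found; an empty list means none were found.

    The keys read here (other than "id") are assumed names, not confirmed ones.
    """
    problems = []
    model_id = entry.get("id")
    if not isinstance(model_id, str) or not model_id:
        problems.append("id must be a non-empty string")
    quantization = entry.get("quantization")
    if quantization is not None and quantization not in VALID_QUANTIZATIONS:
        problems.append(f"unknown quantization: {quantization}")
    for name in entry.get("supported_sampling_parameters", []):
        if name not in VALID_SAMPLING_PARAMETERS:
            problems.append(f"unknown sampling parameter: {name}")
    for name in entry.get("supported_features", []):
        if name not in VALID_FEATURES:
            problems.append(f"unknown feature: {name}")
    return problems
```

Running this over every entry before deploying catches typos such as `top-p` or `fp12` that might otherwise only surface after the provider monitor reads the response.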
For models with different pricing based on context length (e.g., long context pricing), you can provide pricing as an array of tiers instead of a single object:
When using tiered pricing, the first tier (index 0) is the base pricing that applies when input tokens are below the min_context threshold. The second tier applies when input tokens meet or exceed the min_context value.
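The tier rule above can be sketched as a small lookup. This is an illustration under stated assumptions: `min_context` comes from the text, while the `prompt` and `completion` keys, the prices, and the helper `select_tier` are hypothetical. The sketch also assumes tiers are sorted by ascending `min_context`, which generalizes the two-tier rule to longer lists. Prices are parsed with `Decimal` so the string values keep their exact precision.

```python
from decimal import Decimal


def select_tier(tiers: list, input_tokens: int) -> dict:
    """Pick the pricing tier for a request.

    Index 0 is the base tier. A later tier applies once input_tokens
    meets or exceeds that tier's min_context.
    """
    chosen = tiers[0]
    for tier in tiers[1:]:
        if input_tokens >= tier["min_context"]:
            chosen = tier
    return chosen


# Hypothetical tiers; the prices are made up for the example.
tiers = [
    {"prompt": "0.000002", "completion": "0.000012"},
    {"prompt": "0.000004", "completion": "0.000018", "min_context": 200000},
]

tier = select_tier(tiers, 250_000)
prompt_price = Decimal(tier["prompt"])  # exact, unlike float("0.000004")
```

A request with 199,999 input tokens stays on the base tier, while one with exactly 200,000 moves to the second tier, matching the "meet or exceed" wording.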
Limitations:
image and request fields are only supported in the base tier (index 0) and will be ignored if included in other tiers.
If a model is scheduled for deprecation, include the deprecation_date field in ISO 8601 format (YYYY-MM-DD):
When OpenRouter’s provider monitor detects a deprecation date, it will automatically update the endpoint to display deprecation warnings to users. Models past their deprecation date may be automatically hidden from the marketplace.
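To sanity-check a `deprecation_date` value before publishing it, a provider could parse it with the standard library. This is a minimal sketch: the helper `is_past_deprecation` is hypothetical, and treating the deprecation day itself as not yet "past" is an assumption, not documented behavior.

```python
from datetime import date


def is_past_deprecation(deprecation_date: str, today: date) -> bool:
    """Return True if `today` is strictly after the deprecation date.

    date.fromisoformat raises ValueError for strings that are not
    valid ISO 8601 dates, which doubles as a format check.
    """
    return today > date.fromisoformat(deprecation_date)
```

For example, `is_past_deprecation("2025-06-01", date(2025, 6, 2))` is true, while a value such as `"06/01/2025"` raises `ValueError` instead of being silently misread.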
is_ready
By default, when OpenRouter’s provider monitor sees a new model in your /v1/models response, it auto-stages the endpoint, runs baseline tests, and unhides it (makes it live) once the tests pass and pricing is configured. If you need to upload a model ahead of an announcement, or temporarily take a model offline, set the optional boolean is_ready field:
Behavior:
is_ready: false keeps newly staged endpoints hidden even if all baseline tests pass, and auto-hides any matching endpoint that is currently live. Use this to upload a model in advance of launch, or to take a live model offline in coordination with us.
is_ready: true and an omitted field both preserve the default auto-stage and auto-unhide behavior.
For OpenRouter to use your endpoints, we must be able to pay for inference automatically. This can be done via auto top-up or invoicing.
OpenRouter automatically monitors provider reliability and adjusts traffic routing based on uptime metrics. Your endpoint’s uptime is calculated as: successful requests ÷ total requests (excluding user errors).
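The uptime formula can be written out directly. This is a sketch of the arithmetic only; the function name, its parameters, and the choice to return `None` when no requests count are assumptions for illustration.

```python
from typing import Optional


def uptime(successful: int, total: int, user_errors: int) -> Optional[float]:
    """Uptime = successful requests / (total requests - user errors).

    Returns None when no requests count toward the metric.
    """
    counted = total - user_errors
    if counted <= 0:
        return None
    return successful / counted
```

With 1,010 total requests, 10 of them user errors, and 950 successes, uptime is 950 ÷ 1,000 = 0.95. The 10 user errors do not count against the endpoint.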
Errors that affect your uptime:
Errors that DON’T affect uptime:
Traffic routing thresholds:
This system ensures traffic automatically flows to the most reliable providers while giving temporary issues time to resolve.
OpenRouter publicly tracks TTFT (time to first token) and throughput (tokens/second) for all providers on each model page.
Throughput is calculated as: output tokens ÷ generation time, where generation time includes fetch latency (time from request to first server response), TTFT, and streaming time. This means any queueing on your end will show up in your throughput metrics.
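To see why queueing lowers reported throughput, the formula can be sketched as follows. The function and parameter names are hypothetical, and treating the three components as additive, non-overlapping intervals is an assumption based on the description above.

```python
def throughput_tokens_per_second(
    output_tokens: int,
    fetch_latency_s: float,
    ttft_s: float,
    streaming_s: float,
) -> float:
    """Throughput = output tokens / generation time.

    Generation time is modeled here as the sum of fetch latency,
    time to first token, and streaming time (all in seconds).
    """
    generation_time = fetch_latency_s + ttft_s + streaming_s
    if generation_time <= 0:
        raise ValueError("generation time must be positive")
    return output_tokens / generation_time
```

Under these assumptions, 500 output tokens with 0.25 s fetch latency, 0.75 s TTFT, and 4 s of streaming gives 100 tokens/second. If the same request first waits 5 s in a queue, and that wait shows up as fetch latency, the figure drops to 50 tokens/second even though the model generated at the same speed.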
To keep your metrics competitive:
Auto Exacto is a routing step that automatically reorders providers for all requests that include tools. It runs by default on every tool-calling request and may change how much tool-calling traffic your endpoints receive.
Auto Exacto shifts tool-calling traffic toward providers that perform well on tool-use quality signals. Providers with strong metrics are moved to the front of the routing order and will receive more tool-calling requests, while providers with weaker signals are deprioritized and will see less.
Non-tool-calling traffic is not affected by Auto Exacto — it continues to follow the standard price-weighted routing.
Auto Exacto uses three classes of signals, all derived from real traffic and evaluations on your endpoints:
These are the same metrics available in your provider dashboard. Once you are onboarded, our team can give you access to it.
For each model, we compare every provider’s signal values against the group of providers serving that model. We use a median + MAD (median absolute deviation) approach rather than simple averages, which keeps thresholds stable even when one provider is a significant outlier.
Each signal has a different sensitivity:
A minimum of 4 providers serving the same model is required before statistical thresholds are computed. Below that count, no deprioritization is applied for that signal.
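A median + MAD threshold of this kind can be sketched with the standard library. This illustrates the general technique, not OpenRouter's actual implementation: the multiplier `k`, the "higher is better" direction, and the sample values are all assumptions; only the 4-provider minimum comes from the text.

```python
from statistics import median
from typing import Optional

MIN_PROVIDERS = 4  # from the text: below this count, no threshold is computed


def mad_threshold(values: list, k: float) -> Optional[float]:
    """Return median - k * MAD, or None if there are too few providers.

    Assumes higher signal values are better, so providers falling
    below the returned threshold would be candidates for deprioritization.
    """
    if len(values) < MIN_PROVIDERS:
        return None
    med = median(values)
    mad = median([abs(v - med) for v in values])
    return med - k * mad


# Hypothetical per-provider signal values for one model, with one outlier.
values = [0.90, 0.92, 0.94, 0.96, 0.40]
threshold = mad_threshold(values, k=3.0)  # k is an illustrative sensitivity
flagged = [v for v in values if threshold is not None and v < threshold]
```

Here the median is 0.92 and the MAD is about 0.02, so the threshold is about 0.86 and only the 0.40 outlier falls below it. A mean-based cutoff would be dragged down by that same outlier (the mean is about 0.82), which is the instability the median + MAD approach avoids.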
Endpoints are placed into one of three tiers:
Consistent rate limiting (429s) can reduce the volume of successful requests available for evaluation, making it harder for us to collect enough benchmark data to place your endpoint in the top tier. Returning early 429s is still preferred over queueing, but minimizing rate limits where possible helps ensure your endpoint has sufficient data for a fair evaluation.
To maximize the tool-calling traffic routed to your endpoints:
For the full user-facing documentation on Auto Exacto, see Auto Exacto.