Request flow
When you send a request to https://gateway.ngrok.ai:
- Your app sends a request with your access key to https://gateway.ngrok.ai
- The gateway validates the key and loads its access key configuration
- The gateway resolves the model from your request body
- Upstream credentials are selected from the configuration’s routing rules: either ngrok.ai inference or your stored provider keys
- Unsupported parameters are stripped for the chosen model before forwarding
- The request is forwarded to the upstream provider
- On failure, the gateway retries with the next provider key or model candidate
- The response is returned to your app and usage is recorded
Authentication
Every request must include a valid access key. Send it as Authorization: Bearer ng-xxxxx-g1-xxxxx (OpenAI-compatible APIs) or in the x-api-key header (Anthropic native API).
Provider keys stay in app.ngrok.ai. The AI Gateway injects those from your configuration after validating the access key.
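As a rough illustration of the two header styles, the sketch below builds the auth headers for each API flavor. The helper name `build_auth_headers` and its `style` argument are hypothetical; only the header names and the Bearer format come from the text above.

```python
def build_auth_headers(access_key: str, style: str = "openai") -> dict[str, str]:
    """Return auth headers for a gateway request (hypothetical helper).

    style="openai"    -> Authorization: Bearer <key>  (OpenAI-compatible APIs)
    style="anthropic" -> x-api-key: <key>             (Anthropic native API)
    """
    if style == "openai":
        return {"Authorization": f"Bearer {access_key}"}
    if style == "anthropic":
        return {"x-api-key": access_key}
    raise ValueError(f"unknown API style: {style!r}")


if __name__ == "__main__":
    # Placeholder key in the format shown in the text; not a real credential.
    print(build_auth_headers("ng-xxxxx-g1-xxxxx"))
    print(build_auth_headers("ng-xxxxx-g1-xxxxx", style="anthropic"))
```

Only the access key appears in these headers. Provider keys are never sent by the client, since the gateway adds them after validation.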
Model selection
The gateway determines which model and provider to use from your request.

| Model in request | What happens |
|---|---|
| gpt-4o | Catalog lookup → OpenAI |
| openai:gpt-4o | Explicit OpenAI routing |
| my-provider:my-model | A model you run yourself |
| models array | Fail over through listed models in order |
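The table can be read as a small parsing rule. The sketch below is one way to model it, not the gateway's actual implementation: `ModelRef`, `parse_model`, and `candidates` are hypothetical names, splitting on the first colon is an assumption, and the `model` field name follows the OpenAI-compatible convention rather than anything stated here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelRef:
    provider: Optional[str]  # None means "resolve the provider via catalog lookup"
    model: str


def parse_model(value: str) -> ModelRef:
    """Split an optional 'provider:' prefix off a model ID (assumed rule)."""
    provider, sep, model = value.partition(":")
    if sep and provider and model:
        return ModelRef(provider, model)
    return ModelRef(None, value)


def candidates(body: dict) -> list[ModelRef]:
    """Return model candidates in failover order.

    A non-empty 'models' array takes precedence in this sketch; otherwise
    the single 'model' field is the only candidate.
    """
    names = body.get("models") or [body["model"]]
    return [parse_model(name) for name in names]


if __name__ == "__main__":
    print(parse_model("gpt-4o"))
    print(parse_model("openai:gpt-4o"))
    print(candidates({"models": ["openai:gpt-4o", "my-provider:my-model"]}))
```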
Access key configurations
Configurations define:
- Access scope: allowed providers and models
- Routing rules: ngrok.ai inference vs provider keys per provider
Failover
The gateway retries automatically when upstream requests fail.
What triggers failover?
- Rate limits (HTTP 429)
- Auth errors from exhausted or invalid provider keys
- Timeouts and server errors (HTTP 5xx)
Failover order
- Next provider key in the routing rule (ordered list)
- Next step in the routing rule (if configured)
- Next model in the request’s models array
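The triggers and ordering above can be sketched as three nested loops: keys within a routing step, then steps, then models. This is an illustrative model under stated assumptions, not the gateway's code. All function names are hypothetical, a routing rule is simplified to a list of steps that each hold an ordered list of key names, a timeout is represented as a status of `None`, and treating 401/403 as the auth-error statuses is an assumption.

```python
from typing import Callable, Iterator, Optional, Sequence


def is_failover_trigger(status: Optional[int]) -> bool:
    """Decide whether an upstream result should trigger failover.

    None models a timeout. Mapping auth errors to 401/403 is an assumption.
    """
    if status is None:
        return True
    return status == 429 or status in (401, 403) or 500 <= status <= 599


def attempts(
    models: Sequence[str], steps: Sequence[Sequence[str]]
) -> Iterator[tuple[str, str]]:
    """Yield (model, key) pairs in failover order: keys, then steps, then models."""
    for model in models:
        for step in steps:
            for key in step:
                yield model, key


def send_with_failover(
    models: Sequence[str],
    steps: Sequence[Sequence[str]],
    send: Callable[[str, str], Optional[int]],
) -> tuple[Optional[str], Optional[str], Optional[int]]:
    """Try each candidate until one does not trigger failover.

    Returns (model, key, status) for the first such attempt, or
    (None, None, last_status) if every candidate was exhausted.
    """
    last: Optional[int] = None
    for model, key in attempts(models, steps):
        last = send(model, key)
        if not is_failover_trigger(last):
            return model, key, last
    return None, None, last
```

A non-retryable result such as HTTP 400 stops the loop in this sketch and is returned to the caller as-is, since it is not among the listed triggers.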
Timeouts and token limits
Configure account-level limits in Account Settings.

| Setting | Default | Description |
|---|---|---|
| Per-request timeout | 60s | Max time for a single upstream attempt |
| Total timeout | 120s | Max time including all failover attempts |
| Max input tokens | none | Reject requests exceeding estimated input tokens |
| Max output tokens | none | Cap completion length |
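One plausible reading of how the two timeouts interact is that each attempt gets the per-request timeout, capped by whatever remains of the total budget. The sketch below encodes that reading. The capping behavior is an assumption rather than documented behavior, and `next_attempt_timeout` is a hypothetical helper.

```python
PER_REQUEST_TIMEOUT_S = 60.0   # default from the table above
TOTAL_TIMEOUT_S = 120.0        # default from the table above


def next_attempt_timeout(
    elapsed_s: float,
    per_request_s: float = PER_REQUEST_TIMEOUT_S,
    total_s: float = TOTAL_TIMEOUT_S,
) -> float:
    """Time allowed for the next upstream attempt, in seconds.

    Assumes the per-request timeout is capped by the remaining total
    budget. A result of 0.0 means no further attempts fit in the budget.
    """
    remaining = total_s - elapsed_s
    return max(0.0, min(per_request_s, remaining))


if __name__ == "__main__":
    for elapsed in (0.0, 60.0, 100.0, 130.0):
        print(elapsed, next_attempt_timeout(elapsed))
```

Under the defaults and this assumption, two full-length attempts exhaust the total budget, so a third failover candidate would only be tried if earlier attempts failed quickly.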
Next steps
- Access Keys: Create keys to authenticate requests
- Access Key Configurations: Scope and route traffic per key
- Choose a model: Model IDs, provider prefixes, and custom provider models
- Error Handling: How the gateway handles failures