Skip to main content

Request flow

When you send a request to https://gateway.ngrok.ai:
  1. Your app sends a request with your access key to https://gateway.ngrok.ai
  2. The gateway validates the key and loads its access key configuration
  3. The gateway resolves the model from your request body
  4. Upstream credentials are selected from the configuration’s routing rules—ngrok.ai inference or your stored provider keys
  5. Unsupported parameters are stripped for the chosen model before forwarding
  6. The request is forwarded to the upstream provider
  7. On failure, the gateway retries with the next provider key or model candidate
  8. The response is returned to your app and usage is recorded
Manage keys, configurations, providers, and credits at app.ngrok.ai. Your apps send traffic to gateway.ngrok.ai.

Authentication

Every request must include a valid access key. Send it as Authorization: Bearer ng-xxxxx-g1-xxxxx (OpenAI-compatible APIs) or x-api-key (Anthropic native API). Provider keys stay in app.ngrok.ai. The AI Gateway injects those from your configuration after validating the access key.

Model selection

The gateway determines which model and provider to use from your request.
Model in requestWhat happens
gpt-4oCatalog lookup → OpenAI
openai:gpt-4oExplicit OpenAI routing
my-provider:my-modelA model you run yourself
models arrayFail over through listed models in order
To choose model IDs and provider-qualified model names, see Choose a model.

Access key configurations

Configurations define:
  • Access scope: allowed providers and models
  • Routing rules: ngrok.ai inference vs provider keys per provider
An access key without a configuration uses allow-all scope. Built-in OpenAI and Anthropic default to ngrok.ai inference when you have credits.

Failover

The gateway retries automatically when upstream requests fail.

What triggers failover?

  • Rate limits (HTTP 429)
  • Auth errors from exhausted or invalid provider keys
  • Timeouts and server errors (HTTP 5xx)

Failover order

  1. Next provider key in the routing rule (ordered list)
  2. Next step in the routing rule (if configured)
  3. Next model in the request’s models array
To attach several provider keys and control the order they’re tried, see Multi-key failover.

Timeouts and token limits

Configure account-level limits in Account Settings.
SettingDefaultDescription
Per-request timeout60sMax time for a single upstream attempt
Total timeout120sMax time including all failover attempts
Max input tokensnoneReject requests exceeding estimated input tokens
Max output tokensnoneCap completion length

Next steps

Access Keys

Create keys to authenticate requests

Access Key Configurations

Scope and route traffic per key

Choose a model

Model IDs, provider prefixes, and custom provider models

Error Handling

How the gateway handles failures