
Getting started

What URL do I send requests to?

Send model requests to https://gateway.ngrok.ai. If you are using an OpenAI-compatible SDK, set the base URL to:
https://gateway.ngrok.ai/v1
For example:
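import OpenAI from "openai";

// Point the SDK at the AI Gateway and authenticate with an ngrok.ai access key.
const client = new OpenAI({
  baseURL: "https://gateway.ngrok.ai/v1",
  apiKey: "ng-xxxxx-g1-xxxxx" // placeholder; use your own access key
});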
See the Quickstart for a full example.

Can I use my existing OpenAI SDK?

Yes. The AI Gateway supports OpenAI-compatible requests, so you can usually keep your SDK and change the base URL and API key. Use your ngrok.ai access key as the API key, and set the base URL to https://gateway.ngrok.ai/v1. See SDK Integration for examples.

Do I need a provider account to use ngrok.ai?

Not always. For certain providers, like OpenAI and Anthropic, you can skip adding your own provider key and use ngrok.ai inference, paid for with your credits. For other built-in providers, or if you want usage billed to your own provider account, add a provider key.

Do I need credits?

Yes. Every request through ngrok.ai requires credits for the gateway processing fee. If ngrok.ai supplies the upstream provider credentials, credits also pay the upstream model cost. If you bring your own provider key, the provider bills you for model usage and credits only pay the gateway processing fee. See Credits.

Do I need a ngrok subscription plan?

No, not to use the AI Gateway itself. You can sign in at app.ngrok.ai, purchase AI Gateway credits, create an access key, and send requests to https://gateway.ngrok.ai. You may need a ngrok platform plan for platform features on the same account, such as team members or internal endpoints for custom providers. Manage those features from dashboard.ngrok.com. See Credits.

Keys and credentials

Which key do I put in my app?

Use an access key. Your app sends the access key to https://gateway.ngrok.ai. Do not put provider keys, ngrok API keys, or AI Gateway API keys in your app’s model requests. See Access keys.
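One way to keep the access key out of source code is to read it from the environment when building the SDK options. The sketch below is a minimal example, not an official helper: the variable name NGROK_AI_ACCESS_KEY and the function gatewayClientOptions are hypothetical, chosen only for illustration.

```typescript
// Hypothetical variable name; use whatever your deployment already defines.
const ACCESS_KEY_VAR = "NGROK_AI_ACCESS_KEY";

interface GatewayClientOptions {
  baseURL: string;
  apiKey: string;
}

// Builds the options object for an OpenAI-compatible SDK client.
// `env` is passed in (for example process.env) so the function is easy to test.
function gatewayClientOptions(
  env: Record<string, string | undefined>
): GatewayClientOptions {
  const apiKey = env[ACCESS_KEY_VAR];
  if (!apiKey) {
    // Fail early instead of sending unauthenticated requests.
    throw new Error(`${ACCESS_KEY_VAR} is not set`);
  }
  return { baseURL: "https://gateway.ngrok.ai/v1", apiKey };
}
```

With the OpenAI SDK this could be used as new OpenAI(gatewayClientOptions(process.env)). Rotating the access key then becomes a configuration change rather than a code change.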

What is the difference between access keys, provider keys, AI Gateway API keys, and ngrok API keys?

Key | Used for
Access key | Authenticates model requests to gateway.ngrok.ai
Provider key | Authenticates ngrok.ai to an upstream provider
AI Gateway API key | Automates AI Gateway resources through api.ngrok.ai
ngrok API key | Manages ngrok platform resources through api.ngrok.com
Most applications only need an access key in runtime code.

What are access keys?

Access keys are what your app uses to authenticate model requests to https://gateway.ngrok.ai. It’s a good idea to create separate access keys for different applications, environments, or teams so you can easily track usage, control access, and revoke keys if needed.

Can I create multiple access keys?

Yes. Create a separate access key for each app, environment, client, or team. Separate keys make it easier to track usage, revoke access when needed, and apply different access key configurations.

What happens if I lose an access key?

Access key tokens are only shown once when they are created. If you lose one, delete the old access key, create a new one, and update your application with the new token.

Are access keys scoped to endpoints?

No. Access keys are account-scoped and work with https://gateway.ngrok.ai. You do not need to create or manage a custom endpoint URL to use the AI Gateway.

Can I use different keys for different teams?

Yes. Create separate access keys for each team or client. To control what each key can call, assign different access key configurations. Configurations can limit providers, models, and routing credentials.

Do I need provider keys?

It depends on the provider and how you want billing to work. For supported OpenAI and Anthropic models, you can use ngrok.ai credits without adding your own provider key. For other built-in providers, or if you want upstream usage billed to your own provider account, add a provider key.

Are provider keys exposed to my app?

No. Your app sends an access key to the AI Gateway. Provider keys are stored in app.ngrok.ai and used server-side when ngrok.ai calls the upstream provider. They do not need to be stored in your application code.

Does ngrok.ai provide provider API keys?

No. ngrok.ai does not give you provider API keys to copy or use outside the gateway. For supported OpenAI and Anthropic models, ngrok.ai can supply upstream credentials when you use credits. For other providers, or when you want to use your own provider account, add your own provider key.

How do I rotate keys?

For access keys, create a new access key, update your application, then delete the old access key. For provider keys, rotate the stored provider key value so configurations that reference the key can keep using the same key ID. See Bring your own provider key.

Credits and billing

What do credits pay for?

Credits always pay the AI Gateway processing fee. For supported OpenAI and Anthropic requests that use ngrok.ai inference, credits also pay the upstream model cost. If you bring your own provider key, the provider bills you for model usage and credits only pay the gateway processing fee.

How much is the processing fee?

The ngrok.ai processing fee is $0.05 per million tokens.
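The billing rules above can be turned into a rough estimator. This is a sketch under stated assumptions, not the gateway's billing logic: it assumes the fee applies linearly to the combined input and output token count with no rounding or per-request minimum, and upstreamCostUsd is a hypothetical figure you would supply yourself.

```typescript
const FEE_USD_PER_MILLION_TOKENS = 0.05;

// Estimated gateway processing fee in USD for a token count.
// Assumption: input and output tokens are counted together.
function processingFeeUsd(totalTokens: number): number {
  if (!Number.isFinite(totalTokens) || totalTokens < 0) {
    throw new RangeError("totalTokens must be a non-negative number");
  }
  return (totalTokens / 1e6) * FEE_USD_PER_MILLION_TOKENS;
}

// Estimated credits consumed by a request.
// With your own provider key, credits cover only the processing fee;
// with ngrok.ai inference, credits also cover the upstream model cost.
function creditsChargedUsd(
  totalTokens: number,
  upstreamCostUsd: number,
  usesOwnProviderKey: boolean
): number {
  const fee = processingFeeUsd(totalTokens);
  return usesOwnProviderKey ? fee : fee + upstreamCostUsd;
}
```

For example, under these assumptions 20 million tokens would incur about $1.00 in processing fees, whichever way the upstream model cost is billed.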

Do credits expire?

Yes. Credits expire 365 days after purchase.

What happens when credits run out?

The gateway rejects new requests until you purchase more credits from the Credits page.

What’s the minimum credit purchase?

The minimum credit purchase is $5.00. You can purchase credits from the Credits page.

Are AI Gateway credits the same as ngrok platform billing?

No. AI Gateway credits are purchased and used in app.ngrok.ai. ngrok platform billing for products managed through dashboard.ngrok.com is separate, and certain features such as team members and internal endpoints for self-hosted models require a ngrok subscription.

Models and providers

Which providers are supported?

The AI Gateway includes built-in support for:
  • OpenAI
  • Anthropic
  • Google
  • Groq
  • DeepSeek
  • OpenRouter
  • Hyperbolic
  • InceptionLabs
  • Inference.net
You can also connect self-hosted or private OpenAI-compatible and Anthropic-compatible endpoints as custom providers. See the Model Catalog for the full list.

Which providers can I call without adding a provider key?

Supported OpenAI and Anthropic models can be called with ngrok.ai credits and no provider key of your own. To see which providers require a provider key, check the “needs provider keys” column in the Model Catalog. All other built-in providers require you to add your own provider key before sending traffic.

How do I choose a model?

Set the model field in your request. Use a model ID when you want ngrok.ai to resolve the provider:
{
  "model": "gpt-4o",
  "messages": [{"role": "user", "content": "Hello"}]
}
Use provider:model when you want to choose the provider explicitly:
{
  "model": "openai:gpt-4o",
  "messages": [{"role": "user", "content": "Hello"}]
}
See Choose a model.
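If your application needs to tell the two forms apart, for example to log which provider a request pinned, a small parser is enough. This sketch splits on the first colon only; how the gateway itself parses model names is not documented here, so treat the function parseModelRef as an illustrative client-side helper.

```typescript
interface ModelRef {
  provider?: string; // undefined when the gateway is left to resolve the provider
  model: string;
}

// Splits "provider:model" on the first colon; a bare model ID has no provider.
function parseModelRef(value: string): ModelRef {
  const sep = value.indexOf(":");
  if (sep === -1) {
    return { model: value };
  }
  return { provider: value.slice(0, sep), model: value.slice(sep + 1) };
}
```

parseModelRef("openai:gpt-4o") yields provider "openai" and model "gpt-4o", while parseModelRef("gpt-4o") yields only the model.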

How do I know which models are available?

Built-in providers and models are listed in the Model Catalog. The catalog includes provider IDs, model IDs, aliases, context windows, output token limits, modalities, and provider key requirements. Custom providers declare their own models when you create them.

Can clients use models not in the catalog?

Yes, if the request uses a provider-qualified model name and the provider allows pass-through for that model. For example:
{
  "model": "openai:gpt-5-preview",
  "messages": [{"role": "user", "content": "Hello"}]
}
To restrict an access key to approved models only, configure a model allowlist in the access key configuration.

Can I use self-hosted models?

Yes. Create a custom provider for a model running on your machine, private network, or cloud GPU. See Use a model you run yourself.

What is a custom provider?

A custom provider is an upstream endpoint you define, such as Ollama, vLLM, LM Studio, a private deployment, or another OpenAI-compatible or Anthropic-compatible API. Custom providers let you call models you run yourself through the same gateway URL as built-in providers. See Custom Providers.

Why isn’t my custom model working?

Check that:
  1. The upstream exposes a supported OpenAI or Anthropic API surface.
  2. The custom provider has the correct base URL.
  3. The custom provider declares the model ID you are calling.
  4. The endpoint is reachable from ngrok.ai.
  5. The access key configuration allows the provider and model.
  6. Provider credentials are attached if the upstream requires authentication.
See Use a model you run yourself.

Failover and timeouts

How does failover work?

The gateway tries candidates in order. It can fail over between provider keys for the same provider, then between fallback models listed in the request. The gateway does not retry the same model and provider key combination; it moves on to the next available candidate. See Error Handling.
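The ordering described above can be modeled as a flat list of model and provider key pairs. This is only a mental model, not the gateway's implementation: it assumes provider-qualified model names, uses hypothetical provider key IDs, and ignores requests served by ngrok.ai inference without a provider key.

```typescript
interface Candidate {
  model: string; // provider-qualified, e.g. "openai:gpt-4o"
  providerKey: string; // hypothetical provider key ID
}

// Lists candidates in the order they would be tried: every provider key for
// the first model, then every key for the next model, and so on.
// Each (model, key) pair appears at most once, mirroring "no retries".
function candidateOrder(
  models: string[],
  keysByProvider: Record<string, string[]>
): Candidate[] {
  const seen = new Set<string>();
  const out: Candidate[] = [];
  for (const model of models) {
    const sep = model.indexOf(":");
    if (sep === -1) {
      throw new Error(`expected provider:model, got "${model}"`);
    }
    const provider = model.slice(0, sep);
    for (const providerKey of keysByProvider[provider] ?? []) {
      const id = `${model}|${providerKey}`;
      if (seen.has(id)) continue;
      seen.add(id);
      out.push({ model, providerKey });
    }
  }
  return out;
}
```

With two OpenAI keys and one Anthropic key, a request for openai:gpt-4o with an Anthropic fallback would produce three candidates: both OpenAI keys first, then the Anthropic key.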

How do I configure fallback models?

Use model for the primary model and models for fallback models.
{
  "model": "openai:gpt-4o",
  "models": ["anthropic:claude-sonnet-4-6"],
  "messages": [{"role": "user", "content": "Hello"}]
}
The gateway tries the primary model first, then each fallback model in order until one succeeds. See Configure fallback models.
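A request body with fallbacks can be assembled by a small helper. The sketch below is illustrative: the function withFallbacks and the type names are hypothetical, and it drops duplicate fallbacks and any fallback equal to the primary model as a client-side convenience.

```typescript
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface GatewayChatRequest {
  model: string;
  models?: string[];
  messages: ChatMessage[];
}

// Builds a request body with `model` as the primary and `models` as ordered fallbacks.
function withFallbacks(
  primary: string,
  fallbacks: string[],
  messages: ChatMessage[]
): GatewayChatRequest {
  // Keep the first occurrence of each fallback and skip the primary itself.
  const models = fallbacks.filter(
    (m, i) => m !== primary && fallbacks.indexOf(m) === i
  );
  return models.length > 0
    ? { model: primary, models, messages }
    : { model: primary, messages };
}
```

Because models is not part of the standard OpenAI request type, a typed SDK may need a cast or a raw HTTP call to send this body; check SDK Integration for the supported approach.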

Can the gateway fail over to a different provider?

Yes, if the request includes fallback models from different providers. For example, you can try OpenAI first and fall back to Anthropic:
{
  "model": "openai:gpt-4o",
  "models": ["anthropic:claude-sonnet-4-6"],
  "messages": [{"role": "user", "content": "Hello"}]
}
Each provider and model must be allowed by the access key configuration.

How long does the gateway try before giving up?

You can configure the Total timeout in Account Settings to cap the time spent across all failover attempts for a request. If the total timeout is reached, the gateway stops trying candidates and returns an error.
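To reason about how a total timeout interacts with a list of candidates, a toy model can help. The function below is a simplification with assumed inputs, not the gateway's scheduler: it takes hypothetical per-attempt durations and counts how many attempts would start before the budget is used up.

```typescript
// Counts how many attempts start before the total time budget is exhausted.
// Assumption: an attempt starts only if some budget remains, and attempts run one after another.
function attemptsStarted(attemptMs: number[], totalTimeoutMs: number): number {
  let elapsed = 0;
  let started = 0;
  for (const ms of attemptMs) {
    if (elapsed >= totalTimeoutMs) break;
    started += 1;
    elapsed += ms;
  }
  return started;
}
```

In this model, three candidates that each take 4 seconds all get a chance under a 10 second budget, but only two start under an 8 second budget.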

Can I disable automatic failover?

Not directly. To narrow failover behavior, use a single provider key per provider, send requests with a single model, avoid fallback models, and set shorter timeouts in Account Settings.

Why am I getting rate limited even with multiple keys?

Check that multiple provider keys are attached to the relevant provider in the access key configuration. The gateway only fails over between keys that are available to the request. See Key selection and failover.

Security and privacy

Does ngrok.ai store request and response bodies?

The AI Gateway does not retain full request and response bodies by default. Request and response bodies are processed in memory. The gateway records request metadata for usage events and debugging, such as token counts, latencies, status codes, providers, and models. See Observability for more details.

Can I see individual request and response bodies?

No. Usage records include metadata such as token counts and latency, not full request or response bodies. See Observability for what is captured.

Can I use the gateway for sensitive data?

Yes, but review your own security and compliance requirements. The AI Gateway does not retain full request and response bodies by default, but requests still go to the upstream provider you call. Review the data handling terms for any provider you use.

How do I redact PII automatically?

Automatic request and response redaction isn’t available on ngrok.ai just yet. Until then, redact any sensitive data in your application before sending it to the gateway.
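Application-side redaction can start with simple pattern replacement before the prompt is sent. This is a minimal sketch: the two patterns (email addresses and US Social Security number formats) are illustrative assumptions and will miss many kinds of sensitive data, so treat it as a starting point rather than a compliance control.

```typescript
// Loose email pattern; intentionally simple, not RFC-complete.
const EMAIL_PATTERN = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
// US Social Security number written as 123-45-6789.
const SSN_PATTERN = /\b\d{3}-\d{2}-\d{4}\b/g;

// Replaces matches with fixed placeholders before text leaves the application.
function redact(text: string): string {
  return text.replace(EMAIL_PATTERN, "[EMAIL]").replace(SSN_PATTERN, "[SSN]");
}
```

Call redact on each message's content before building the request body, so unredacted text never reaches the gateway or the upstream provider.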

How do I secure my gateway when using my own provider keys?

Store your provider keys in app.ngrok.ai and have your application send only access keys. Use access key configurations to control which providers and models each key can call. If an access key is ever exposed, revoke it and create a new one. See Securing Your Gateway.

Monitoring and debugging

How do I view usage and request metrics?

Open the Usage page in app.ngrok.ai, or query usage events with the AI Gateway API. Usage events include request time, provider, model, token counts, latency, and status. For more, see Observability.
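Once usage events are fetched, they can be aggregated in a few lines. The field names in this sketch are hypothetical placeholders, not the AI Gateway API schema; map them to the actual fields documented in Observability.

```typescript
// Hypothetical shape: field names are placeholders, not the real API schema.
interface UsageEvent {
  provider: string;
  model: string;
  inputTokens: number;
  outputTokens: number;
  latencyMs: number;
  status: number;
}

// Sums input and output tokens per "provider:model" pair.
function tokensByModel(events: UsageEvent[]): Record<string, number> {
  const totals: Record<string, number> = {};
  for (const e of events) {
    const key = `${e.provider}:${e.model}`;
    totals[key] = (totals[key] ?? 0) + e.inputTokens + e.outputTokens;
  }
  return totals;
}
```

The same pattern extends to grouping by status or averaging latency per provider.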

How do I debug which provider was used?

Check usage events in app.ngrok.ai, or query them with the AI Gateway API, to see which provider and model served each request. For more predictable routing while debugging, use provider:model in your request, like openai:gpt-4o.

Why are my requests failing silently?

Start with the response status code and error message. Then check:
  1. The access key is valid.
  2. The access key configuration allows the requested provider and model.
  3. The provider key is present if the provider requires one.
  4. The model name is valid.
  5. The account has credits.
  6. The upstream provider is available.
See Debugging and Error Codes.
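Requests often appear to fail silently because the client discards the error body. The helper below turns a status code and raw body into one log line. It assumes, without confirmation from this page, that error bodies follow the OpenAI-style {"error": {"message": ...}} shape, and it falls back to the raw text otherwise.

```typescript
// Formats a failed response as a single log-friendly line.
// Assumption: errors may use an OpenAI-style {"error": {"message": "..."}} body.
function describeFailure(status: number, bodyText: string): string {
  let message = bodyText.trim();
  try {
    const parsed: unknown = JSON.parse(bodyText);
    if (typeof parsed === "object" && parsed !== null && "error" in parsed) {
      const err = (parsed as { error: unknown }).error;
      if (typeof err === "string") {
        message = err;
      } else if (typeof err === "object" && err !== null && "message" in err) {
        const m = (err as { message: unknown }).message;
        if (typeof m === "string") message = m;
      }
    }
  } catch {
    // Body was not JSON; keep the raw text.
  }
  return `HTTP ${status}: ${message || "(empty body)"}`;
}
```

Log the result for every non-2xx response, then work through the checklist above using the message as a guide.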

Does the gateway add latency?

Yes, the gateway can add a small amount of latency for parsing, token counting, routing, and failover decisions. Provider response time usually dominates total latency.

How are tokens counted?

The gateway estimates token counts with tiktoken and uses provider-reported counts when available. Token counts appear in usage events.

Can I cache responses to reduce costs?

Not yet. Response caching isn’t currently available on ngrok.ai, but it’s under consideration for a future release.

Troubleshooting

Why am I getting “provider not allowed”?

The access key configuration tied to your access key does not allow the requested provider. To fix this, update the configuration to allow that provider, or send the request with an access key that already has permission to call it. See Access Key Configurations and Error Codes.

Why am I getting “model unknown”?

The gateway could not resolve the model for the request. Common causes include:
  • The model is not in the catalog.
  • The model name has a typo.
  • The request uses a non-catalog model without a provider prefix.
  • The access key configuration does not allow the model.
For non-catalog models, use the provider:model format:
{
  "model": "openai:gpt-5-preview",
  "messages": [{"role": "user", "content": "Hello"}]
}

Why is my provider key not working?

Check that:
  1. The provider key was added to the correct provider.
  2. The key is valid and has not expired.
  3. The provider account has access to the requested model.
  4. The provider account has enough balance or quota.
  5. The access key configuration routes requests through that provider key.
If usage is not appearing in your provider account, the request may be using ngrok.ai inference instead of your provider key.

Why is failover not working?

First identify which kind of failover you expect. For provider key failover, make sure multiple provider keys for the same provider are attached in your access key configuration. For model or provider failover, include fallback models in your request:
{
  "model": "openai:gpt-4o",
  "models": ["anthropic:claude-sonnet-4-6"],
  "messages": [{"role": "user", "content": "Hello"}]
}
Every fallback provider and model must be allowed by the access key configuration.

Why is the gateway timing out?

The total request time may be hitting the timeout set in Account Settings. Try increasing the total timeout, checking upstream provider latency, or reducing the number of fallback candidates.

Why was my request rejected before reaching a provider?

The gateway can reject a request before upstream routing when:
  • The access key is missing or invalid.
  • The access key configuration does not allow the provider or model.
  • The model cannot be resolved.
  • The account has insufficient credits.
  • Required provider credentials are missing.
Check the response error code and see Error Codes.

Routing behavior

Can I route based on request content?

Not at the gateway layer today. For now, you can use separate access keys and configurations for each client, or handle routing in your application before sending requests to the gateway.

How do I prioritize certain models or providers?

List fallback models in preference order in the request body.
{
  "model": "openai:gpt-4o",
  "models": ["anthropic:claude-sonnet-4-6"],
  "messages": [{"role": "user", "content": "Hello"}]
}
For different clients or teams, create separate access keys with different configurations.

Does the gateway support streaming?

Yes. Streaming is supported, and the gateway forwards SSE streams transparently.
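If you consume the stream without an SDK, the client has to split the SSE text into data payloads itself. The sketch below handles already-received text and assumes the upstream uses the OpenAI-style convention of ending the stream with a [DONE] sentinel; a production client would also need to buffer partial events across network chunks.

```typescript
// Extracts `data:` payloads from complete SSE events, stopping at "[DONE]".
// Assumption: the stream follows the OpenAI-style convention for the end marker.
function sseDataPayloads(text: string): string[] {
  const payloads: string[] = [];
  // Events are separated by a blank line.
  for (const event of text.split(/\r?\n\r?\n/)) {
    const data = event
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (data === "") continue; // comment, keep-alive, or empty event
    if (data === "[DONE]") break;
    payloads.push(data);
  }
  return payloads;
}
```

Each returned payload is a JSON string that can be passed to JSON.parse to read the streamed chunk.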
