Prerequisite: Complete the Quickstart to create an access key on app.ngrok.ai before continuing.
The AI Gateway is compatible with OpenAI’s official SDKs. Set the base URL to https://gateway.ngrok.ai/v1 and use your access key.

Installation

pip install openai

Basic usage

Point the SDK at your AI Gateway endpoint and pass your access key as api_key. See Access keys vs provider keys for how credentials flow through the gateway.
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.ngrok.ai/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your access key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)
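Outside a quick test you will probably not want the access key in source code. The following is a minimal sketch that reads it from the environment instead. The variable name `NGROK_AI_ACCESS_KEY` and the helper names are assumptions made for this example; the gateway does not define them.

```python
import os

GATEWAY_BASE_URL = "https://gateway.ngrok.ai/v1"
# Hypothetical variable name chosen for this sketch; use whatever your deployment sets.
ACCESS_KEY_ENV_VAR = "NGROK_AI_ACCESS_KEY"


def gateway_client_kwargs(env=None):
    """Return the keyword arguments for OpenAI(...), with the key read from the environment."""
    env = os.environ if env is None else env
    access_key = env.get(ACCESS_KEY_ENV_VAR, "").strip()
    if not access_key:
        raise RuntimeError(f"Set {ACCESS_KEY_ENV_VAR} to your AI Gateway access key")
    return {"base_url": GATEWAY_BASE_URL, "api_key": access_key}


def make_client(env=None):
    """Build an OpenAI client pointed at the gateway."""
    from openai import OpenAI  # imported here so the helper above works without the SDK

    return OpenAI(**gateway_client_kwargs(env))
```

Calling `make_client()` then gives a client you can use in the same way as the `client` above, and a missing key fails early with a clear message rather than as an authentication error on the first request.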

Try using this prompt on your agent to verify your gateway connection: 'State your exact model name and provider.'


Streaming

The AI Gateway supports streaming responses:
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.ngrok.ai/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your access key
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about APIs"}],
    stream=True
)

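for chunk in stream:
    # Skip chunks without choices or without text (for example a usage-only chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)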

Try using this prompt on your agent to test streaming — tokens should appear one by one: 'Write a haiku about APIs'


Using different providers

Route to different providers using model prefixes:
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.ngrok.ai/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your access key
)

# OpenAI
response = client.chat.completions.create(model="openai:gpt-4o", messages=[...])

# Anthropic (through the gateway)
response = client.chat.completions.create(model="anthropic:claude-3-5-sonnet-latest", messages=[...])

# Your self-hosted Ollama
response = client.chat.completions.create(model="ollama:llama3.2", messages=[...])
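Because only the `model` string changes between providers, the same request can be looped over several of them. The sketch below assumes nothing beyond the call shape shown above; `ask_each` and `fake_client` are hypothetical names, and the fake client exists only so the example runs without network access.

```python
from types import SimpleNamespace


def ask_each(client, models, prompt):
    """Send the same prompt to each model and return {model: reply text}."""
    messages = [{"role": "user", "content": prompt}]
    replies = {}
    for model in models:
        response = client.chat.completions.create(model=model, messages=messages)
        replies[model] = response.choices[0].message.content
    return replies


def fake_client():
    """Offline stand-in with the same call shape as the SDK client; it echoes the model name."""
    def create(model, messages):
        reply = SimpleNamespace(content=f"(reply from {model})")
        return SimpleNamespace(choices=[SimpleNamespace(message=reply)])

    return SimpleNamespace(chat=SimpleNamespace(completions=SimpleNamespace(create=create)))


models = ["openai:gpt-4o", "anthropic:claude-3-5-sonnet-latest", "ollama:llama3.2"]
for model, reply in ask_each(fake_client(), models, "Hello!").items():
    print(f"{model}: {reply}")
```

Passing the real `client` in place of `fake_client()` would send one request per model through the gateway.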

Model failover

List fallback models in the request's models field, which the SDK sends through extra_body:
response = client.chat.completions.create(
    model="gpt-4o",
    extra_body={"models": ["gpt-4o-mini"]},
    messages=[{"role": "user", "content": "Hello!"}],
)
For more on trying another model when the first one fails, see Configure fallback models.
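The SDK's `extra_body` argument merges extra fields into the JSON request body, which is how `models` reaches the gateway. If you keep a single ordered preference list, a small helper can split it into the two arguments. This is a sketch: `failover_kwargs` is a hypothetical name, and it assumes the gateway tries fallbacks in list order, which this page does not state.

```python
def failover_kwargs(models):
    """Turn an ordered model list into create() arguments: first is primary, the rest are fallbacks."""
    if not models:
        raise ValueError("at least one model is required")
    primary, *fallbacks = models
    kwargs = {"model": primary}
    if fallbacks:
        # Same request shape as the example above: fallbacks go in the "models" body field
        kwargs["extra_body"] = {"models": list(fallbacks)}
    return kwargs


print(failover_kwargs(["gpt-4o", "gpt-4o-mini"]))
# prints {'model': 'gpt-4o', 'extra_body': {'models': ['gpt-4o-mini']}}
```

The result can be unpacked into the call, for example `client.chat.completions.create(messages=messages, **failover_kwargs(preferred))`, where `preferred` is your own list.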

Embeddings

Generate embeddings through the gateway:
response = client.embeddings.create(
    model="openai:text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog"
)

embedding = response.data[0].embedding
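Embedding vectors are usually compared with cosine similarity. The sketch below is plain Python and independent of the gateway; `cosine_similarity` is a helper written for this example, and the short vectors stand in for real embeddings, which are much longer.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for two response.data[i].embedding lists
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # prints 0.0
```

Values near 1 indicate similar texts; values near 0 indicate unrelated ones.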

Function calling

Tool/function calling works exactly as documented by OpenAI:
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.ngrok.ai/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your access key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
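                },
                "required": ["location"]
            }
        }
    }]
)

# When the model decides to use the tool, it returns tool calls instead of text
print(response.choices[0].message.tool_calls)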
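The request above only declares the tool; your code still has to run it when the model asks. In the OpenAI chat completions format, each entry in `message.tool_calls` carries an `id`, a function `name`, and the arguments as a JSON string. The sketch below dispatches those calls to local functions. `get_weather`, `TOOLS`, and `run_tool_calls` are hypothetical names, and the fake message only mimics that shape so the example runs offline.

```python
import json
from types import SimpleNamespace


def get_weather(location):
    """Hypothetical local implementation; a real one would call a weather service."""
    return {"location": location, "forecast": "unknown"}


# Maps the tool names declared in the request to local callables
TOOLS = {"get_weather": get_weather}


def run_tool_calls(message):
    """Execute each tool call on an assistant message and return 'tool' role messages."""
    results = []
    for call in message.tool_calls or []:  # tool_calls is None when the model answered in text
        func = TOOLS[call.function.name]
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        results.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(func(**args)),
        })
    return results


# Offline stand-in for response.choices[0].message
fake_message = SimpleNamespace(tool_calls=[
    SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(name="get_weather", arguments='{"location": "Paris"}'),
    )
])
print(run_tool_calls(fake_message))
```

To get a final answer, append the assistant message and these tool messages to `messages` and call `client.chat.completions.create` again.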

Try using this prompt on your agent to test tool calling — your get_weather function should be invoked: 'What is the current weather in Paris?'


Async usage

Use the async client to make requests without blocking the event loop, which helps when you run many requests concurrently:
import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://gateway.ngrok.ai/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your access key
)

async def main():
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
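The benefit of the async client shows up when several requests are in flight at once. The following sketch runs a batch concurrently with a cap on simultaneous requests. `gather_limited` is a hypothetical helper, the limit of 5 is an arbitrary choice, and `fake_request` replaces the real call so the example runs offline.

```python
import asyncio


async def gather_limited(make_request, items, limit=5):
    """Run make_request(item) for every item with at most `limit` in flight; results keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item):
        async with semaphore:  # wait here while `limit` requests are already running
            return await make_request(item)

    return await asyncio.gather(*(run_one(item) for item in items))


async def fake_request(prompt):
    """Offline stand-in for an awaited client.chat.completions.create(...) call."""
    await asyncio.sleep(0)
    return prompt.upper()


print(asyncio.run(gather_limited(fake_request, ["hello", "world"], limit=2)))
# prints ['HELLO', 'WORLD']
```

With `AsyncOpenAI`, `make_request` would be an async function that awaits `client.chat.completions.create(...)` for one prompt and returns the reply text.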

Error handling

The gateway handles many errors automatically through failover. For errors that still reach your application, catch the SDK's exception types:
from openai import OpenAI, APIError, RateLimitError

client = OpenAI(
    base_url="https://gateway.ngrok.ai/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your access key
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except RateLimitError:
    # Catch this first: RateLimitError is a subclass of APIError.
    # It reaches your app only after all configured keys are exhausted.
    print("Rate limited across all providers")
except APIError as e:
    print(f"API error: {e}")
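For transient failures such as rate limits, one common follow-up is to retry with exponential backoff. The SDK already retries some failures by itself (see its `max_retries` client option), so the sketch below is only for cases where you want explicit control. `call_with_retries` is a hypothetical helper, and the attempt count and delays are arbitrary starting points.

```python
import random
import time


def call_with_retries(fn, retry_on, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call fn(); on an exception in retry_on, wait with exponential backoff plus jitter, then retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller see the last error
            delay = base_delay * 2 ** (attempt - 1)  # 0.5s, 1s, 2s, ...
            sleep(delay + random.uniform(0, delay / 2))  # jitter spreads out retries
```

A call through the gateway could then be wrapped as `call_with_retries(lambda: client.chat.completions.create(model="gpt-4o", messages=messages), retry_on=(RateLimitError,))`. Exceptions not listed in `retry_on` propagate immediately.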

Supported endpoints

The AI Gateway supports these OpenAI API endpoints:
| Endpoint | Description |
| --- | --- |
| /v1/chat/completions | Chat completions |
| /v1/completions | Legacy completions |
| /v1/embeddings | Text embeddings |
| /v1/responses | Responses |

Next steps