OpenAI SDK - ngrok documentation

Prerequisite: Complete the Quickstart to create an access key on app.ngrok.ai before continuing.

The AI Gateway is compatible with OpenAI’s official SDKs. Set the base URL to https://gateway.ngrok.ai/v1 and use your access key.

Installation

pip install openai

Basic usage

Point the SDK at your AI Gateway endpoint and pass your access key as api_key. See Access keys vs provider keys for how credentials flow through the gateway.

from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.ngrok.ai/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your access key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)

Try using this prompt on your agent to verify your gateway connection: 'State your exact model name and provider.'
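
Hard-coding the access key, as above, is fine for a quick test. In real code you will usually read it from the environment so it stays out of source control. A minimal sketch, assuming a hypothetical NGROK_AI_GATEWAY_KEY variable holds your access key; gateway_client_kwargs is an illustrative helper, not part of the SDK:

```python
import os
from typing import Mapping, Optional

GATEWAY_BASE_URL = "https://gateway.ngrok.ai/v1"

# Hypothetical variable name chosen for this sketch; use whatever your deployment defines.
ACCESS_KEY_ENV = "NGROK_AI_GATEWAY_KEY"


def gateway_client_kwargs(environ: Optional[Mapping[str, str]] = None) -> dict:
    """Return base_url/api_key keyword arguments for OpenAI(...) or AsyncOpenAI(...)."""
    env = os.environ if environ is None else environ
    # Strip stray whitespace or newlines that often sneak into copied keys.
    key = env.get(ACCESS_KEY_ENV, "").strip()
    if not key:
        raise RuntimeError(f"Set {ACCESS_KEY_ENV} to your AI Gateway access key")
    return {"base_url": GATEWAY_BASE_URL, "api_key": key}
```

With the variable set, create the client with client = OpenAI(**gateway_client_kwargs()). The OpenAI SDK can also read these values itself from the OPENAI_BASE_URL and OPENAI_API_KEY environment variables if you prefer to set those instead.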

Streaming

The AI Gateway supports streaming responses. Pass stream=True and print each content delta as it arrives:

from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.ngrok.ai/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your access key
)

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about APIs"}],
    stream=True
)

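for chunk in stream:
    # Guard the index: some chunks carry no choices (e.g. a final usage-only chunk)
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()  # end the line once the stream finishes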

Try using this prompt on your agent to test streaming — tokens should appear one by one: 'Write a haiku about APIs'
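
If you also need the complete reply after streaming it (to store or log it, for example), you can accumulate the deltas as you print them. A sketch, assuming the chunk shape used above (chunk.choices[0].delta.content); collect_stream and the offline fake_chunk stand-in are hypothetical helpers, not SDK functions:

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def collect_stream(stream: Iterable, echo: bool = True) -> str:
    """Echo streamed content deltas as they arrive and return the full reply text."""
    parts = []
    for chunk in stream:
        # Skip chunks without choices (e.g. a final usage-only chunk).
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            parts.append(delta)
            if echo:
                print(delta, end="", flush=True)
    if echo:
        print()  # finish the line once the stream ends
    return "".join(parts)


def fake_chunk(text: Optional[str]) -> SimpleNamespace:
    """Offline stand-in shaped like a streamed chunk, for trying collect_stream without a network call."""
    return SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])
```

With a real stream: text = collect_stream(client.chat.completions.create(model="gpt-4o", messages=messages, stream=True)).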

Using different providers

Route to different providers using model prefixes:

from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.ngrok.ai/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your access key
)

messages = [{"role": "user", "content": "Hello!"}]

# OpenAI
response = client.chat.completions.create(model="openai:gpt-4o", messages=messages)

# Anthropic (through the gateway)
response = client.chat.completions.create(model="anthropic:claude-3-5-sonnet-latest", messages=messages)

# Your self-hosted Ollama
response = client.chat.completions.create(model="ollama:llama3.2", messages=messages)
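
When the same code fans out to several providers, it can help to tag logs or metrics with the provider part of the model ID. A small parsing sketch, assuming the provider:model convention shown above; split_model_id is a hypothetical helper, not part of the SDK or the gateway:

```python
from typing import Optional, Tuple


def split_model_id(model_id: str) -> Tuple[Optional[str], str]:
    """Split an ID such as "anthropic:claude-3-5-sonnet-latest" into (provider, model).

    Assumes everything before the first colon is the provider prefix. Unprefixed IDs
    such as "gpt-4o" return None for the provider, leaving resolution to the gateway.
    Caveat: an unprefixed model name that itself contains a colon would be misread.
    """
    provider, sep, model = model_id.partition(":")
    if not sep or not provider:
        return None, model_id
    return provider, model
```

Because only the first colon is split on, Ollama-style tags survive: "ollama:llama3.2:3b" yields ("ollama", "llama3.2:3b").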

Model failover

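List fallback models in the models field. Because models is not a standard OpenAI parameter, send it through the SDK's extra_body argument: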

response = client.chat.completions.create(
    model="gpt-4o",
    extra_body={"models": ["gpt-4o-mini"]},
    messages=[{"role": "user", "content": "Hello!"}],
)

To try another model when the first one fails, see Configure fallback models.
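
If fallback lists are assembled from configuration, a helper can tidy them before they reach the request. A sketch, assuming (as in the example above) that models lists only the fallbacks and not the primary model; with_fallbacks is a hypothetical name:

```python
from typing import Iterable


def with_fallbacks(primary: str, fallbacks: Iterable[str]) -> dict:
    """Build model/extra_body keyword arguments for a request with fallback models.

    Drops duplicates and any repeat of the primary model, keeping the original order.
    Omits extra_body entirely when no fallbacks remain.
    """
    ordered = []
    for model in fallbacks:
        if model != primary and model not in ordered:
            ordered.append(model)
    if not ordered:
        return {"model": primary}
    return {"model": primary, "extra_body": {"models": ordered}}
```

Usage: client.chat.completions.create(messages=messages, **with_fallbacks("gpt-4o", ["gpt-4o-mini"])).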

Embeddings

Generate embeddings through the gateway:

response = client.embeddings.create(
    model="openai:text-embedding-3-small",
    input="The quick brown fox jumps over the lazy dog"
)

embedding = response.data[0].embedding
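
Embeddings are usually compared with cosine similarity, for example to rank documents against a query. A plain-Python sketch with no SDK calls, which works on the float lists returned in response.data[i].embedding:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)
```

OpenAI's embeddings API also accepts a list of strings as input; each item in response.data then carries an index matching its position in that list, so several texts can be embedded in one request and compared pairwise.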

Function calling

Tool/function calling works exactly as documented by OpenAI:

from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.ngrok.ai/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your access key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                }
            }
        }
    }]
)

Try using this prompt on your agent to test tool calling; the model should respond with a tool call asking your code to run get_weather: 'What is the current weather in Paris?'
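
The request above only lets the model ask for get_weather; your code still has to run the function and send the result back. A sketch of that step, assuming the tool_calls fields of an OpenAI chat completion message (id, function.name, and function.arguments as a JSON string). The get_weather stub is hypothetical and returns canned data:

```python
import json
from typing import Dict, List


def get_weather(location: str) -> dict:
    # Hypothetical stub: replace with a real weather lookup.
    return {"location": location, "forecast": "sunny", "temp_c": 21}


# Map tool names the model may request to local Python functions.
TOOLS = {"get_weather": get_weather}


def call_tool(name: str, arguments_json: str) -> str:
    """Run the local function the model asked for and return its result as JSON."""
    func = TOOLS.get(name)
    if func is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    args = json.loads(arguments_json or "{}")
    return json.dumps(func(**args))


def tool_result_messages(message) -> List[Dict[str, str]]:
    """Build one role="tool" message for each tool call on an assistant message."""
    return [
        {
            "role": "tool",
            "tool_call_id": call.id,
            "content": call_tool(call.function.name, call.function.arguments),
        }
        for call in (message.tool_calls or [])
    ]
```

To finish the round trip, append the assistant message (response.choices[0].message) and the returned tool messages to messages, then call client.chat.completions.create again with the same tools so the model can answer using the result.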

Async usage

Use AsyncOpenAI in asyncio applications so requests don't block the event loop and several can be in flight at once:

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(
    base_url="https://gateway.ngrok.ai/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your access key
)

async def main():
    response = await client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
    print(response.choices[0].message.content)

asyncio.run(main())
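
The async client pays off when you send many requests concurrently. A sketch that caps how many run at once with asyncio.Semaphore, assuming an AsyncOpenAI client configured as above; map_bounded and ask are hypothetical helpers:

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def map_bounded(func: Callable[[T], Awaitable[R]], items: Iterable[T], limit: int = 5) -> List[R]:
    """Await func(item) for every item, at most `limit` at once; results keep input order."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        async with semaphore:
            return await func(item)

    # gather returns results in the order the awaitables were passed in
    return await asyncio.gather(*(run_one(item) for item in items))


async def ask(client, prompt: str, model: str = "gpt-4o") -> str:
    """Send one prompt through the gateway with an AsyncOpenAI client and return the reply text."""
    response = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

Inside an async function: answers = await map_bounded(lambda p: ask(client, p), prompts, limit=5). The limit of 5 is an arbitrary example; choose one that fits your rate limits.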

Error handling

The gateway handles many errors automatically through failover. For errors that reach your app:

from openai import OpenAI, APIError, RateLimitError

client = OpenAI(
    base_url="https://gateway.ngrok.ai/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your access key
)

try:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Hello!"}]
    )
except RateLimitError:
    # All configured keys exhausted
    print("Rate limited across all providers")
except APIError as e:
    print(f"API error: {e}")
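
For rate limits, waiting and retrying is a common response. The SDK already retries some failures on its own (tunable with the max_retries option on the client); the sketch below adds an application-level retry loop with exponential backoff and jitter around a whole call. with_retries is a hypothetical helper:

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying on the given exceptions with exponential backoff and jitter."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            # Wait base_delay * 2^(attempt - 1), plus up to 10% random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("attempts must be at least 1")
```

Usage: response = with_retries(lambda: client.chat.completions.create(model="gpt-4o", messages=messages), retry_on=(RateLimitError,)).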

Supported endpoints

The AI Gateway supports these OpenAI API endpoints:

Endpoint	Description
`/v1/chat/completions`	Chat completions
`/v1/completions`	Legacy completions
`/v1/embeddings`	Text embeddings
`/v1/responses`	Responses
