# OpenAI SDK

> Use the ngrok AI Gateway with OpenAI's official SDKs.

<Note>
  **Prerequisite**: Complete the [Quickstart](/ai-gateway/quickstart) to create an access key on app.ngrok.ai before continuing.
</Note>

The AI Gateway is compatible with OpenAI's official SDKs. Set the base URL to `https://gateway.ngrok.ai/v1` and use your [access key](/ai-gateway/concepts/access-keys).

## Installation

<CodeGroup>
  ```bash Python theme={null}
  pip install openai
  ```

  ```bash TypeScript theme={null}
  npm install openai
  ```
</CodeGroup>

## Basic usage

Point the SDK at your AI Gateway endpoint and pass your [access key](/ai-gateway/concepts/access-keys) as `api_key`. See [Access keys vs provider keys](/ai-gateway/concepts/access-keys#access-keys-vs-provider-keys) for how credentials flow through the gateway.

<CodeGroup>
  ```python Python highlight={4} theme={null}
  from openai import OpenAI

  client = OpenAI(
      base_url="https://gateway.ngrok.ai/v1",
      api_key="ng-xxxxx-g1-xxxxx"  # Your access key
  )

  response = client.chat.completions.create(
      model="gpt-4o",
      messages=[{"role": "user", "content": "Hello!"}]
  )

  print(response.choices[0].message.content)
  ```

  ```typescript TypeScript highlight={4} theme={null}
  import OpenAI from "openai";

  const client = new OpenAI({
    baseURL: "https://gateway.ngrok.ai/v1",
    apiKey: "ng-xxxxx-g1-xxxxx",  // Your access key
  });

  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello!" }],
  });

  console.log(response.choices[0].message.content);
  ```
</CodeGroup>

<Prompt description="Try using this prompt on your agent to verify your gateway connection: 'State your exact model name and provider.'" actions={["copy", "cursor"]}>
  State your exact model name and provider.
</Prompt>
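
Hard-coding the access key is fine for a quick test, but a key in source code is easy to leak. Below is a minimal sketch of reading it from the environment instead. The variable name `NGROK_AI_GATEWAY_KEY` and the helper `gateway_client_kwargs` are illustrative assumptions, not names defined by ngrok or the OpenAI SDK.

```python
import os

GATEWAY_BASE_URL = "https://gateway.ngrok.ai/v1"  # same base URL as the examples above
KEY_ENV_VAR = "NGROK_AI_GATEWAY_KEY"  # hypothetical variable name, pick your own


def gateway_client_kwargs(env=None):
    """Build keyword arguments for OpenAI(...) from environment variables."""
    env = os.environ if env is None else env
    api_key = env.get(KEY_ENV_VAR, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {KEY_ENV_VAR} to your AI Gateway access key")
    return {"base_url": GATEWAY_BASE_URL, "api_key": api_key}


# Usage (requires the openai package and a real access key):
#   from openai import OpenAI
#   client = OpenAI(**gateway_client_kwargs())
```

Failing fast on a missing key keeps the error close to its cause instead of surfacing later as an authentication failure.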

## Streaming

The AI Gateway supports streaming responses. Set `stream=True` (`stream: true` in TypeScript) to receive the reply incrementally as chunks:

<CodeGroup>
  ```python Python highlight={8} theme={null}
  from openai import OpenAI

  client = OpenAI(
      base_url="https://gateway.ngrok.ai/v1",
      api_key="ng-xxxxx-g1-xxxxx"  # Your access key
  )

  stream = client.chat.completions.create(
      model="gpt-4o",
      messages=[{"role": "user", "content": "Write a haiku about APIs"}],
      stream=True
  )

  for chunk in stream:
      # Some chunks carry no choices or no text, so guard before printing
      if chunk.choices and chunk.choices[0].delta.content:
          print(chunk.choices[0].delta.content, end="", flush=True)
  ```

  ```typescript TypeScript highlight={8} theme={null}
  import OpenAI from "openai";

  const client = new OpenAI({
    baseURL: "https://gateway.ngrok.ai/v1",
    apiKey: "ng-xxxxx-g1-xxxxx",  // Your access key
  });

  const stream = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Write a haiku about APIs" }],
    stream: true,
  });

  for await (const chunk of stream) {
    process.stdout.write(chunk.choices[0]?.delta?.content || "");
  }
  ```
</CodeGroup>

<Prompt description="Try using this prompt on your agent to test streaming — tokens should appear one by one: 'Write a haiku about APIs'" actions={["copy", "cursor"]}>
  Write a haiku about APIs
</Prompt>
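
If you need the complete reply as well as the live tokens, you can accumulate the deltas while iterating. The sketch below assumes chunks shaped like the SDK's chat completion chunks (`choices[0].delta.content`). The helper `collect_stream_text` and the fake chunks are illustrative, so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_stream_text(chunks, on_token=None):
    """Join the delta.content pieces of a chat completion stream."""
    parts = []
    for chunk in chunks:
        # Some chunks carry no choices or no text (for example a role-only delta).
        if not chunk.choices:
            continue
        token = chunk.choices[0].delta.content
        if token:
            parts.append(token)
            if on_token is not None:
                on_token(token)
    return "".join(parts)


def _fake_chunk(text):
    """Stand-in for a streamed chunk so the sketch runs offline."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


fake_stream = [
    _fake_chunk("Hello"),
    _fake_chunk(None),
    SimpleNamespace(choices=[]),
    _fake_chunk(", world"),
]
full_text = collect_stream_text(fake_stream)
print(full_text)
```

With a real client, pass the object returned by `client.chat.completions.create(..., stream=True)` as `chunks`, and pass a callback such as `lambda t: print(t, end="", flush=True)` as `on_token` to keep the live output.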

## Using different providers

Prefix the model name with `provider:` to route a request to a specific provider:

<CodeGroup>
  ```python Python highlight={8-9} theme={null}
  from openai import OpenAI

  client = OpenAI(
      base_url="https://gateway.ngrok.ai/v1",
      api_key="ng-xxxxx-g1-xxxxx"  # Your access key
  )

  # OpenAI
  response = client.chat.completions.create(model="openai:gpt-4o", messages=[...])

  # Anthropic (through the gateway)
  response = client.chat.completions.create(model="anthropic:claude-3-5-sonnet-latest", messages=[...])

  # Your self-hosted Ollama
  response = client.chat.completions.create(model="ollama:llama3.2", messages=[...])
  ```

  ```typescript TypeScript highlight={8-9} theme={null}
  import OpenAI from "openai";

  const client = new OpenAI({
    baseURL: "https://gateway.ngrok.ai/v1",
    apiKey: "ng-xxxxx-g1-xxxxx",  // Your access key
  });

  // OpenAI
  const openaiRes = await client.chat.completions.create({ model: "openai:gpt-4o", messages: [...] });

  // Anthropic (through the gateway)
  const anthropicRes = await client.chat.completions.create({ model: "anthropic:claude-3-5-sonnet-latest", messages: [...] });

  // Your self-hosted Ollama
  const ollamaRes = await client.chat.completions.create({ model: "ollama:llama3.2", messages: [...] });
  ```
</CodeGroup>
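
Because only the `model` string changes between providers, one loop can send the same conversation to each of them. This is a sketch under assumptions: `ask_each` is a hypothetical helper, the model list simply reuses the names from the example above, and a fake client stands in for `OpenAI(...)` so the code runs offline.

```python
from types import SimpleNamespace

MODELS = [
    "openai:gpt-4o",
    "anthropic:claude-3-5-sonnet-latest",
    "ollama:llama3.2",
]


def ask_each(client, models, messages):
    """Send the same messages to each model and return {model: reply text}."""
    replies = {}
    for model in models:
        response = client.chat.completions.create(model=model, messages=messages)
        replies[model] = response.choices[0].message.content
    return replies


class _FakeCompletions:
    """Offline stand-in that mimics the shape of a chat completion response."""

    def create(self, model, messages):
        message = SimpleNamespace(content=f"reply from {model}")
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=_FakeCompletions()))
replies = ask_each(fake_client, MODELS, [{"role": "user", "content": "Hello!"}])
for model, text in replies.items():
    print(model, "->", text)
```

Swap `fake_client` for the gateway client from the examples above to compare real replies.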

## Model failover

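List fallback models in the request's `models` field. The Python SDK has no named parameter for it, so pass it through `extra_body`, which adds extra fields to the request body: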

```python theme={null}
response = client.chat.completions.create(
    model="gpt-4o",
    extra_body={"models": ["gpt-4o-mini"]},
    messages=[{"role": "user", "content": "Hello!"}],
)
```

To try another model when the first one fails, see [Configure fallback models](/ai-gateway/guides/configure-fallback-models).
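
If several call sites need the same fallback list, a small builder keeps the request shape in one place. A minimal sketch: `chat_request` is a hypothetical helper, and dropping duplicates of the primary model is a design choice made here, not a documented gateway requirement.

```python
def chat_request(model, messages, fallbacks=()):
    """Build keyword arguments for chat.completions.create with optional fallbacks."""
    kwargs = {"model": model, "messages": messages}
    # Keep the caller's order, but drop repeats and the primary model itself.
    extra = [m for m in dict.fromkeys(fallbacks) if m != model]
    if extra:
        kwargs["extra_body"] = {"models": extra}
    return kwargs


request = chat_request(
    "gpt-4o",
    [{"role": "user", "content": "Hello!"}],
    fallbacks=["gpt-4o-mini", "gpt-4o", "gpt-4o-mini"],
)
print(request["extra_body"])

# Usage with a real client:
#   response = client.chat.completions.create(**request)
```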

## Embeddings

Generate embeddings through the gateway:

<CodeGroup>
  ```python Python theme={null}
  response = client.embeddings.create(
      model="openai:text-embedding-3-small",
      input="The quick brown fox jumps over the lazy dog"
  )

  embedding = response.data[0].embedding
  ```

  ```typescript TypeScript theme={null}
  const response = await client.embeddings.create({
    model: "openai:text-embedding-3-small",
    input: "The quick brown fox jumps over the lazy dog",
  });

  const embedding = response.data[0].embedding;
  ```
</CodeGroup>
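
Embeddings are typically compared with cosine similarity. The sketch below uses plain Python and short made-up vectors, so it runs without calling the gateway. In practice the inputs would be two `response.data[i].embedding` lists.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Made-up 3-dimensional vectors; real embeddings have many more dimensions.
same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
print(round(same_direction, 6), round(orthogonal, 6))
```

Values near 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions.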

## Function calling

Tool/function calling works exactly as documented by OpenAI:

```python highlight={11-23} theme={null}
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.ngrok.ai/v1",
    api_key="ng-xxxxx-g1-xxxxx"  # Your access key
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get weather for a location",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {"type": "string"}
                }
            }
        }
    }]
)
```

<Prompt description="Try using this prompt on your agent to test tool calling — your get_weather function should be invoked: 'What is the current weather in Paris?'" actions={["copy", "cursor"]}>
  What is the current weather in Paris?
</Prompt>
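
When the model decides to call a tool, the response carries `tool_calls` instead of text, and your code is responsible for running the function. The sketch below shows one way to dispatch those calls. `get_weather` is a hypothetical local implementation returning canned data, `run_tool_calls` is an illustrative helper, and the fake message mimics the SDK's response shape so the example runs offline.

```python
import json
from types import SimpleNamespace


def get_weather(location):
    """Hypothetical local tool; returns canned data instead of a real forecast."""
    return {"location": location, "forecast": "sunny"}


TOOLS = {"get_weather": get_weather}


def run_tool_calls(message):
    """Run each tool call on an assistant message and return tool-role messages."""
    results = []
    for call in message.tool_calls or []:
        args = json.loads(call.function.arguments)  # arguments arrive as a JSON string
        output = TOOLS[call.function.name](**args)
        results.append(
            {"role": "tool", "tool_call_id": call.id, "content": json.dumps(output)}
        )
    return results


fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"location": "Paris"}'),
)
fake_message = SimpleNamespace(tool_calls=[fake_call])
tool_messages = run_tool_calls(fake_message)
print(tool_messages)
```

With a real response, pass `response.choices[0].message`, then append that assistant message and the returned tool messages to `messages` and call `create` again so the model can produce its final answer.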

## Async usage

Use the async client to avoid blocking while a request is in flight, for example when your app runs several requests concurrently:

<CodeGroup>
  ```python Python theme={null}
  import asyncio
  from openai import AsyncOpenAI

  client = AsyncOpenAI(
      base_url="https://gateway.ngrok.ai/v1",
      api_key="ng-xxxxx-g1-xxxxx"  # Your access key
  )

  async def main():
      response = await client.chat.completions.create(
          model="gpt-4o",
          messages=[{"role": "user", "content": "Hello!"}]
      )
      print(response.choices[0].message.content)

  asyncio.run(main())
  ```

  ```typescript TypeScript theme={null}
  import OpenAI from "openai";

  const client = new OpenAI({
    baseURL: "https://gateway.ngrok.ai/v1",
    apiKey: "ng-xxxxx-g1-xxxxx",  // Your access key
  });

  // The TypeScript client's methods return promises, so there is no separate async client
  const response = await client.chat.completions.create({
    model: "gpt-4o",
    messages: [{ role: "user", content: "Hello!" }],
  });
  ```
</CodeGroup>
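
The async client pays off when several requests overlap. Below is a sketch using `asyncio.gather`. `ask_all` is a hypothetical helper, and a fake async client replaces `AsyncOpenAI(...)` so the example runs offline.

```python
import asyncio
from types import SimpleNamespace


async def ask_all(client, prompts, model="gpt-4o"):
    """Send every prompt concurrently and return the replies in prompt order."""

    async def ask(prompt):
        response = await client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    return await asyncio.gather(*(ask(p) for p in prompts))


class _FakeAsyncCompletions:
    """Offline stand-in that echoes the prompt in upper case."""

    async def create(self, model, messages):
        await asyncio.sleep(0)  # yield control, as a real network call would
        message = SimpleNamespace(content=messages[-1]["content"].upper())
        return SimpleNamespace(choices=[SimpleNamespace(message=message)])


fake_client = SimpleNamespace(chat=SimpleNamespace(completions=_FakeAsyncCompletions()))
answers = asyncio.run(ask_all(fake_client, ["hello", "goodbye"]))
print(answers)
```

`asyncio.gather` returns results in the order the awaitables were passed, so each answer lines up with its prompt even when requests finish out of order.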

## Error handling

The gateway handles many errors automatically through failover. Catch the SDK's exception types to handle the errors that still reach your app:

<CodeGroup>
  ```python Python theme={null}
  from openai import OpenAI, APIError, RateLimitError

  client = OpenAI(
      base_url="https://gateway.ngrok.ai/v1",
      api_key="ng-xxxxx-g1-xxxxx"  # Your access key
  )

  try:
      response = client.chat.completions.create(
          model="gpt-4o",
          messages=[{"role": "user", "content": "Hello!"}]
      )
  except RateLimitError:
      # All configured keys exhausted
      print("Rate limited across all providers")
  except APIError as e:
      print(f"API error: {e}")
  ```

  ```typescript TypeScript theme={null}
  import OpenAI, { APIError, RateLimitError } from "openai";

  const client = new OpenAI({
    baseURL: "https://gateway.ngrok.ai/v1",
    apiKey: "ng-xxxxx-g1-xxxxx",  // Your access key
  });

  try {
    const response = await client.chat.completions.create({
      model: "gpt-4o",
      messages: [{ role: "user", content: "Hello!" }],
    });
  } catch (e) {
    if (e instanceof RateLimitError) {
      // All configured keys exhausted
      console.log("Rate limited across all providers");
    } else if (e instanceof APIError) {
      console.log(`API error: ${e.message}`);
    } else {
      throw e; // Not an API error, so don't swallow it
    }
  }
  ```
</CodeGroup>
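
For errors that are worth retrying, a small backoff wrapper is one option. This is a generic sketch, not gateway-specific behavior: `call_with_backoff` is a hypothetical helper, and the delays are arbitrary. The OpenAI SDK also retries some failures on its own (see its `max_retries` client option), so check that before layering another retry loop on top.

```python
import time


def call_with_backoff(fn, retry_on, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fn(), retrying on the given exception type(s) with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...


# Usage with a real client (requires the openai package):
#   from openai import RateLimitError
#   response = call_with_backoff(
#       lambda: client.chat.completions.create(
#           model="gpt-4o",
#           messages=[{"role": "user", "content": "Hello!"}],
#       ),
#       retry_on=RateLimitError,
#   )
```

The `sleep` parameter is injectable so the wrapper can be tested without real delays.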

## Supported endpoints

The AI Gateway supports these OpenAI API endpoints:

| Endpoint               | Description        |
| ---------------------- | ------------------ |
| `/v1/chat/completions` | Chat completions   |
| `/v1/completions`      | Legacy completions |
| `/v1/embeddings`       | Text embeddings    |
| `/v1/responses`        | Responses          |
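
The examples on this page use chat completions and embeddings. For `/v1/responses`, recent versions of the OpenAI SDK expose `client.responses.create`. The sketch below assumes the gateway accepts the same model names on that endpoint. `ask_via_responses` is a hypothetical helper, and a fake client keeps the example runnable offline.

```python
from types import SimpleNamespace


def ask_via_responses(client, prompt, model="gpt-4o"):
    """Send a prompt through the Responses API and return the reply text."""
    response = client.responses.create(model=model, input=prompt)
    return response.output_text  # convenience property that joins the text output


class _FakeResponses:
    """Offline stand-in that mimics the shape of a Responses API result."""

    def create(self, model, input):
        return SimpleNamespace(output_text=f"[{model}] echo: {input}")


fake_client = SimpleNamespace(responses=_FakeResponses())
reply = ask_via_responses(fake_client, "Hello!")
print(reply)
```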

## Next steps

* [Choose a model](/ai-gateway/guides/model-selection-strategies)
* [Configure fallback models](/ai-gateway/guides/configure-fallback-models)
* [Access key configurations](/ai-gateway/guides/access-key-configurations)
