LM Studio - ngrok documentation

LM Studio is a desktop app for running LLMs locally with an OpenAI-compatible API. Connect it to the AI Gateway as a custom provider.

What you’ll need

ngrok account with AI Gateway access
LM Studio installed
ngrok agent installed
A model downloaded in LM Studio
An access key from app.ngrok.ai

Overview

LM Studio runs a local HTTP server. Expose it with an ngrok internal endpoint, register the endpoint as a custom provider, then route traffic through the gateway.

Getting started

Start LM Studio's local server

Download a model and start the server:

In the LM Studio app:

Open LM Studio and download a model from the Discover tab
Go to Developer, select the model, and click Start Server

Or from the command line, using the lms CLI:

lms get llama-3.2-3b-instruct@q4_k_m
lms server start

By default, LM Studio listens on port 1234. Verify the server is running:

curl http://localhost:1234/v1/models

Use the model ID exactly as LM Studio reports it from GET /v1/models.
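
The same check can be sketched in Python with only the standard library. This assumes the server answers in the OpenAI list format (a JSON object with a data array whose entries carry an id field); the helper names extract_model_ids and list_model_ids are illustrative and not part of LM Studio or ngrok.

```python
import json
from urllib.request import urlopen


def extract_model_ids(payload: dict) -> list[str]:
    """Return the model IDs from an OpenAI-style /v1/models response body."""
    # Assumption: the body looks like {"object": "list", "data": [{"id": "..."}]}.
    return [entry["id"] for entry in payload.get("data", [])]


def list_model_ids(base_url: str = "http://localhost:1234/v1") -> list[str]:
    """Fetch /v1/models from the local LM Studio server and return the IDs."""
    with urlopen(f"{base_url}/models", timeout=5) as resp:
        return extract_model_ids(json.load(resp))


# With the LM Studio server running, print the IDs to copy into the provider:
#     for model_id in list_model_ids():
#         print(model_id)
```

Copying the printed IDs verbatim into the custom provider avoids the mismatches that lead to "Model not found" errors.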

Expose LM Studio with ngrok

Create an internal endpoint:

ngrok http 1234 --url https://lm-studio.internal

Internal endpoints (.internal domains) are private to your ngrok account and are not reachable from the public internet. Use the same ngrok account here and for the AI Gateway; otherwise, the gateway can’t reach the endpoint.

Create the custom provider

Follow the steps in Create a custom provider, using provider ID lm-studio, base URL https://lm-studio.internal, and the model IDs that LM Studio reports.

LM Studio doesn’t require upstream authentication, so you can skip provider keys.

Configure access

Create an access key configuration that allows the lm-studio provider, then assign it to your access key.

Send requests

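from openai import OpenAI

# Point the OpenAI client at the gateway and authenticate with your access key
client = OpenAI(
    base_url="https://gateway.ngrok.ai/v1",
    api_key="ng-xxxxx-g1-xxxxx",
)

# The model name is the provider ID (lm-studio), a colon, then the model ID
response = client.chat.completions.create(
    model="lm-studio:llama-3.2-3b-instruct",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)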

Tips

Embeddings: LM Studio supports /v1/embeddings. Register embedding models on the same custom provider and call them with lm-studio:your-embedding-model.
Slow first response: Pre-load the model in LM Studio or enable “Keep model in memory” in settings. Increase perRequestTimeout in account settings if needed.
Cloud fallback: Add a built-in provider to your access key configuration and use models: ["lm-studio:llama-3.2-3b-instruct", "openai:gpt-4o"] for failover.
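
One way to send the fallback list is sketched below with the standard library only. It rests on two assumptions that should be confirmed against the gateway reference: that the gateway reads models as a top-level field of the chat completions request body, and that it accepts the access key as a Bearer token (which is what the OpenAI client sends). build_fallback_request is a hypothetical helper name.

```python
import json
from urllib.request import Request

GATEWAY_URL = "https://gateway.ngrok.ai/v1/chat/completions"


def build_fallback_request(api_key: str, models: list[str], prompt: str) -> Request:
    """Build (but do not send) a chat completion request with a fallback list."""
    body = {
        # Assumption: the gateway tries these in order, local model first.
        "models": models,
        "messages": [{"role": "user", "content": prompt}],
    }
    return Request(
        GATEWAY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_fallback_request(
    "ng-xxxxx-g1-xxxxx",
    ["lm-studio:llama-3.2-3b-instruct", "openai:gpt-4o"],
    "Hello!",
)
# To send it, pass req to urllib.request.urlopen and json.load the response.
```

With the OpenAI Python client, the same field could be passed through extra_body={"models": [...]} on client.chat.completions.create alongside the required model argument, under the same assumption about where the gateway expects it.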

Troubleshooting

Symptom	Fix
Connection refused	Confirm the LM Studio server is running and the ngrok tunnel is active
Model not found	Check `curl http://localhost:1234/v1/models` and match the model ID exactly
Out of memory	Use a smaller model or lower quantization (Q4 instead of Q8)
Port in use	Change the port in LM Studio settings and update your ngrok command
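
For the "Connection refused" and "Port in use" rows, a quick local probe can tell whether anything is listening on the LM Studio port. This is a minimal sketch using Python's socket module; is_port_open and diagnose are illustrative names, and a successful connection only shows that some server is listening, not that it is LM Studio.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def diagnose(port: int = 1234) -> str:
    """Summarize whether the local server port is reachable."""
    if is_port_open("127.0.0.1", port):
        return f"A server is listening on port {port}."
    return f"Nothing is listening on port {port}; start the LM Studio server."
```

If the port is open before LM Studio has started, another process already holds it, which matches the "Port in use" case.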

Next steps

Use a model you run yourself: URL requirements and local networking
Access Key Configurations: Scope providers per key
Quickstart: Create your first access key
