LM Studio is a desktop app for running LLMs locally with an OpenAI-compatible API. Connect it to the AI Gateway as a custom provider.

What you’ll need

Overview

LM Studio runs a local HTTP server. Expose it with an ngrok internal endpoint, register the endpoint as a custom provider, then route traffic through the gateway.

Getting started

1. Start LM Studio's local server

Download a model and start the server:
  1. Open LM Studio and download a model from the Discover tab
  2. Go to Developer, select the model, and click Start Server
By default, LM Studio listens on port 1234. Verify the server is running:
curl http://localhost:1234/v1/models
Use the model ID exactly as LM Studio reports it from GET /v1/models.
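
The same check can be scripted. The following is a minimal sketch that assumes LM Studio returns the OpenAI-style list shape from /v1/models (an object with a data array whose entries each carry an id). The helper names extract_model_ids and list_local_models are hypothetical, not part of LM Studio or ngrok.

```python
import json
import urllib.request


def extract_model_ids(payload: dict) -> list[str]:
    """Return the model IDs from an OpenAI-style /v1/models response body."""
    # Assumption: the body looks like {"object": "list", "data": [{"id": "..."}, ...]}.
    return [entry["id"] for entry in payload.get("data", [])]


def list_local_models(base_url: str = "http://localhost:1234/v1") -> list[str]:
    """Fetch the model IDs from the local LM Studio server (default port 1234)."""
    with urllib.request.urlopen(f"{base_url}/models", timeout=5) as resp:
        return extract_model_ids(json.load(resp))


# Example (requires the LM Studio server to be running):
# print(list_local_models())
```

Copy the printed IDs verbatim when you register models on the custom provider.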

2. Expose LM Studio with ngrok

Create an internal endpoint:
ngrok http 1234 --url https://lm-studio.internal
Internal endpoints (.internal domains) are private to your ngrok account and are not reachable from the public internet. Use the same ngrok account here and in the AI Gateway; otherwise, the gateway can’t reach the endpoint.

3. Create the custom provider

See Create a custom provider. Use provider ID lm-studio, base URL https://lm-studio.internal, and the model IDs that LM Studio reports from GET /v1/models.
LM Studio doesn’t require upstream authentication, so you can skip provider keys.

4. Configure access

Create an access key configuration that allows the lm-studio provider, then assign it to your access key.

5. Send requests

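from openai import OpenAI

# Point the OpenAI SDK at the AI Gateway and authenticate with your access key.
client = OpenAI(
    base_url="https://gateway.ngrok.ai/v1",
    api_key="ng-xxxxx-g1-xxxxx"
)

# Model names use the form <provider ID>:<model ID reported by LM Studio>.
response = client.chat.completions.create(
    model="lm-studio:llama-3.2-3b-instruct",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response.choices[0].message.content)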
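
If you prefer not to depend on the SDK, the request above can be approximated with the standard library. This is a sketch, not the gateway's reference client: it assumes the gateway accepts the same OpenAI-style POST to /chat/completions with a Bearer token that the SDK sends, and the helper names build_chat_request and send_chat are hypothetical.

```python
import json
import urllib.request

GATEWAY_BASE_URL = "https://gateway.ngrok.ai/v1"  # same base URL as the SDK example


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style chat completion request without sending it."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{GATEWAY_BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # the SDK sends api_key this way
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's message text."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Example (requires a valid access key and a running tunnel):
# req = build_chat_request("ng-xxxxx-g1-xxxxx", "lm-studio:llama-3.2-3b-instruct", "Hello!")
# print(send_chat(req))
```

Separating request construction from sending makes it easy to inspect the URL, headers, and body before any traffic reaches the gateway.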

Tips

  • Embeddings: LM Studio supports /v1/embeddings. Register embedding models on the same custom provider and call them with lm-studio:your-embedding-model.
  • Slow first response: The first request can be slow while LM Studio loads the model into memory. Pre-load the model in LM Studio or enable “Keep model in memory” in settings. Increase perRequestTimeout in account settings if needed.
  • Cloud fallback: Add a built-in provider to your access key configuration and use models: ["lm-studio:llama-3.2-3b-instruct", "openai:gpt-4o"] for failover.
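
The embeddings and failover tips can be sketched as request bodies. The embeddings body follows the standard OpenAI /v1/embeddings shape. The failover body is an assumption based only on the models: [...] field mentioned above; whether a model field is also required alongside it is not stated here. The function names are hypothetical.

```python
import json


def embeddings_body(model: str, text: str) -> dict:
    """Body for POST /v1/embeddings in the OpenAI-compatible shape."""
    return {"model": model, "input": text}


def failover_chat_body(models: list[str], prompt: str) -> dict:
    """Chat body listing several models for failover.

    Assumption: the gateway reads a top-level "models" list, as in the tip above.
    """
    if not models:
        raise ValueError("at least one model is required")
    return {
        "models": list(models),
        "messages": [{"role": "user", "content": prompt}],
    }


print(json.dumps(embeddings_body("lm-studio:your-embedding-model", "Hello!")))
print(json.dumps(failover_chat_body(
    ["lm-studio:llama-3.2-3b-instruct", "openai:gpt-4o"], "Hello!"
)))
```

With the OpenAI Python SDK, a non-standard field such as models would typically be passed through the extra_body argument of the create call.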

Troubleshooting

| Symptom | Fix |
| --- | --- |
| Connection refused | Confirm the LM Studio server is running and the ngrok tunnel is active |
| Model not found | Check curl http://localhost:1234/v1/models and match the model ID exactly |
| Out of memory | Use a smaller model or lower quantization (Q4 instead of Q8) |
| Port in use | Change the port in LM Studio settings and update your ngrok command |
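
To narrow down a "Connection refused" error, it can help to first check whether anything is listening on the LM Studio port at all. A minimal sketch follows; the helper name is_port_open is hypothetical, and 1234 is the default port mentioned earlier.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


# Example: check the default LM Studio port on this machine.
# print(is_port_open("localhost", 1234))
```

If this returns False, the problem is on the LM Studio side (server not started or a different port). If it returns True, check the ngrok tunnel next.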

Next steps