What you’ll need
- ngrok account with AI Gateway access
- vLLM installed
- ngrok agent installed
- GPU with sufficient VRAM for your chosen model
- An access key from app.ngrok.ai
Overview
vLLM provides an OpenAI-compatible server that you can expose with an ngrok internal endpoint (or use a public HTTPS URL), register as a custom provider, and attach credentials through an access key configuration.Getting started
Expose with ngrok
Create an internal endpoint:If vLLM already has a public HTTPS endpoint, skip this step and use that URL as the provider base URL instead.
Create the custom provider
See Create a custom provider. Use provider ID
vllm, base URL https://vllm.internal, API format OpenAI Chat Completions, and your model IDs.Store a provider key (if required)
If your vLLM server requires an API key (
vllm serve model --api-key your-secret-key), add a provider key.Configure access
Create an access key configuration that:
- Allows the
vllmprovider in the access scope - Adds a routing rule with Bring your own API key if the server requires authentication
Tips
- Secure vLLM: Run with
--api-keyand attach the key through an access key configuration. The AI Gateway adds it to upstream requests server-side. - Gated models: Set
HF_TOKENbefore starting vLLM for Hugging Face gated models. - Timeouts: Large models can be slow. Increase
perRequestTimeoutandtotalTimeoutin account settings. - Multiple models: Run separate vLLM instances with different internal endpoints and register each as its own custom provider.
Troubleshooting
| Symptom | Fix |
|---|---|
| Model loading errors | Check GPU memory with nvidia-smi; try --gpu-memory-utilization 0.9 or a smaller model |
| Connection timeouts | Verify the ngrok tunnel and vLLM health (curl http://localhost:8000/health); increase gateway timeouts |
| 401 unauthorized | Confirm the provider key in app.ngrok.ai matches your vLLM --api-key and is attached in the access key configuration |
Next steps
- Use a model you run yourself: URL requirements and configuration
- Provider Keys: Store upstream credentials
- Quickstart: Create your first access key