Local AI, everywhere
Turn your local model into a secure, internet-accessible inference endpoint in seconds.
Works with your favorite formats
Everything you need to ship AI
From local development to production deployment, predictor.sh handles the infrastructure so you can focus on building.
Zero Config
One command to expose your local model. No YAML, no Docker, no cloud setup required.
Secure by Default
End-to-end encryption with automatic TLS. API key authentication out of the box.
Global Edge
Automatic routing through our global edge network for low-latency inference worldwide.
Any Format
Native support for Ollama, GGUF, ONNX, PyTorch, and HuggingFace models.
Real-time Metrics
Live dashboard with request logs, latency tracking, and usage analytics.
OpenAI Compatible
Drop-in replacement API. Use your existing OpenAI SDK code; just change the base URL.
Up and running in 30 seconds
No configuration files, no Docker containers, no cloud setup required. Just install and run.
Install the CLI
One-line installation. Works on macOS, Linux, and WSL.
Point to your model
Automatically detects Ollama, GGUF, ONNX, or PyTorch formats.
Get your endpoint
Instantly accessible from anywhere. OpenAI-compatible API.
Use your existing code
OpenAI-compatible API means you can switch with a single line change. Works with any language or framework.
Python

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://abc123.predictor.sh/v1",
    api_key="pk_live_...",
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)
print(response.choices[0].message.content)
```

cURL

```bash
curl https://abc123.predictor.sh/v1/chat/completions \
  -H "Authorization: Bearer pk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

Simple, transparent pricing
Start free, scale as you grow. No hidden fees, no surprises.
Free
For testing and personal projects
- Endpoints: 1
- Requests/day: 100
- Bandwidth: 1 GB/mo
Pro
For developers shipping to production
- Endpoints: 10
- Requests/day: 100,000, then $0.001/req
- Bandwidth: 50 GB/mo, then $0.10/GB
- Custom domains
Pay only for what you use beyond limits
Scale
For teams with high throughput needs
- Endpoints: Unlimited
- Requests/day: 1,000,000, then $0.0005/req
- Bandwidth: 500 GB/mo, then $0.05/GB
- Custom domains
Pay only for what you use beyond limits
Compare plans
| Feature | Free | Pro | Scale |
|---|---|---|---|
| Endpoints | 1 | 10 | Unlimited |
| Requests/day | 100 | 100,000, then $0.001/req | 1,000,000, then $0.0005/req |
| Bandwidth | 1 GB/mo | 50 GB/mo, then $0.10/GB | 500 GB/mo, then $0.05/GB |
| Custom domains | — | ✓ | ✓ |
| Priority support | | | |
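To make the overage model above concrete, here is a sketch of how a month's overage charges add up on the Pro plan. The rates are taken from the table; the `monthly_overage` helper and the 30-day month are illustrative assumptions, not part of predictor.sh itself:

```python
def monthly_overage(requests_per_day: int, bandwidth_gb: float) -> float:
    """Estimate Pro-plan overage for one 30-day month, using the published
    rates: 100,000 included requests/day ($0.001 each beyond that) and
    50 GB/mo included bandwidth ($0.10/GB beyond that)."""
    INCLUDED_REQ_PER_DAY = 100_000
    REQ_OVERAGE_RATE = 0.001   # $ per request beyond the daily allowance
    INCLUDED_GB_PER_MO = 50
    GB_OVERAGE_RATE = 0.10     # $ per GB beyond the monthly allowance

    extra_req = max(0, requests_per_day - INCLUDED_REQ_PER_DAY)
    extra_gb = max(0.0, bandwidth_gb - INCLUDED_GB_PER_MO)
    return extra_req * REQ_OVERAGE_RATE * 30 + extra_gb * GB_OVERAGE_RATE

# 120,000 requests/day and 80 GB/mo on Pro:
# 20,000 extra requests x $0.001 x 30 days = $600, plus 30 GB x $0.10 = $3
print(round(monthly_overage(120_000, 80), 2))  # 603.0
```

Staying within the included limits costs nothing extra, so `monthly_overage(50_000, 10)` is 0.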
Ready to ship your local model?
Join thousands of developers who are already using predictor.sh to deploy AI models without infrastructure headaches.
No credit card required • Free tier available