Now in public beta

Local AI, everywhere

Turn your local model into a secure, internet-accessible inference endpoint in seconds.


Works with your favorite formats

Ollama · GGUF · ONNX · PyTorch · HuggingFace

Everything you need to ship AI

From local development to production deployment, predictor.sh handles the infrastructure so you can focus on building.

Zero Config

One command to expose your local model. No YAML, no Docker, no cloud setup required.

Secure by Default

End-to-end encryption with automatic TLS. API key authentication out of the box.

Global Edge

Automatic routing through our global edge network for low-latency inference worldwide.

Any Format

Native support for Ollama, GGUF, ONNX, PyTorch, and HuggingFace models.

Real-time Metrics

Live dashboard with request logs, latency tracking, and usage analytics.

OpenAI Compatible

Drop-in replacement API. Use your existing OpenAI SDK code; just change the base URL.

Up and running in 30 seconds

No configuration files, no Docker containers, no cloud setup required. Just install and run.

01

Install the CLI

$ curl -fsSL https://predictor.sh/install | bash

One-line installation. Works on macOS, Linux, and WSL.

02

Point to your model

$ predictor up --model ./llama-7b.gguf

Automatically detects Ollama, GGUF, ONNX, or PyTorch formats.

03

Get your endpoint

https://abc123.predictor.sh

Instantly accessible from anywhere. OpenAI-compatible API.

Use your existing code

OpenAI-compatible API means you can switch with a single line change. Works with any language or framework.

example.py
from openai import OpenAI

client = OpenAI(
    base_url="https://abc123.predictor.sh/v1",
    api_key="pk_live_...",
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

print(response.choices[0].message.content)
terminal
curl https://abc123.predictor.sh/v1/chat/completions \
  -H "Authorization: Bearer pk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
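Because the endpoint speaks the standard chat-completions protocol, you don't even need an SDK. Here is a minimal stdlib-only sketch; the URL and key are the placeholders from the examples above, and `build_chat_request` is an illustrative helper name, not part of predictor.sh:

```python
import json
import urllib.request

# Placeholders from the examples above; substitute your own endpoint and key.
BASE_URL = "https://abc123.predictor.sh/v1"
API_KEY = "pk_live_..."

def build_chat_request(model, messages):
    """Build an OpenAI-style chat-completions request with the stdlib only."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("llama3.2", [{"role": "user", "content": "Hello!"}])
# Sending it requires a live endpoint:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```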

Simple, transparent pricing

Start free, scale as you grow. No hidden fees, no surprises.

Free

For testing and personal projects

$0 forever
  • Endpoints: 1
  • Requests/day: 100
  • Bandwidth: 1 GB/mo

Most Popular

Pro

For developers shipping to production

$9/month
  • Endpoints: 10
  • Requests/day: 100,000, then $0.001/req
  • Bandwidth: 50 GB/mo, then $0.10/GB
  • Custom domains

Pay only for what you use beyond limits

Scale

For teams with high throughput needs

$29/month
  • Endpoints: Unlimited
  • Requests/day: 1,000,000, then $0.0005/req
  • Bandwidth: 500 GB/mo, then $0.05/GB
  • Custom domains

Pay only for what you use beyond limits

Compare plans

Feature            Free      Pro                          Scale
Endpoints          1         10                           Unlimited
Requests/day       100       100,000, then $0.001/req     1,000,000, then $0.0005/req
Bandwidth          1 GB/mo   50 GB/mo, then $0.10/GB      500 GB/mo, then $0.05/GB
Custom domains     ✗         ✓                            ✓
Priority support
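As a quick sanity check on the overage rates above, here is an illustrative sketch; the function name is ours, and it assumes overage is billed per request beyond the plan's daily cap, as the rate card reads:

```python
def request_overage(plan, requests_today):
    """Dollar overage for one day's requests beyond the plan's daily cap."""
    caps = {"Pro": (100_000, 0.001), "Scale": (1_000_000, 0.0005)}
    cap, rate = caps[plan]
    return max(0, requests_today - cap) * rate

# Pro at 120,000 requests in a day: 20,000 over the cap at $0.001 each.
```

So a Pro plan handling 120,000 requests in one day would owe roughly $20 in request overage on top of the $9/month base.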

Ready to ship your local model?

Join thousands of developers who are already using predictor.sh to deploy AI models without infrastructure headaches.

$ curl -fsSL https://predictor.sh/install | bash

No credit card required • Free tier available