Now in public beta

Local AI, everywhere

Turn your local model into a secure, internet-accessible inference endpoint in seconds.


Works with your favorite formats

Ollama · GGUF · ONNX · PyTorch · HuggingFace

Everything you need to ship AI

From local development to production deployment, predictor.sh handles the infrastructure so you can focus on building.

Zero Config

One command to expose your local model. No YAML, no Docker, no cloud setup required.

Secure by Default

End-to-end encryption with automatic TLS. API key authentication out of the box.

Global Edge

Automatic routing through our global edge network for low-latency inference worldwide.

Any Format

Native support for Ollama, GGUF, ONNX, PyTorch, and HuggingFace models.

Real-time Metrics

Live dashboard with request logs, latency tracking, and usage analytics.

OpenAI Compatible

Drop-in replacement API. Use your existing OpenAI SDK code; just change the base URL.

Up and running in 30 seconds

No configuration files, no Docker containers, no cloud setup required. Just install and run.

01

Install the CLI

$ curl -fsSL https://predictor.sh/install | bash

One-line installation. Works on macOS, Linux, and WSL.

02

Point to your model

$ predictor up --model ./llama-7b.gguf

Automatically detects Ollama, GGUF, ONNX, or PyTorch formats.

03

Get your endpoint

https://abc123.predictor.sh

Instantly accessible from anywhere. OpenAI-compatible API.

Use your existing code

OpenAI-compatible API means you can switch with a single line change. Works with any language or framework.

example.py
from openai import OpenAI

client = OpenAI(
    base_url="https://abc123.predictor.sh/v1",
    api_key="pk_live_...",
)

response = client.chat.completions.create(
    model="llama3.2",
    messages=[
        {"role": "user", "content": "Hello!"}
    ],
)

print(response.choices[0].message.content)
terminal
curl https://abc123.predictor.sh/v1/chat/completions \
  -H "Authorization: Bearer pk_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
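Because the endpoint speaks the standard chat-completions protocol, you don't even need an SDK. Here is a minimal stdlib-only sketch; the URL and key are the placeholders from the examples above, and `build_chat_request` is an illustrative helper name, not part of predictor.sh:

```python
import json
import urllib.request

# Placeholders from the examples above; substitute your own endpoint and key.
BASE_URL = "https://abc123.predictor.sh/v1"
API_KEY = "pk_live_..."

def build_chat_request(model, messages):
    """Build an OpenAI-style chat-completions request with the stdlib only."""
    payload = json.dumps({"model": model, "messages": messages}).encode()
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request("llama3.2", [{"role": "user", "content": "Hello!"}])
# Sending it requires a live endpoint:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```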

Simple, transparent pricing

Start free, scale as you grow. No hidden fees, no surprises.

Free

For testing and personal projects

$0 forever
  • Endpoints: 1
  • Requests/day: 100
  • Bandwidth: 1 GB/mo

Most Popular

Pro

For developers shipping to production

$9/month
  • Endpoints: 10
  • Requests/day: 100,000, then $0.001/req
  • Bandwidth: 50 GB/mo, then $0.10/GB
  • Custom domains

Pay only for what you use beyond limits

Scale

For teams with high throughput needs

$29/month
  • Endpoints: Unlimited
  • Requests/day: 1,000,000, then $0.0005/req
  • Bandwidth: 500 GB/mo, then $0.05/GB
  • Custom domains

Pay only for what you use beyond limits

Compare plans

Feature            Free      Pro                          Scale
Endpoints          1         10                           Unlimited
Requests/day       100       100,000, then $0.001/req     1,000,000, then $0.0005/req
Bandwidth          1 GB/mo   50 GB/mo, then $0.10/GB      500 GB/mo, then $0.05/GB
Custom domains     ✗         ✓                            ✓
Priority support
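As a quick sanity check on the overage rates above, here is an illustrative sketch; the function name is ours, and it assumes overage is billed per request beyond the plan's daily cap, as the rate card reads:

```python
def request_overage(plan, requests_today):
    """Dollar overage for one day's requests beyond the plan's daily cap."""
    caps = {"Pro": (100_000, 0.001), "Scale": (1_000_000, 0.0005)}
    cap, rate = caps[plan]
    return max(0, requests_today - cap) * rate

# Pro at 120,000 requests in a day: 20,000 over the cap at $0.001 each.
```

So a Pro plan handling 120,000 requests in one day would owe roughly $20 in request overage on top of the $9/month base.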

Ready to ship your local model?

Join thousands of developers who are already using predictor.sh to deploy AI models without infrastructure headaches.

$ curl -fsSL https://predictor.sh/install | bash

No credit card required • Free tier available