# 📦 Ollama Cloud
Unlimited access to 39 models, including DeepSeek-V4-Pro and many exclusive models not available elsewhere.
| Avg Speed | Models | Price | Best For |
|---|---|---|---|
| 51 tok/s | 39 | Unlimited | DS-V4-Pro, Exclusives |
## 💰 Plan & Pricing
**Unlimited Access.** 39 models with no rate limits. Both the native Ollama API and the OpenAI-compatible API are supported, along with many exclusive models not available elsewhere.
## 🔑 API Key
## 🌐 Endpoints
```
# OpenAI-compatible (recommended)
https://ollama.com/v1/chat/completions

# Native Ollama API
https://ollama.com/api/chat
```
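The two endpoints above accept the same chat-message shape; the native endpoint additionally takes a `stream` flag. As a minimal sketch (the field names follow the OpenAI chat format and Ollama's documented native format, but treat this as an illustration, not a complete client):

```python
# Sketch: build the URL and JSON payload for either endpoint style.
# Pure helper, no network call; send the payload with any HTTP client.

BASE_URL = "https://ollama.com"

def build_chat_request(style: str, model: str, prompt: str) -> tuple[str, dict]:
    """Return (url, payload) for the given endpoint style ('openai' or 'native')."""
    messages = [{"role": "user", "content": prompt}]
    if style == "openai":
        # OpenAI-compatible endpoint: works with standard OpenAI SDKs.
        return f"{BASE_URL}/v1/chat/completions", {"model": model, "messages": messages}
    if style == "native":
        # Native Ollama endpoint: same messages shape, plus a "stream" flag.
        return f"{BASE_URL}/api/chat", {"model": model, "messages": messages, "stream": False}
    raise ValueError(f"unknown style: {style}")

url, payload = build_chat_request("openai", "deepseek-v4-pro", "Hello!")
print(url)  # https://ollama.com/v1/chat/completions
```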
## 📦 Models (39 total)
| Model | Speed | Exclusive? | Notes |
|---|---|---|---|
| gpt-oss:20b | ⚡ 107 tok/s | No | Also on Groq |
| nemotron-3-super | ⚡ 102 tok/s | ✅ Exclusive | Best for this model |
| minimax-m2.1 | ⚡ 83 tok/s | ✅ Exclusive | Only here |
| ministral-3:3b | ⚡ 83 tok/s | ✅ Exclusive | |
| qwen3-next:80b | ⚡ 82 tok/s | ✅ Exclusive | Qwen3 variant |
| gemma3:4b | ⚡ 75 tok/s | No | |
| nemotron-3-nano:30b | ⚡ 74 tok/s | ✅ Exclusive | |
| qwen3.5:397b | ⚡ 72 tok/s | ✅ Exclusive | Massive Qwen3.5 |
| gpt-oss:120b | ⚡ 71 tok/s | No | Also Groq |
| rnj-1:8b | ⚡ 68 tok/s | ✅ Exclusive | |
| glm-5.1 | ⚡ 63 tok/s | No | 2nd best for this |
| kimi-k2.5 | ⚡ 58 tok/s | No | |
| gemma3:12b | ⚡ 56 tok/s | No | |
| ministral-3:8b | ⚡ 56 tok/s | No | |
| glm-4.6 | ⚡ 55 tok/s | ✅ Best | Best for this model |
| qwen3-coder-next | ⚡ 52 tok/s | ✅ Exclusive | |
| glm-4.7 | ⚡ 51 tok/s | ✅ Best | Best for this model |
| deepseek-v4-pro | ⚡ 50 tok/s | No | 🔥 Best DS-V4-Pro |
| ministral-3:14b | ⚡ 49 tok/s | No | |
| gemini-3-flash-preview | ⚡ 47 tok/s | ✅ Exclusive | |
| cogito-2.1:671b | ⚡ 45 tok/s | ✅ Exclusive | |
| deepseek-v4-flash | ⚡ 44 tok/s | No | |
| minimax-m2 | ⚡ 43 tok/s | No | |
| gemma-3-27b | ⚡ 39 tok/s | No | Gemma 3 27B |
| llama-4-scout | ⚡ 38 tok/s | No | Llama 4 Scout |
| devstral-small-2:24b | ⚡ 37 tok/s | ✅ Exclusive | |
| devstral-2:123b | ⚡ 36 tok/s | ✅ Exclusive | |
| llama-4-maverick | ⚡ 35 tok/s | No | Llama 4 Maverick |
| deepseek-r1-0528 | ⚡ 32 tok/s | No | DeepSeek R1 |
| glm-5 | ⚡ 32 tok/s | No | |
| gemma4:31b | ⚡ 31 tok/s | No | |
| qwen3-vl:235b-instruct | ⚡ 31 tok/s | ✅ Exclusive | |
| qwen3-coder:480b | ⚡ 25 tok/s | ✅ Exclusive | Massive coder |
| deepseek-v3.2 | ⚡ 23 tok/s | No | |
| llama-4-behemoth | ⚡ 22 tok/s | No | Llama 4 Behemoth |
| qwen3-vl:235b | ⚡ 20 tok/s | ✅ Exclusive | |
| minimax-m2.7 | ⚡ 18 tok/s | No | |
| kimi-k2.6 | ⚡ 18 tok/s | No | |
| mistral-large-3:675b | 🐢 11 tok/s | ✅ Exclusive | |
| kimi-k2:1t | 🐢 11 tok/s | ✅ Exclusive | 1T parameter model |
| deepseek-v3.1:671b | 🐢 10 tok/s | ✅ Exclusive | |
| minimax-m2.5 | 🐢 4 tok/s | No | Best on OpenCode |
## 💻 cURL Example
```bash
curl -X POST https://ollama.com/v1/chat/completions \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
## 🐍 Python Example
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OLLAMA_API_KEY"],
    base_url="https://ollama.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
## ⚠️ Pitfalls & Notes
- **Dual API Support**: Ollama Cloud supports both the native Ollama API (`/api/chat`) and the OpenAI-compatible API (`/v1/chat/completions`). Use the OpenAI-compatible endpoint for standard SDK compatibility.
- **Model IDs with Colons**: Some model IDs use colons (e.g., `gpt-oss:120b`, `gemma3:4b`). Make sure your SDK handles these correctly.
- **17 Exclusive Models**: Ollama Cloud has 17 exclusive models not available on other providers, including `nemotron-3-super`, `qwen3.5:397b`, `cogito-2.1:671b`, and `qwen3-coder:480b`.
## 🏷️ Categories
- Chat
- Coding
- Vision
- Audio