# 📦 Ollama Cloud
Unlimited access to 39 models, including DeepSeek-V4-Pro and many exclusive models not available elsewhere.
| Avg Speed | Models | Price | Best For |
|---|---|---|---|
| 51 tok/s | 39 | Unlimited | DS-V4-Pro, Exclusives |
## 💰 Plan & Pricing
**Unlimited Access.** 39 models with no rate limits. Both the native Ollama API and the OpenAI-compatible API are supported, along with many exclusive models not available elsewhere.
## 🔑 API Key
## 🌐 Endpoints
```
# OpenAI-compatible (recommended)
https://ollama.com/v1/chat/completions

# Native Ollama API
https://ollama.com/api/chat
```
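The two endpoints above accept the same chat-message shape; the native endpoint additionally takes a `stream` flag. As a minimal sketch (the field names follow the OpenAI chat format and Ollama's documented native format, but treat this as an illustration, not a complete client):

```python
# Sketch: build the URL and JSON payload for either endpoint style.
# Pure helper, no network call; send the payload with any HTTP client.

BASE_URL = "https://ollama.com"

def build_chat_request(style: str, model: str, prompt: str) -> tuple[str, dict]:
    """Return (url, payload) for the given endpoint style ('openai' or 'native')."""
    messages = [{"role": "user", "content": prompt}]
    if style == "openai":
        # OpenAI-compatible endpoint: works with standard OpenAI SDKs.
        return f"{BASE_URL}/v1/chat/completions", {"model": model, "messages": messages}
    if style == "native":
        # Native Ollama endpoint: same messages shape, plus a "stream" flag.
        return f"{BASE_URL}/api/chat", {"model": model, "messages": messages, "stream": False}
    raise ValueError(f"unknown style: {style}")

url, payload = build_chat_request("openai", "deepseek-v4-pro", "Hello!")
print(url)  # https://ollama.com/v1/chat/completions
```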
## 📦 Models (39 total)
| Model | Speed | Exclusive? | Notes |
|---|---|---|---|
| gpt-oss:20b | ⚡ 107 tok/s | No | Also on Groq |
| nemotron-3-super | ⚡ 102 tok/s | ✅ Exclusive | Best for this model |
| minimax-m2.1 | ⚡ 83 tok/s | ✅ Exclusive | Only here |
| ministral-3:3b | ⚡ 83 tok/s | ✅ Exclusive | |
| qwen3-next:80b | ⚡ 82 tok/s | ✅ Exclusive | Qwen3 variant |
| gemma3:4b | ⚡ 75 tok/s | No | |
| nemotron-3-nano:30b | ⚡ 74 tok/s | ✅ Exclusive | |
| qwen3.5:397b | ⚡ 72 tok/s | ✅ Exclusive | Massive Qwen3.5 |
| gpt-oss:120b | ⚡ 71 tok/s | No | Also Groq |
| rnj-1:8b | ⚡ 68 tok/s | ✅ Exclusive | |
| glm-5.1 | ⚡ 63 tok/s | No | 2nd best for this |
| kimi-k2.5 | ⚡ 58 tok/s | No | |
| gemma3:12b | ⚡ 56 tok/s | No | |
| ministral-3:8b | ⚡ 56 tok/s | No | |
| glm-4.6 | ⚡ 55 tok/s | ✅ Best | Best for this model |
| qwen3-coder-next | ⚡ 52 tok/s | ✅ Exclusive | |
| glm-4.7 | ⚡ 51 tok/s | ✅ Best | Best for this model |
| deepseek-v4-pro | ⚡ 50 tok/s | No | 🔥 Best DS-V4-Pro |
| ministral-3:14b | ⚡ 49 tok/s | No | |
| gemini-3-flash-preview | ⚡ 47 tok/s | ✅ Exclusive | |
| cogito-2.1:671b | ⚡ 45 tok/s | ✅ Exclusive | |
| deepseek-v4-flash | ⚡ 44 tok/s | No | |
| minimax-m2 | ⚡ 43 tok/s | No | |
| gemma-3-27b | ⚡ 39 tok/s | No | Gemma 3 27B |
| llama-4-scout | ⚡ 38 tok/s | No | Llama 4 Scout |
| devstral-small-2:24b | ⚡ 37 tok/s | ✅ Exclusive | |
| devstral-2:123b | ⚡ 36 tok/s | ✅ Exclusive | |
| llama-4-maverick | ⚡ 35 tok/s | No | Llama 4 Maverick |
| deepseek-r1-0528 | ⚡ 32 tok/s | No | DeepSeek R1 |
| glm-5 | ⚡ 32 tok/s | No | |
| gemma4:31b | ⚡ 31 tok/s | No | |
| qwen3-vl:235b-instruct | ⚡ 31 tok/s | ✅ Exclusive | |
| qwen3-coder:480b | ⚡ 25 tok/s | ✅ Exclusive | Massive coder |
| deepseek-v3.2 | ⚡ 23 tok/s | No | |
| llama-4-behemoth | ⚡ 22 tok/s | No | Llama 4 Behemoth |
| qwen3-vl:235b | ⚡ 20 tok/s | ✅ Exclusive | |
| minimax-m2.7 | ⚡ 18 tok/s | No | |
| kimi-k2.6 | ⚡ 18 tok/s | No | |
| mistral-large-3:675b | 🐢 11 tok/s | ✅ Exclusive | |
| kimi-k2:1t | 🐢 11 tok/s | ✅ Exclusive | 1T parameter model |
| deepseek-v3.1:671b | 🐢 10 tok/s | ✅ Exclusive | |
| minimax-m2.5 | 🐢 4 tok/s | No | Best on OpenCode |
## 💻 cURL Example
```bash
curl -X POST https://ollama.com/v1/chat/completions \
  -H "Authorization: Bearer $OLLAMA_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-pro",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```
## 🐍 Python Example
```python
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["OLLAMA_API_KEY"],
    base_url="https://ollama.com/v1",
)

response = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```
## ⚠️ Pitfalls & Notes
- **Dual API Support**: Ollama Cloud supports both the native Ollama API (`/api/chat`) and the OpenAI-compatible API (`/v1/chat/completions`). Use the OpenAI-compatible endpoint for standard SDK compatibility.
- **Model IDs with Colons**: Some model IDs use colons (e.g., `gpt-oss:120b`, `gemma3:4b`). Make sure your SDK handles these correctly.
- **17 Exclusive Models**: Ollama Cloud has 17 exclusive models not available on other providers, including `nemotron-3-super`, `qwen3.5:397b`, `cogito-2.1:671b`, and `qwen3-coder:480b`.
## 🏷️ Categories
- Chat
- Coding
- Vision
- Audio