🏆 Best Provider For…
Quick guide to finding the right provider for every use case.
⚡ Best for Speed
🥇 Groq — 212 tok/s avg
Groq's LPU hardware delivers unmatched inference speed; llama-3.3-70b-versatile reaches 300+ tok/s. The best choice when latency matters more than model diversity.
Top models: llama-3.3-70b-versatile, llama-3.1-8b, deepseek-r1, qwen/qwen3-32b (295 tok/s), groq/compound (289 tok/s)
🥈 StepFun — 93 tok/s avg
Step models on StepFun's own infrastructure are fast. Best for Chinese/English bilingual tasks.
Top models: step-3-chat, step-3-reasoning
🥉 BytePlus ModelArk — 84 tok/s max ✅
dola-seed-2.0-lite hits 84 tok/s for coding tasks. Best coding speed per dollar. $9/mo Pro Plan gives access to 9 verified models plus General API.
Top models: dola-seed-2.0-lite (84 tok/s), dola-seed-2.0-code (80 tok/s)
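The tok/s figures in this section can be reproduced against any OpenAI-compatible endpoint. A minimal sketch, assuming the `openai` client and a Groq API key; the timing helper and prompt are illustrative, not part of any provider's API:

```python
import time

def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Throughput = completion tokens divided by wall-clock seconds."""
    return completion_tokens / elapsed_s if elapsed_s > 0 else 0.0

# Measuring a live request (commented out so the sketch runs offline):
# from openai import OpenAI  # pip install openai
# client = OpenAI(base_url="https://api.groq.com/openai/v1", api_key="...")
# t0 = time.monotonic()
# resp = client.chat.completions.create(
#     model="llama-3.3-70b-versatile",
#     messages=[{"role": "user", "content": "Summarize LPUs in 100 words."}],
# )
# print(tokens_per_second(resp.usage.completion_tokens, time.monotonic() - t0))

# Offline example: 1060 completion tokens in 5 s matches the 212 tok/s average quoted above.
print(round(tokens_per_second(1060, 5.0), 1))
```

Measure over several runs and subtract time-to-first-token if you want pure generation speed rather than end-to-end latency.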
💻 Best for Coding
🥇 BytePlus ModelArk — dola-seed-2.0-code @ 80 tok/s ✅
Verified & Working
BytePlus's dola-seed models are purpose-built for code; dola-seed-2.0-lite hits 84 tok/s, making it both fast and accurate. The $9/mo Pro Plan is excellent value for coding — it includes ark-code-latest (auto agent), bytedance-seed-code, and the full General API.
dola-seed-2.0-code (80 tok/s), dola-seed-2.0-lite (84 tok/s), bytedance-seed-code (63 tok/s), ark-code-latest (auto agent)
🥈 Groq — deepseek-r1 / qwen2.5-coder @ up to 212 tok/s
Groq's LPU acceleration makes code generation blazing fast. Free tier is unbeatable for coding throughput.
deepseek-r1, qwen2.5-coder-32b, llama-3.3-70b
🥉 Ollama Cloud — Exclusive coding models
Ollama offers exclusive coding models like qwen3-coder-next and devstral-2:123b. Unlimited usage on the flat plan.
qwen3-coder-next (52 tok/s), devstral-2:123b (36 tok/s)
🏅 Chutes TEE — Qwen Coder with privacy
For sensitive codebases, Chutes offers TEE-enclave execution. Your code stays private even from the provider.
qwen2.5-coder-32b-tee, deepseek-v3.2-tee
🧠 Best for Reasoning
🥇 BytePlus ModelArk — dola-seed-2.0-pro @ 50 tok/s ✅
Verified & Working
The dola-seed-2.0-pro model is purpose-built for deep reasoning and shows strong performance on math, logic, and multi-step reasoning benchmarks. $9/mo Pro Plan.
dola-seed-2.0-pro (50 tok/s), plus DeepSeek-V3.2 reasoning via General API
🥈 OpenCode — DeepSeek-V4-Pro @ 52 tok/s
OpenCode offers DeepSeek-V4-Pro at $10/mo flat — the best DeepSeek value. Also offers qwen3-235b reasoning and GLM-5.
deepseek-v4-pro, qwen3-235b, glm-5
🥉 ORBIT — Claude Opus/Sonnet for free
ORBIT gives you 2B tokens/month of Claude reasoning for free. The one caveat: only Claude models work reliably.
claude-opus-4-6, claude-sonnet-4-5
👁️ Best for Vision / OCR
🥇 ZenMux — Best Gemini coverage
ZenMux provides the best Gemini 2.5 Pro/Flash access with vision capabilities. 700 free requests/day. Gemini's 1M context window is unmatched for vision tasks.
gemini-2.5-pro (41 tok/s), gemini-2.5-flash (90 tok/s)
🥈 StepFun — step-3-vl @ 93 tok/s
StepFun's step-3-vl is specifically trained for Chinese/English OCR and vision tasks. Fast and accurate at just $9/mo.
step-3-vl
🥉 OpenCode — GLM-5-vision + MiniMax-vl
Two vision models for $10/mo flat. Good value with both GLM-5-vision and minimax-vl-01 included.
glm-5-vision, minimax-vl-01
🏅 Ollama Cloud — Exclusive gemini-3-flash-preview
Ollama offers gemini-3-flash-preview as an exclusive model, and unlimited usage makes it great for bulk vision tasks.
gemini-3-flash-preview (47 tok/s)
🎤 Best for TTS / STT
🥇 Groq — Whisper (STT) — FREE
Groq offers whisper-large-v3 and whisper-large-v3-turbo for speech-to-text, completely free, and LPU acceleration makes it the fastest STT option available.
whisper-large-v3, whisper-large-v3-turbo
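Groq exposes both Whisper models through an OpenAI-compatible transcription route. A minimal sketch, assuming the third-party `requests` library, a `GROQ_API_KEY` environment variable, and a placeholder file path:

```python
import os

GROQ_STT_URL = "https://api.groq.com/openai/v1/audio/transcriptions"

def transcribe(path: str, model: str = "whisper-large-v3") -> str:
    """Upload an audio file to Groq's STT route and return the transcript."""
    import requests  # third-party HTTP client, imported lazily

    with open(path, "rb") as f:
        resp = requests.post(
            GROQ_STT_URL,
            headers={"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"},
            files={"file": f},       # multipart file upload
            data={"model": model},   # or "whisper-large-v3-turbo"
            timeout=120,
        )
    resp.raise_for_status()
    return resp.json()["text"]

# transcribe("meeting.m4a")  # placeholder path; needs a valid key
```

The same function works for the turbo variant by passing `model="whisper-large-v3-turbo"`.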
🥈 StepFun — STT + TTS in one plan
StepFun uniquely offers both step-asr (STT) and step-tts (TTS) under one $9/mo plan. Best value if you need both speech input and output.
step-asr (STT), step-tts (TTS)
🥉 BytePlus — Seed Speech (TTS) ✅
Verified & Working
BytePlus offers Seed Speech TTS via the $9/mo Pro Plan, which also includes Seedream image gen, Seedance video gen, and OmniHuman digital human.
Seed Speech (TTS), plus video gen (Seedance) and digital human (OmniHuman)
🏅 Infermatic — Kokoro TTS
Infermatic offers Kokoro-82M TTS engine as part of its $20/mo plan, alongside 19 LLM models.
kokoro-82m
🖼️ Best for Image Generation
🥇 Chutes — FLUX, Hunyuan, JuggernautXL, DreamShaper
Chutes provides dedicated image generation endpoints. Working models include FLUX.1-schnell (fast), JuggernautXL (cinematic), DreamShaper-XL (artistic), and hunyuan-image-3 (photorealistic). $20/mo PRO plan.
FLUX.1-schnell, JuggernautXL-Ragnarok, DreamShaper-XL, hunyuan-image-3
🥈 BytePlus — Seedream-5.0-lite (4K) ✅
Verified & Working
BytePlus offers Seedream-5.0-lite for 4K image generation as part of the $9/mo Pro Plan. Also includes Seedance video gen, making it the best all-in-one creative plan.
Seedream-5.0-lite (4K image gen)
🥉 Venice — Uncensored image gen
Venice offers uncensored image generation. No content filters on prompts or outputs. Ideal for creative freedom.
flux-1, various uncensored models
🏅 ArliAI — Image gen API
ArliAI offers an image generation API alongside its 54 derestricted LLM models. From $10/mo.
Various diffusion models
🔒 Best for Privacy
🥇 Chutes — TEE (Trusted Execution Environment)
Chutes offers TEE-enclave execution for select models. Your prompts and responses are encrypted in transit and processed inside a hardware-based trusted execution environment. Even Chutes cannot read your data. 20 TEE-enabled models including Qwen, GLM, Kimi, DeepSeek variants.
qwen3-32b-tee, deepseek-v3.2-tee, kimi-k2.5-tee, glm-5-tee, qwen3-coder-32b-tee, glm-5.1-tee
🥈 Venice — No content filters, privacy-first
Venice doesn't log prompts or filter content: no data retention, no surveillance. Best for uncensored workflows. All 75 models have zero logging.
All 75 Venice models
🔓 Best for Uncensored
🥇 Venice — Zero content filters
Venice is built from the ground up for uncensored inference. All 75 models have no content safety filters. Claude, Kimi, Grok — all derestricted. No prompt logging, no data retention. Includes grok-4 at 102 tok/s (exclusive).
All 75 models, including claude-opus-4-6, claude-sonnet-4-5, kimi-k2.5, kimi-k2.6, grok-4, deepseek-v4-pro
🥈 ArliAI — Derestricted models
ArliAI offers derestricted versions of popular models. 54 models with content filters removed. From $10/mo.
Various derestricted chat and reasoning models
🆓 Best for Free
🥇 Groq — Completely free, unlimited
Groq offers 16 models completely free with no rate limits on normal usage. LPU hardware means 212+ tok/s average speed. Best free tier in the market by far. Includes both Whisper STT models.
llama-3.3-70b-versatile, openai/gpt-oss-120b, groq/compound, llama-3.1-8b-instant, qwen/qwen3-32b, deepseek-r1-distill-llama-70b, mixtral-8x7b-32768, gemma2-9b-it, whisper-large-v3, whisper-large-v3-turbo
🥈 Ollama Cloud — Unlimited usage (38+ models)
Ollama offers 38+ models including exclusive options like nemotron-3-super, minimax-m2.1, and qwen3-coder-next. Flat-rate unlimited plan. Even the slowest models are still usable.
gpt-oss:120b, glm-5.1, deepseek-v4-flash, kimi-k2.5, nemotron-3-super (exclusive), qwen3-coder-next (exclusive)
🥉 OpenRouter — 33 free models
OpenRouter provides 33 free models (suffix :free). Rate limited to 3-5 req/min per model, but huge variety including Claude, GPT-4, Gemini, and more.
anthropic/claude-3.5-sonnet:free, google/gemini-flash:free, meta-llama/llama-3.1-8b:free
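The `:free` suffix makes the free-tier variants easy to pick out of OpenRouter's catalog mechanically. A small sketch with an illustrative model list; the live catalog comes from OpenRouter's `/api/v1/models` endpoint:

```python
# Illustrative ids only; fetch the real list from /api/v1/models.
catalog = [
    "anthropic/claude-3.5-sonnet:free",
    "google/gemini-flash:free",
    "meta-llama/llama-3.1-8b:free",
    "openai/gpt-4o",
]

free_models = [m for m in catalog if m.endswith(":free")]

# Pace requests to stay under the documented 3-5 req/min per-model cap.
MIN_INTERVAL_S = 60 / 3

print(free_models)
print(MIN_INTERVAL_S)
```

Sleeping `MIN_INTERVAL_S` between calls to the same model keeps a batch job safely inside the stricter end of the rate limit.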
🏅 ORBIT — 2B tokens/mo free, Claude only
2 billion tokens per month of Claude access for free. Massive allocation but limited to Claude models.
claude-opus-4-6, claude-sonnet-4-5, claude-3.5-sonnet, claude-3-opus
🏅 ZenMux — 700 requests/day free
700 free requests per day with good Gemini coverage. Best for moderate usage of Gemini models.
gemini-2.5-pro, gemini-2.5-flash
💰 Best Overall Value
🥇 OpenCode — $10/mo flat, 15 models, 52 tok/s
OpenCode gives you access to GLM-5, DeepSeek-V4, MiniMax, and more — all for a flat $10/month. No usage caps, no rate limits. Best per-dollar value by far.
glm-5, deepseek-v4-flash, deepseek-v4-pro, minimax-vl-01, kimi-k2.5, qwen3-235b, step-3-chat, glm-5-vision
🥈 BytePlus — $9/mo Pro Plan ✅
Verified & Working
For $9/mo, BytePlus gives you 9 verified models plus General API access to Doubao, DeepSeek-V3.2, Seedream image gen (4K), Seedance video gen, Seed Speech TTS, and OmniHuman digital human. The best all-in-one creative plan.
dola-seed-2.0-lite (84 tok/s), dola-seed-2.0-code (80 tok/s), dola-seed-2.0-pro (50 tok/s), Seedream-5.0-lite, Seedance-2.0, Seed Speech
🥉 Groq — Free unlimited
If you don't need premium models, Groq's free tier with 16 models at 212+ tok/s is unbeatable for cost-conscious usage.
llama-3.3-70b-versatile, deepseek-r1, whisper-large-v3
📏 Best for Long Context
🥇 ZenMux — Gemini 1M context
Gemini 2.5 Pro/Flash on ZenMux offer up to 1M-token context windows — the largest available. 700 free requests/day. Perfect for document analysis, long-form content, and large codebases.
gemini-2.5-pro (1M context), gemini-2.5-flash (1M context)
🥈 ORBIT — Claude 200K context, free
Claude models on ORBIT support 200K token context. Free with 2B tokens/month. Best free long-context option.
claude-opus-4-6, claude-sonnet-4-5 (all 200K context)
🥉 Groq — 128K context, free
Most Groq models support 128K context. Free and fast — best for moderate-length documents.
llama-3.3-70b-versatile (128K), openai/gpt-oss-120b (128K), qwen/qwen3-32b (128K)
🏅 BytePlus — Doubao 128K context ✅
Verified & Working
BytePlus General API includes Doubao-Pro-128K and Doubao-Lite-128K for long-context tasks. $9/mo Pro Plan.
Doubao-Pro-128K, Doubao-Lite-128K
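To sanity-check whether a document fits one of these windows before sending it, the rough chars-divided-by-4 token heuristic is usually enough. A sketch (the helper is hypothetical; use the provider's tokenizer for exact counts):

```python
def fits_context(text: str, context_tokens: int, reserve_tokens: int = 4_096) -> bool:
    """Rough fit check: ~4 characters per token, minus room for the reply."""
    estimated_tokens = len(text) // 4
    return estimated_tokens + reserve_tokens <= context_tokens

doc = "x" * 600_000  # ~150K estimated tokens

print(fits_context(doc, 128_000))    # overflows a 128K window
print(fits_context(doc, 200_000))    # fits a 200K window
print(fits_context(doc, 1_000_000))  # easily fits a 1M window
```

The `reserve_tokens` margin keeps space for the model's response; tighten or loosen it to match your expected output length.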
🔗 Best for Embeddings
🥇 Chutes — Qwen3-Embedding-8B-TEE, privacy-first
Chutes offers Qwen3-Embedding-8B inside a TEE enclave for maximum embedding privacy. Your text stays encrypted even from the provider. $20/mo PRO plan.
Qwen3-Embedding-8B-TEE
🥈 Infermatic — multilingual-e5-base
Infermatic offers the multilingual-e5-base embedding model (768 dimensions) as part of its $20/mo plan alongside 19 LLM models.
multilingual-e5-base (768d)
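Whichever provider serves the vectors, ranking results is the same client-side cosine computation. A self-contained sketch with toy 4-dimensional vectors standing in for the 768-dimensional e5 output:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for 768-d multilingual-e5-base vectors.
query = [1.0, 0.0, 1.0, 0.0]
doc_a = [1.0, 0.0, 1.0, 0.1]   # near-duplicate of the query
doc_b = [0.0, 1.0, 0.0, 1.0]   # orthogonal to the query

print(round(cosine(query, doc_a), 3))  # high similarity
print(round(cosine(query, doc_b), 3))  # no similarity
```

Note that e5-family models are typically trained with `query: ` and `passage: ` input prefixes, so check the provider's docs before embedding raw text.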
📊 Quick Decision Matrix
| Use Case | Best Pick | Runner-Up | Budget Pick |
|---|---|---|---|
| Speed | Groq (212+ tok/s) | StepFun (93 tok/s) | Groq (FREE) |
| Coding | BytePlus (dola-seed-2.0-code) ✅ | Groq (qwen2.5-coder) | Ollama (qwen3-coder-next) |
| Reasoning | BytePlus (dola-seed-2.0-pro) ✅ | OpenCode (deepseek-v4) | ORBIT (claude-sonnet-4) |
| Vision/OCR | ZenMux (Gemini 1M ctx) | StepFun (step-3-vl) | OpenCode (glm-5-vision) |
| STT | Groq (Whisper) | StepFun (step-asr) | Groq (FREE) |
| TTS | StepFun (step-tts) | BytePlus (Seed Speech) ✅ | Infermatic (Kokoro) |
| Image Gen | Chutes (FLUX, Hunyuan) | BytePlus (Seedream-5.0) ✅ | Featherless (16K+) |
| Video Gen | BytePlus (Seedance) ✅ | OpenRouter (various) | — |
| Free Usage | Groq (unlimited) | OpenRouter (33 free) | ZenMux (700/d) |
| Privacy | Chutes (TEE) | Venice (no logs) | — |
| Uncensored | Venice (75 models) | ArliAI (54 models) | — |
| Long Context | ZenMux (1M Gemini) | ORBIT (200K Claude) | Groq (128K free) |
| Embeddings | Chutes (Qwen3 TEE) | Infermatic (e5-base) | — |
| Overall Value | OpenCode ($10/mo) | BytePlus ($9/mo Pro) ✅ | Groq (FREE) |