⚡ Best for Speed

๐Ÿ† Groq โ€” 212 tok/s avg
Groq's LPU hardware delivers the fastest inference in this comparison. Llama-3.3-70b can exceed 300 tok/s. Best choice when latency matters more than model diversity.
Top models: llama-3.3-70b-versatile, llama-3.1-8b, deepseek-r1, qwen/qwen3-32b (295 tok/s), groq/compound (289 tok/s)
🥈 StepFun - 93 tok/s avg
Step models on StepFun's own infrastructure are fast. Best for Chinese/English bilingual tasks.
Top models: step-3-chat, step-3-reasoning
🥉 BytePlus ModelArk - 84 tok/s max ✅
dola-seed-2.0-lite hits 84 tok/s for coding tasks. Best coding speed per dollar. $9/mo Pro Plan gives access to 9 verified models plus General API.
Top models: dola-seed-2.0-lite (84 tok/s), dola-seed-2.0-code (80 tok/s)
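
To put these throughput figures in perspective, here is a rough sketch of how long a response would take at each quoted rate. It assumes generation time is simply output tokens divided by tok/s, ignoring network latency, queueing, and time to first token. The 1,000-token response size is a hypothetical chosen for illustration.

```python
# Quoted throughput figures from the speed rankings above (tokens per second).
THROUGHPUT_TOK_S = {
    "Groq (avg)": 212,
    "StepFun (avg)": 93,
    "BytePlus dola-seed-2.0-lite (max)": 84,
}


def generation_seconds(output_tokens: int, tok_per_s: float) -> float:
    """Estimate decode time only; latency and time to first token are ignored."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return output_tokens / tok_per_s


# Hypothetical response length used only for comparison.
RESPONSE_TOKENS = 1000

for name, rate in THROUGHPUT_TOK_S.items():
    seconds = generation_seconds(RESPONSE_TOKENS, rate)
    print(f"{name}: {seconds:.1f} s per {RESPONSE_TOKENS} tokens")
```

Real-world numbers will vary with prompt length and load, so treat the output as a relative comparison rather than a benchmark.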

💻 Best for Coding

๐Ÿ† BytePlus ModelArk โ€” dola-seed-2.0-code @ 80 tok/s โœ… Verified & Working
BytePlus's coding-specific models are purpose-built for code. The dola-seed-2.0-lite hits 84 tok/s, making it both fast and accurate. The $9/mo Pro Plan is excellent value for coding โ€” includes ark-code-latest (auto agent), bytedance-seed-code, and the full General API.
dola-seed-2.0-code (80 tok/s), dola-seed-2.0-lite (84 tok/s), bytedance-seed-code (63 tok/s), ark-code-latest (auto agent)
🥈 Groq - deepseek-r1 / qwen2.5-coder @ up to 212 tok/s
Groq's LPU acceleration makes code generation blazing fast. Free tier is unbeatable for coding throughput.
deepseek-r1, qwen2.5-coder-32b, llama-3.3-70b
🥉 Ollama Cloud - Exclusive coding models
Ollama offers exclusive coding models like qwen3-coder-next and devstral-2:123b. Unlimited usage on the flat plan.
qwen3-coder-next (52 tok/s), devstral-2:123b (36 tok/s)
๐Ÿ… Chutes TEE โ€” Qwen Coder with privacy
For sensitive codebases, Chutes offers TEE-enclave execution. Your code stays private even from the provider.
qwen2.5-coder-32b-tee, deepseek-v3.2-tee

🧠 Best for Reasoning

๐Ÿ† BytePlus ModelArk โ€” dola-seed-2.0-pro @ 50 tok/s โœ… Verified & Working
The dola-seed-2.0-pro model is purpose-built for deep reasoning tasks. Shows strong performance on math, logic, and multi-step reasoning benchmarks. $9/mo Pro Plan.
dola-seed-2.0-pro (50 tok/s), plus DeepSeek-V3.2 reasoning via General API
🥈 OpenCode - DeepSeek-V4-Pro @ 52 tok/s
OpenCode offers DeepSeek-V4-Pro at $10/mo flat, the best DeepSeek value. Also offers qwen3-235b reasoning and GLM-5.
deepseek-v4-pro, qwen3-235b, glm-5
🥉 ORBIT - Claude Opus/Sonnet for free
ORBIT gives you 2B tokens/month of Claude reasoning for free. The one caveat: only Claude models work reliably.
claude-opus-4-6, claude-sonnet-4-5

👁️ Best for Vision / OCR

๐Ÿ† ZenMux โ€” Best Gemini coverage
ZenMux provides the best Gemini 2.5 Pro/Flash access with vision capabilities. 700 free requests/day. Gemini's 1M context window is unmatched for vision tasks.
gemini-2.5-pro (41 tok/s), gemini-2.5-flash (90 tok/s)
🥈 StepFun - step-3-vl @ 93 tok/s
StepFun's step-3-vl is specifically trained for Chinese/English OCR and vision tasks. Fast and accurate at just $9/mo.
step-3-vl
🥉 OpenCode - GLM-5-vision + MiniMax-vl
Two vision models for $10/mo flat. Good value with both GLM-5-vision and minimax-vl-01 included.
glm-5-vision, minimax-vl-01
๐Ÿ… Ollama Cloud โ€”Exclusive Gemini-3-flash-preview
Ollama offers gemini-3-flash-preview as an exclusive model, plus unlimited usage makes it great for bulk vision tasks.
gemini-3-flash-preview (47 tok/s)

🎤 Best for TTS / STT

๐Ÿ† Groq โ€” Whisper (STT) โ€” FREE
Groq offers Whisper-large-v3 and Whisper-large-v3-turbo for speech-to-text completely free. LPU acceleration makes it the fastest STT option available. Two STT models at no cost.
whisper-large-v3, whisper-large-v3-turbo
🥈 StepFun - STT + TTS in one plan
StepFun uniquely offers both step-asr (STT) and step-tts (TTS) under one $9/mo plan. Best value if you need both speech input and output.
step-asr (STT), step-tts (TTS)
🥉 BytePlus - Seed Speech (TTS) ✅ Verified & Working
BytePlus offers Seed Speech TTS and step-asr via the $9/mo Pro Plan. Also includes Seedream image gen, Seedance video gen, and OmniHuman digital human on the same plan.
Seed Speech (TTS), plus video gen (Seedance) and digital human (OmniHuman)
๐Ÿ… Infermatic โ€” Kokoro TTS
Infermatic offers Kokoro-82M TTS engine as part of its $20/mo plan, alongside 19 LLM models.
kokoro-82m

🖼️ Best for Image Generation

๐Ÿ† Chutes โ€” FLUX, Hunyuan, JuggernautXL, DreamShaper
Chutes provides dedicated image generation endpoints. Working models include FLUX.1-schnell (fast), JuggernautXL (cinematic), DreamShaper-XL (artistic), and hunyuan-image-3 (photorealistic). $20/mo PRO plan.
FLUX.1-schnell, JuggernautXL-Ragnarok, DreamShaper-XL, hunyuan-image-3
🥈 BytePlus - Seedream-5.0-lite (4K) ✅ Verified & Working
BytePlus offers Seedream-5.0-lite for 4K image generation as part of the $9/mo Pro Plan. Also includes Seedance video gen, making it the best all-in-one creative plan.
Seedream-5.0-lite (4K image gen)
🥉 Venice - Uncensored image gen
Venice offers uncensored image generation. No content filters on prompts or outputs. Ideal for creative freedom.
flux-1, various uncensored models
๐Ÿ… ArliAI โ€” Image gen API
ArliAI offers image generation API alongside their 54 derestricted LLM models. From $10/mo.
Various diffusion models

🔒 Best for Privacy

๐Ÿ† Chutes โ€” TEE (Trusted Execution Environment)
Chutes offers TEE-enclave execution for select models. Your prompts and responses are encrypted in transit and processed inside a hardware-based trusted execution environment. Even Chutes cannot read your data. 20 TEE-enabled models including Qwen, GLM, Kimi, DeepSeek variants.
qwen3-32b-tee, deepseek-v3.2-tee, kimi-k2.5-tee, glm-5-tee, qwen3-coder-32b-tee, glm-5.1-tee
🥈 Venice - No content filters, privacy-first
Venice doesn't log prompts or filter content. No data retention, no surveillance. Best for uncensored workflows. All 75 models have zero logging.
All 75 Venice models

🔓 Best for Uncensored

๐Ÿ† Venice โ€” Zero content filters
Venice is built from the ground up for uncensored inference. All 75 models have no content safety filters. Claude, Kimi, Grok โ€” all derestricted. No prompt logging, no data retention. Includes grok-4 at 102 tok/s (exclusive).
All 75 models, including claude-opus-4-6, claude-sonnet-4-5, kimi-k2.5, kimi-k2.6, grok-4, deepseek-v4-pro
🥈 ArliAI - Derestricted models
ArliAI offers derestricted versions of popular models. 54 models with content filters removed. From $10/mo.
Various derestricted chat and reasoning models

🆓 Best for Free

๐Ÿ† Groq โ€” Completely free, unlimited
Groq offers 16 models completely free with no rate limits on normal usage. LPU hardware means 212+ tok/s average speed. Best free tier in the market by far. Includes both Whisper STT models.
llama-3.3-70b-versatile, openai/gpt-oss-120b, groq/compound, llama-3.1-8b-instant, qwen/qwen3-32b, deepseek-r1-distill-llama-70b, mixtral-8x7b-32768, gemma2-9b-it, whisper-large-v3, whisper-large-v3-turbo
🥈 Ollama Cloud - Unlimited models (38+)
Ollama offers 38+ models including exclusive options like nemotron-3-super, minimax-m2.1, and qwen3-coder-next. Flat rate unlimited plan. Even the slowest models are still usable.
gpt-oss:120b, glm-5.1, deepseek-v4-flash, kimi-k2.5, nemotron-3-super (exclusive), qwen3-coder-next (exclusive)
🥉 OpenRouter - 33 free models
OpenRouter provides 33 free models (suffix :free). Rate limited to 3-5 req/min per model, but huge variety including Claude, GPT-4, Gemini, and more.
anthropic/claude-3.5-sonnet:free, google/gemini-flash:free, meta-llama/llama-3.1-8b:free
๐Ÿ… ORBIT โ€” 2B tokens/mo free, Claude only
2 billion tokens per month of Claude access for free. Massive allocation but limited to Claude models.
claude-opus-4-6, claude-sonnet-4-5, claude-3.5-sonnet, claude-3-opus
๐Ÿ… ZenMux โ€” 700 requests/day free
700 free requests per day with good Gemini coverage. Best for moderate usage of Gemini models.
gemini-2.5-pro, gemini-2.5-flash
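
The free-tier limits above are quoted in different units (requests per minute, requests per day, tokens per month), which makes them hard to compare directly. The sketch below converts them to per-day figures. It assumes a 30-day month and that a per-minute limit could be sustained around the clock, which is an upper bound rather than a realistic workload.

```python
MINUTES_PER_DAY = 24 * 60


def daily_requests_from_rpm(requests_per_minute: int) -> int:
    """Upper bound on requests per day if a per-minute limit is used nonstop."""
    return requests_per_minute * MINUTES_PER_DAY


def daily_tokens_from_monthly(monthly_tokens: int, days_in_month: int = 30) -> int:
    """Even daily share of a monthly token allowance (30-day month assumed)."""
    return monthly_tokens // days_in_month


# Figures quoted in this section.
print("OpenRouter, 3 req/min per model:", daily_requests_from_rpm(3), "req/day max")
print("OpenRouter, 5 req/min per model:", daily_requests_from_rpm(5), "req/day max")
print("ZenMux:", 700, "req/day")
print("ORBIT, 2B tokens/mo:", daily_tokens_from_monthly(2_000_000_000), "tokens/day")
```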

💰 Best Overall Value

๐Ÿ† OpenCode โ€” $10/mo flat, 15 models, 52 tok/s
OpenCode gives you access to GLM-5, DeepSeek-V4, MiniMax, and more โ€” all for a flat $10/month. No usage caps, no rate limits. Best per-dollar value by far.
glm-5, deepseek-v4-flash, deepseek-v4-pro, minimax-vl-01, kimi-k2.5, qwen3-235b, step-3-chat, glm-5-vision
🥈 BytePlus - $9/mo Pro Plan ✅ Verified & Working
For $9/mo, BytePlus gives you 9 verified models plus General API access to Doubao, DeepSeek-V3.2, Seedream image gen (4K), Seedance video gen, Seed Speech TTS, and OmniHuman digital human. The best all-in-one creative plan.
dola-seed-2.0-lite (84 tok/s), dola-seed-2.0-code (80 tok/s), dola-seed-2.0-pro (50 tok/s), Seedream-5.0-lite, Seedance-2.0, Seed Speech
🥉 Groq - Free unlimited
If you don't need premium models, Groq's free tier with 16 models at 212+ tok/s is unbeatable for cost-conscious usage.
llama-3.3-70b-versatile, deepseek-r1, whisper-large-v3

๐Ÿ“ Best for Long Context

๐Ÿ† ZenMux โ€” Gemini 1M context
Gemini 2.5 Pro/Flash on ZenMux offer up to 1M token context windows โ€” the largest available. 700 free requests/day. Perfect for document analysis, long-form content, and large codebases.
gemini-2.5-pro (1M context), gemini-2.5-flash (1M context)
🥈 ORBIT - Claude 200K context, free
Claude models on ORBIT support 200K token context. Free with 2B tokens/month. Best free long-context option.
claude-opus-4-6, claude-sonnet-4-5 (all 200K context)
🥉 Groq - 128K context, free
Most Groq models support 128K context. Free and fast, best for moderate-length documents.
llama-3.3-70b-versatile (128K), openai/gpt-oss-120b (128K), qwen/qwen3-32b (128K)
๐Ÿ… BytePlus โ€” Doubao 128K context โœ… Verified & Working
BytePlus General API includes Doubao-Pro-128K and Doubao-Lite-128K for long-context tasks. $9/mo Pro Plan.
Doubao-Pro-128K, Doubao-Lite-128K
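
As a rough way to decide which of these context windows a document needs, the sketch below estimates a token count from character length and filters the providers listed above. The 4-characters-per-token ratio is an assumption (a common rule of thumb for English text, not a property of any of these models), "K" is treated as 1,000 tokens, and the 4,000-token output reserve is a hypothetical default.

```python
# Context windows as quoted in this section ("K" taken as 1,000 tokens).
CONTEXT_WINDOWS = {
    "ZenMux (Gemini 2.5)": 1_000_000,
    "ORBIT (Claude)": 200_000,
    "Groq (most models)": 128_000,
    "BytePlus (Doubao 128K)": 128_000,
}

# Assumption: about 4 characters per token for English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text_chars: int) -> int:
    """Ceiling division of character count by the assumed chars-per-token ratio."""
    return -(-text_chars // CHARS_PER_TOKEN)


def providers_that_fit(text_chars: int, reserved_output_tokens: int = 4_000) -> list[str]:
    """Return providers whose window holds the input plus an output reserve."""
    needed = estimate_tokens(text_chars) + reserved_output_tokens
    return [name for name, window in CONTEXT_WINDOWS.items() if window >= needed]


# A 600,000-character document is estimated at 150,000 tokens.
print(providers_that_fit(600_000))
```

For anything close to a limit, count tokens with the provider's own tokenizer instead of relying on this estimate.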

๐Ÿ“ Best for Embeddings

๐Ÿ† Chutes โ€” Qwen3-Embedding-8B-TEE, privacy-first
Chutes offers Qwen3-Embedding-8B inside a TEE enclave for maximum embedding privacy. Your text stays encrypted even from the provider. $20/mo PRO plan.
Qwen3-Embedding-8B-TEE
🥈 Infermatic - multilingual-e5-base
Infermatic offers the multilingual-e5-base embedding model (768 dimensions) as part of its $20/mo plan alongside 19 LLM models.
multilingual-e5-base (768d)

📊 Quick Decision Matrix

| Use Case | Best Pick | Runner-Up | Budget Pick |
|---|---|---|---|
| Speed | Groq (212+ tok/s) | StepFun (93 tok/s) | Groq (FREE) |
| Coding | BytePlus (dola-seed-2.0-code) ✅ | Groq (qwen2.5-coder) | Ollama (qwen3-coder-next) |
| Reasoning | BytePlus (dola-seed-2.0-pro) ✅ | OpenCode (deepseek-v4) | ORBIT (claude-sonnet-4) |
| Vision/OCR | ZenMux (Gemini 1M ctx) | StepFun (step-3-vl) | OpenCode (glm-5-vision) |
| STT | Groq (Whisper) | StepFun (step-asr) | Groq (FREE) |
| TTS | StepFun (step-tts) | BytePlus (Seed Speech) ✅ | Infermatic (Kokoro) |
| Image Gen | Chutes (FLUX, Hunyuan) | BytePlus (Seedream-5.0) ✅ | Featherless (16K+) |
| Video Gen | BytePlus (Seedance) ✅ | OpenRouter (various) | - |
| Free Usage | Groq (unlimited) | OpenRouter (33 free) | ZenMux (700/d) |
| Privacy | Chutes (TEE) | Venice (no logs) | - |
| Uncensored | Venice (75 models) | ArliAI (54 models) | - |
| Long Context | ZenMux (1M Gemini) | ORBIT (200K Claude) | Groq (128K free) |
| Embeddings | Chutes (Qwen3 TEE) | Infermatic (e5-base) | - |
| Overall Value | OpenCode ($10/mo) | BytePlus ($9/mo Pro) ✅ | Groq (FREE) |
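
If you route requests programmatically, the matrix can be encoded as a small lookup table. This is a minimal sketch: the `pick` function, the tier names, and the lowercase keys are all hypothetical, and only the provider names come from the matrix. Use cases with no budget pick return `None`.

```python
from typing import Optional

# (best, runner_up, budget) provider per use case, copied from the matrix.
MATRIX: dict[str, tuple[str, str, Optional[str]]] = {
    "speed": ("Groq", "StepFun", "Groq"),
    "coding": ("BytePlus", "Groq", "Ollama"),
    "reasoning": ("BytePlus", "OpenCode", "ORBIT"),
    "vision/ocr": ("ZenMux", "StepFun", "OpenCode"),
    "stt": ("Groq", "StepFun", "Groq"),
    "tts": ("StepFun", "BytePlus", "Infermatic"),
    "image gen": ("Chutes", "BytePlus", "Featherless"),
    "video gen": ("BytePlus", "OpenRouter", None),
    "free usage": ("Groq", "OpenRouter", "ZenMux"),
    "privacy": ("Chutes", "Venice", None),
    "uncensored": ("Venice", "ArliAI", None),
    "long context": ("ZenMux", "ORBIT", "Groq"),
    "embeddings": ("Chutes", "Infermatic", None),
    "overall value": ("OpenCode", "BytePlus", "Groq"),
}

TIERS = ("best", "runner_up", "budget")


def pick(use_case: str, tier: str = "best") -> Optional[str]:
    """Look up the recommended provider; raises KeyError/ValueError on bad input."""
    if tier not in TIERS:
        raise ValueError(f"tier must be one of {TIERS}")
    row = MATRIX[use_case.strip().lower()]
    return row[TIERS.index(tier)]


print(pick("Coding"))                  # best pick for coding
print(pick("Long Context", "budget"))  # budget pick for long context
```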