Best For — API Provider Hub

⚡ Best for Speed

🏆 Groq — 212 tok/s avg

Groq's LPU hardware delivers the fastest inference in this comparison. Llama-3.3-70b peaks above 300 tok/s. Best choice when latency matters more than model diversity.

Top models: llama-3.3-70b-versatile, llama-3.1-8b, deepseek-r1, qwen/qwen3-32b (295 tok/s), groq/compound (289 tok/s)

🥈 StepFun — 93 tok/s avg

Step models run on StepFun's own infrastructure and average 93 tok/s. Best for Chinese/English bilingual tasks.

Top models: step-3-chat, step-3-reasoning

🥉 BytePlus ModelArk — 84 tok/s max ✅

dola-seed-2.0-lite hits 84 tok/s for coding tasks. Best coding speed per dollar. $9/mo Pro Plan gives access to 9 verified models plus General API.

Top models: dola-seed-2.0-lite (84 tok/s), dola-seed-2.0-code (80 tok/s)
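
For a rough sense of what these throughput figures mean in practice, generation time is approximately output tokens divided by tokens per second. The sketch below applies that to the figures quoted above; the 1,000-token response length and the function name are assumptions for illustration, and time-to-first-token and network latency are ignored.

```python
# Rough generation-time estimate from the throughput figures quoted in this section.
# Assumption: a 1,000-token response; time-to-first-token and network latency are ignored.
THROUGHPUT_TOK_PER_S = {
    "Groq (avg)": 212,
    "StepFun (avg)": 93,
    "BytePlus dola-seed-2.0-lite": 84,
}


def generation_seconds(output_tokens: int, tok_per_s: float) -> float:
    """Return the approximate seconds needed to generate output_tokens."""
    if tok_per_s <= 0:
        raise ValueError("tok_per_s must be positive")
    return output_tokens / tok_per_s


if __name__ == "__main__":
    for name, speed in THROUGHPUT_TOK_PER_S.items():
        print(f"{name}: {generation_seconds(1000, speed):.1f} s per 1,000 tokens")
```

At Groq's quoted average this works out to under five seconds per 1,000 tokens, versus roughly eleven to twelve seconds at 84 to 93 tok/s.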

💻 Best for Coding

🏆 BytePlus ModelArk — dola-seed-2.0-code @ 80 tok/s ✅ Verified & Working

BytePlus's coding models are purpose-built for code generation, and dola-seed-2.0-lite reaches 84 tok/s, so the lineup is fast as well as specialized. The $9/mo Pro Plan is excellent value for coding: it includes ark-code-latest (auto agent), bytedance-seed-code, and the full General API.

dola-seed-2.0-code (80 tok/s), dola-seed-2.0-lite (84 tok/s), bytedance-seed-code (63 tok/s), ark-code-latest (auto agent)

🥈 Groq — deepseek-r1 / qwen2.5-coder @ 212 tok/s avg

Groq's LPU acceleration makes code generation blazing fast. Free tier is unbeatable for coding throughput.

deepseek-r1, qwen2.5-coder-32b, llama-3.3-70b

🥉 Ollama Cloud — Exclusive coding models

Ollama offers exclusive coding models like qwen3-coder-next and devstral-2:123b. Unlimited usage on the flat plan.

qwen3-coder-next (52 tok/s), devstral-2:123b (36 tok/s)

🏅 Chutes TEE — Qwen Coder with privacy

For sensitive codebases, Chutes offers TEE-enclave execution. Your code stays private even from the provider.

qwen2.5-coder-32b-tee, deepseek-v3.2-tee

🧠 Best for Reasoning

🏆 BytePlus ModelArk — dola-seed-2.0-pro @ 50 tok/s ✅ Verified & Working

The dola-seed-2.0-pro model is purpose-built for deep reasoning tasks. Shows strong performance on math, logic, and multi-step reasoning benchmarks. $9/mo Pro Plan.

dola-seed-2.0-pro (50 tok/s), plus DeepSeek-V3.2 reasoning via General API

🥈 OpenCode — DeepSeek-V4-Pro @ 52 tok/s

OpenCode offers DeepSeek-V4-Pro at $10/mo flat — the best DeepSeek value. Also offers qwen3-235b reasoning and GLM-5.

deepseek-v4-pro, qwen3-235b, glm-5

🥉 ORBIT — Claude Opus/Sonnet for free

ORBIT gives you 2B tokens/month of Claude reasoning for free. The one caveat: only Claude models work reliably.

claude-opus-4-6, claude-sonnet-4-5

👁️ Best for Vision / OCR

🏆 ZenMux — Best Gemini coverage

ZenMux provides the best Gemini 2.5 Pro/Flash access with vision capabilities. 700 free requests/day. Gemini's 1M context window is unmatched for vision tasks.

gemini-2.5-pro (41 tok/s), gemini-2.5-flash (90 tok/s)

🥈 StepFun — step-3-vl @ 93 tok/s

StepFun's step-3-vl is specifically trained for Chinese/English OCR and vision tasks. Fast and accurate at just $9/mo.

step-3-vl

🥉 OpenCode — GLM-5-vision + MiniMax-vl

Two vision models for $10/mo flat. Good value with both GLM-5-vision and minimax-vl-01 included.

glm-5-vision, minimax-vl-01

🏅 Ollama Cloud — Exclusive Gemini-3-flash-preview

Ollama offers gemini-3-flash-preview as an exclusive model, and its unlimited usage makes it well suited to bulk vision tasks.

gemini-3-flash-preview (47 tok/s)

🎤 Best for TTS / STT

🏆 Groq — Whisper (STT) — FREE

Groq offers Whisper-large-v3 and Whisper-large-v3-turbo for speech-to-text completely free, and LPU acceleration makes it the fastest STT option available.

whisper-large-v3, whisper-large-v3-turbo

🥈 StepFun — STT + TTS in one plan

StepFun uniquely offers both step-asr (STT) and step-tts (TTS) under one $9/mo plan. Best value if you need both speech input and output.

step-asr (STT), step-tts (TTS)

🥉 BytePlus — Seed Speech (TTS) ✅ Verified & Working

BytePlus offers Seed Speech TTS via the $9/mo Pro Plan. The same plan also includes Seedream image gen, Seedance video gen, and OmniHuman digital human.

Seed Speech (TTS), plus video gen (Seedance) and digital human (OmniHuman)

🏅 Infermatic — Kokoro TTS

Infermatic offers Kokoro-82M TTS engine as part of its $20/mo plan, alongside 19 LLM models.

kokoro-82m

🖼️ Best for Image Generation

🏆 Chutes — FLUX, Hunyuan, JuggernautXL, DreamShaper

Chutes provides dedicated image generation endpoints. Working models include FLUX.1-schnell (fast), JuggernautXL (cinematic), DreamShaper-XL (artistic), and hunyuan-image-3 (photorealistic). $20/mo PRO plan.

FLUX.1-schnell, JuggernautXL-Ragnarok, DreamShaper-XL, hunyuan-image-3

🥈 BytePlus — Seedream-5.0-lite (4K) ✅ Verified & Working

BytePlus offers Seedream-5.0-lite for 4K image generation as part of the $9/mo Pro Plan. Also includes Seedance video gen, making it the best all-in-one creative plan.

Seedream-5.0-lite (4K image gen)

🥉 Venice — Uncensored image gen

Venice offers uncensored image generation. No content filters on prompts or outputs. Ideal for creative freedom.

flux-1, various uncensored models

🏅 ArliAI — Image gen API

ArliAI offers image generation API alongside their 54 derestricted LLM models. From $10/mo.

Various diffusion models

🔒 Best for Privacy

🏆 Chutes — TEE (Trusted Execution Environment)

Chutes offers TEE-enclave execution for select models. Your prompts and responses are encrypted in transit and processed inside a hardware-based trusted execution environment, so even Chutes cannot read your data. There are 20 TEE-enabled models, including Qwen, GLM, Kimi, and DeepSeek variants.

qwen3-32b-tee, deepseek-v3.2-tee, kimi-k2.5-tee, glm-5-tee, qwen3-coder-32b-tee, glm-5.1-tee

🥈 Venice — No content filters, privacy-first

Venice doesn't log prompts or filter content. No data retention, no surveillance. Best for uncensored workflows. All 75 models have zero logging.

All 75 Venice models

🔓 Best for Uncensored

🏆 Venice — Zero content filters

Venice is built from the ground up for uncensored inference. All 75 models have no content safety filters. Claude, Kimi, Grok — all derestricted. No prompt logging, no data retention. Includes grok-4 at 102 tok/s (exclusive).

All 75 models, including claude-opus-4-6, claude-sonnet-4-5, kimi-k2.5, kimi-k2.6, grok-4, deepseek-v4-pro

🥈 ArliAI — Derestricted models

ArliAI offers derestricted versions of popular models. 54 models with content filters removed. From $10/mo.

Various derestricted chat and reasoning models

🆓 Best for Free

🏆 Groq — Completely free, unlimited

Groq offers 16 models completely free with no rate limits on normal usage. LPU hardware means 212+ tok/s average speed. Best free tier in the market by far. Includes both Whisper STT models.

llama-3.3-70b-versatile, openai/gpt-oss-120b, groq/compound, llama-3.1-8b-instant, qwen/qwen3-32b, deepseek-r1-distill-llama-70b, mixtral-8x7b-32768, gemma2-9b-it, whisper-large-v3, whisper-large-v3-turbo

🥈 Ollama Cloud — Unlimited models (38+)

Ollama offers 38+ models including exclusive options like nemotron-3-super, minimax-m2.1, and qwen3-coder-next. Flat rate unlimited plan. Slowest models are still usable.

gpt-oss:120b, glm-5.1, deepseek-v4-flash, kimi-k2.5, nemotron-3-super (exclusive), qwen3-coder-next (exclusive)

🥉 OpenRouter — 33 free models

OpenRouter provides 33 free models (suffix :free). Rate limited to 3-5 req/min per model, but huge variety including Claude, GPT-4, Gemini, and more.

anthropic/claude-3.5-sonnet:free, google/gemini-flash:free, meta-llama/llama-3.1-8b:free
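
The :free suffix makes the free tier easy to select programmatically, and the 3-5 req/min limit translates into a minimum spacing between calls. A minimal sketch, assuming you already have a list of model IDs as strings; the unsuffixed ID in the sample list and both function names are illustrative assumptions, and no OpenRouter API is called here.

```python
def free_models(model_ids):
    """Keep only the model IDs that end with the ':free' suffix."""
    return [m for m in model_ids if m.endswith(":free")]


def min_interval_seconds(requests_per_minute):
    """Minimum spacing between calls needed to stay within a per-minute limit."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    return 60.0 / requests_per_minute


# The first three IDs come from the list above; the last one is a
# hypothetical non-free ID added only to show the filter at work.
catalog = [
    "anthropic/claude-3.5-sonnet:free",
    "google/gemini-flash:free",
    "meta-llama/llama-3.1-8b:free",
    "anthropic/claude-3.5-sonnet",
]

if __name__ == "__main__":
    print(free_models(catalog))
    # Spacing for the strictest and loosest ends of the 3-5 req/min range.
    print(min_interval_seconds(3), min_interval_seconds(5))
```

Pacing to the strict end of the range (one call every 20 seconds per model) is the conservative choice when the exact limit for a given model is unknown.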

🏅 ORBIT — 2B tokens/mo free, Claude only

2 billion tokens per month of Claude access for free. Massive allocation but limited to Claude models.

claude-opus-4-6, claude-sonnet-4-5, claude-3.5-sonnet, claude-3-opus

🏅 ZenMux — 700 requests/day free

700 free requests per day with good Gemini coverage. Best for moderate usage of Gemini models.

gemini-2.5-pro, gemini-2.5-flash
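
The free quotas above are stated in different units (tokens per month, requests per day), so converting them to a common time base helps with planning. A small sketch of that arithmetic; the 30-day month is an assumption, and the function names are illustrative.

```python
DAYS_PER_MONTH = 30  # assumption: a 30-day month for rough planning


def tokens_per_day(tokens_per_month: int) -> int:
    """Spread a monthly token allowance evenly over the month."""
    return tokens_per_month // DAYS_PER_MONTH


def requests_per_month(requests_per_day: int) -> int:
    """Scale a daily request allowance up to a month."""
    return requests_per_day * DAYS_PER_MONTH


if __name__ == "__main__":
    # ORBIT: 2B tokens/month; ZenMux: 700 requests/day (figures from this section).
    print(f"ORBIT: about {tokens_per_day(2_000_000_000):,} tokens/day")
    print(f"ZenMux: about {requests_per_month(700):,} requests/month")
```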

💰 Best Overall Value

🏆 OpenCode — $10/mo flat, 15 models, 52 tok/s

OpenCode gives you access to GLM-5, DeepSeek-V4, MiniMax, and more — all for a flat $10/month. No usage caps, no rate limits. Best per-dollar value by far.

glm-5, deepseek-v4-flash, deepseek-v4-pro, minimax-vl-01, kimi-k2.5, qwen3-235b, step-3-chat, glm-5-vision

🥈 BytePlus — $9/mo Pro Plan ✅ Verified & Working

For $9/mo, BytePlus gives you 9 verified models plus General API access to Doubao, DeepSeek-V3.2, Seedream image gen (4K), Seedance video gen, Seed Speech TTS, and OmniHuman digital human. The best all-in-one creative plan.

dola-seed-2.0-lite (84 tok/s), dola-seed-2.0-code (80 tok/s), dola-seed-2.0-pro (50 tok/s), Seedream-5.0-lite, Seedance-2.0, Seed Speech

🥉 Groq — Free unlimited

If you don't need premium models, Groq's free tier with 16 models at 212+ tok/s is unbeatable for cost-conscious usage.

llama-3.3-70b-versatile, deepseek-r1, whisper-large-v3

📏 Best for Long Context

🏆 ZenMux — Gemini 1M context

Gemini 2.5 Pro/Flash on ZenMux offer up to 1M token context windows — the largest available. 700 free requests/day. Perfect for document analysis, long-form content, and large codebases.

gemini-2.5-pro (1M context), gemini-2.5-flash (1M context)

🥈 ORBIT — Claude 200K context, free

Claude models on ORBIT support a 200K-token context window. Free with 2B tokens/month. Best free long-context option.

claude-opus-4-6, claude-sonnet-4-5 (all 200K context)

🥉 Groq — 128K context, free

Most Groq models support 128K context. Free and fast — best for moderate-length documents.

llama-3.3-70b-versatile (128K), openai/gpt-oss-120b (128K), qwen/qwen3-32b (128K)

🏅 BytePlus — Doubao 128K context ✅ Verified & Working

BytePlus General API includes Doubao-Pro-128K and Doubao-Lite-128K for long-context tasks. $9/mo Pro Plan.

Doubao-Pro-128K, Doubao-Lite-128K
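
To check which of these options can hold a given document, compare an estimated token count against each context window. The sketch below is a rough filter only: the four-characters-per-token heuristic, the 4,000-token output reserve, and treating "128K", "200K", and "1M" as round decimal numbers are all assumptions, and real tokenizers will give different counts.

```python
# Context windows as quoted in this section, read as round decimal numbers (assumption).
CONTEXT_WINDOWS = {
    "ZenMux (Gemini 2.5)": 1_000_000,
    "ORBIT (Claude)": 200_000,
    "Groq (most models)": 128_000,
    "BytePlus (Doubao 128K)": 128_000,
}

CHARS_PER_TOKEN = 4  # assumption: rough heuristic for English text


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def providers_that_fit(token_count: int, reserve_for_output: int = 4_000) -> list:
    """List the options whose window holds the prompt plus an output reserve."""
    needed = token_count + reserve_for_output
    return [name for name, window in CONTEXT_WINDOWS.items() if needed <= window]


if __name__ == "__main__":
    document = "x" * 600_000  # stand-in for a long document of 600,000 characters
    tokens = estimate_tokens(document)
    print(tokens, providers_that_fit(tokens))
```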

📐 Best for Embeddings

🏆 Chutes — Qwen3-Embedding-8B-TEE, privacy-first

Chutes offers Qwen3-Embedding-8B inside a TEE enclave for maximum embedding privacy. Your text stays encrypted even from the provider. $20/mo PRO plan.

Qwen3-Embedding-8B-TEE

🥈 Infermatic — multilingual-e5-base

Infermatic offers the multilingual-e5-base embedding model (768 dimensions) as part of its $20/mo plan alongside 19 LLM models.

multilingual-e5-base (768d)

📊 Quick Decision Matrix

Use Case	Best Pick	Runner-Up	Budget Pick
Speed	Groq (212+ tok/s)	StepFun (93 tok/s)	Groq (FREE)
Coding	BytePlus (dola-seed-2.0-code) ✅	Groq (qwen2.5-coder)	Ollama (qwen3-coder-next)
Reasoning	BytePlus (dola-seed-2.0-pro) ✅	OpenCode (deepseek-v4)	ORBIT (claude-sonnet-4-5)
Vision/OCR	ZenMux (Gemini 1M ctx)	StepFun (step-3-vl)	OpenCode (glm-5-vision)
STT	Groq (Whisper)	StepFun (step-asr)	Groq (FREE)
TTS	StepFun (step-tts)	BytePlus (Seed Speech) ✅	Infermatic (Kokoro)
Image Gen	Chutes (FLUX, Hunyuan)	BytePlus (Seedream-5.0) ✅	Featherless (16K+)
Video Gen	BytePlus (Seedance) ✅	OpenRouter (various)	—
Free Usage	Groq (unlimited)	OpenRouter (33 free)	ZenMux (700/d)
Privacy	Chutes (TEE)	Venice (no logs)	—
Uncensored	Venice (75 models)	ArliAI (54 models)	—
Long Context	ZenMux (1M Gemini)	ORBIT (200K Claude)	Groq (128K free)
Embeddings	Chutes (Qwen3 TEE)	Infermatic (e5-base)	—
Overall Value	OpenCode ($10/mo)	BytePlus ($9/mo Pro) ✅	Groq (FREE)
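
Because the matrix is tab-separated, it can be loaded into a lookup table with the standard csv module. A minimal sketch, assuming the table text is available as a string; only three rows are reproduced here for brevity, and the function names are illustrative.

```python
import csv
import io

# Three rows copied from the matrix above. "\u2014" is the dash the table
# uses to mark "no pick" in a column.
MATRIX_TSV = (
    "Use Case\tBest Pick\tRunner-Up\tBudget Pick\n"
    "Speed\tGroq (212+ tok/s)\tStepFun (93 tok/s)\tGroq (FREE)\n"
    "Privacy\tChutes (TEE)\tVenice (no logs)\t\u2014\n"
    "Long Context\tZenMux (1M Gemini)\tORBIT (200K Claude)\tGroq (128K free)\n"
)


def load_matrix(tsv_text: str) -> dict:
    """Parse the tab-separated matrix into {use_case: row_dict}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return {row["Use Case"]: row for row in reader}


def pick(matrix: dict, use_case: str, column: str = "Best Pick"):
    """Return the provider for a use case, or None where the table has a dash."""
    value = matrix[use_case][column]
    return None if value == "\u2014" else value


if __name__ == "__main__":
    matrix = load_matrix(MATRIX_TSV)
    print(pick(matrix, "Speed"))
    print(pick(matrix, "Long Context", "Budget Pick"))
    print(pick(matrix, "Privacy", "Budget Pick"))
```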
