⚡ Groq
Blazing fast LPU inference: 212 tok/s average, completely free
Avg Speed: 212 tok/s · Models: 16 · Price: FREE · Best For: Llama, Qwen, Speed
💰 Plan & Pricing
Free Tier: Unlimited
14 models completely free. There are per-minute request limits but no total token cap, so normal usage rarely hits them. LPU hardware delivers unmatched speed. Best free tier available.
🔑 API Key
🌐 Endpoint
https://api.groq.com/openai/v1/chat/completions
📦 Models (16 total, all FREE)
| Model | Speed | Context | Category | Notes |
|---|---|---|---|---|
| llama-4-maverick | ⚡ 170 tok/s | 128K | Chat | 🔥 Llama 4 Maverick |
| meta-llama/llama-4-scout-17b-16e-instruct | ⚡ 225 tok/s | 128K | Vision | 🔥 Llama 4 Scout Vision |
| openai/gpt-oss-120b | ⚡ 215 tok/s | 128K | Chat | 🔥 GPT-OSS |
| groq/compound | ⚡ 289 tok/s | 128K | Agentic | 🔥 Compound agentic |
| qwen/qwen3-32b | ⚡ 340 tok/s | 128K | Chat/Coding | 🔥 Qwen3 fast |
| llama-3.3-70b-versatile | ⚡ 212 tok/s | 128K | Chat | 🔥 Best free chat |
| qwen2.5-coder-32b | ⚡ 212 tok/s | 128K | Coding | Qwen2.5 Coder |
| deepseek-r1-distill-llama-70b | ⚡ 180 tok/s | 128K | Reasoning | DeepSeek R1 |
| llama-3.1-8b-instant | ⚡ 300+ tok/s | 128K | Chat | Ultra fast small model |
| mixtral-8x7b-32768 | ⚡ 190 tok/s | 32K | Chat | Mixtral MoE |
| gemma2-9b-it | ⚡ 250 tok/s | 8K | Chat | Gemma 2 |
| meta-llama/llama-3.2-90b-vision | ⚡ 140 tok/s | 128K | Vision | Llama 3.2 Vision |
| llama-guard-3-8b | N/A | N/A | Safety | Content safety classifier |
| whisper-large-v3 | N/A | N/A | STT | 🔥 Best free STT |
| whisper-large-v3-turbo | N/A | N/A | STT | Faster Whisper |
| distil-whisper-large-v3-en | N/A | N/A | STT | Distilled, English-only |
💻 cURL Example
curl -X POST https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer gsk_vH...QWrV" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
🐍 Python Example
from openai import OpenAI

client = OpenAI(
    api_key="gsk_vH...QWrV",
    base_url="https://api.groq.com/openai/v1",
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
⚠️ Pitfalls & Notes
LPU Hardware Advantage: Groq uses custom LPU (Language Processing Unit) chips that deliver 2-4× faster inference than GPU-based providers, especially for Llama and Qwen models.
OpenAI Compatible: Uses the standard OpenAI API format, so any OpenAI SDK works directly.
Rate Limits: The free tier has per-minute request limits but no total token cap. Normal usage rarely hits the limits.
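If a burst does trip the per-minute cap, exponential backoff usually rides it out. A minimal stdlib sketch (`RuntimeError` is a stand-in for whatever HTTP 429 exception your client raises; the injectable `sleep` just keeps the helper testable):

```python
# Minimal sketch: exponential backoff for per-minute rate limits.
# RuntimeError stands in for your HTTP client's 429 exception.
import time

def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call`, doubling the delay each attempt; re-raise at the end."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:  # rate-limited: wait, then retry
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Usage: wrap the request as `with_backoff(lambda: client.chat.completions.create(...))`.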
🏷️ Categories
Chat
Coding
Reasoning
Vision
STT