⚡ Groq
Blazing fast LPU inference: 212 tok/s average, completely free
Avg Speed: 212 tok/s · Models: 16 · Price: FREE · Best For: Llama, Qwen, Speed
💰 Plan & Pricing
Free Tier: Unlimited
14 models completely free. There are per-minute request limits but no total token cap, so normal usage rarely hits them. LPU hardware delivers unmatched speed. Best free tier available.
🔑 API Key
🌐 Endpoint
https://api.groq.com/openai/v1/chat/completions
📦 Models (16 total, all FREE)
| Model | Speed | Context | Category | Notes |
|---|---|---|---|---|
| llama-4-maverick | ⚡ 170 tok/s | 128K | Chat | 🔥 Llama 4 Maverick |
| meta-llama/llama-4-scout-17b-16e-instruct | ⚡ 225 tok/s | 128K | Vision | 🔥 Llama 4 Scout Vision |
| openai/gpt-oss-120b | ⚡ 215 tok/s | 128K | Chat | 🔥 GPT-OSS |
| groq/compound | ⚡ 289 tok/s | 128K | Agentic | 🔥 Compound agentic |
| qwen/qwen3-32b | ⚡ 340 tok/s | 128K | Chat/Coding | 🔥 Qwen3 fast |
| llama-3.3-70b-versatile | ⚡ 212 tok/s | 128K | Chat | 🔥 Best free chat |
| qwen2.5-coder-32b | ⚡ 212 tok/s | 128K | Coding | Qwen2.5 Coder |
| deepseek-r1-distill-llama-70b | ⚡ 180 tok/s | 128K | Reasoning | DeepSeek R1 |
| llama-3.1-8b-instant | ⚡ 300+ tok/s | 128K | Chat | Ultra fast small model |
| mixtral-8x7b-32768 | ⚡ 190 tok/s | 32K | Chat | Mixtral MoE |
| gemma2-9b-it | ⚡ 250 tok/s | 8K | Chat | Gemma 2 |
| meta-llama/llama-3.2-90b-vision | ⚡ 140 tok/s | 128K | Vision | Llama 3.2 Vision |
| llama-guard-3-8b | N/A | N/A | Safety | Content safety classifier |
| whisper-large-v3 | N/A | N/A | STT | 🔥 Best free STT |
| whisper-large-v3-turbo | N/A | N/A | STT | Faster Whisper |
| distil-whisper-large-v3-en | N/A | N/A | STT | Distilled, English-only |
💻 cURL Example
curl -X POST https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer gsk_vH...QWrV" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
🐍 Python Example
from openai import OpenAI

client = OpenAI(
    api_key="gsk_vH...QWrV",
    base_url="https://api.groq.com/openai/v1",
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
⚠️ Pitfalls & Notes
LPU Hardware Advantage: Groq uses custom LPU (Language Processing Unit) chips that deliver 2-4× faster inference than GPU-based providers, especially for Llama and Qwen models.
OpenAI Compatible: Uses the standard OpenAI API format, so any OpenAI SDK works directly.
Rate Limits: The free tier has per-minute request limits but no total token cap. Normal usage rarely hits the limits.
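If a burst does trip the per-minute cap, exponential backoff usually rides it out. A minimal stdlib sketch (`RuntimeError` is a stand-in for whatever HTTP 429 exception your client raises; the injectable `sleep` just keeps the helper testable):

```python
# Minimal sketch: exponential backoff for per-minute rate limits.
# RuntimeError stands in for your HTTP client's 429 exception.
import time

def with_backoff(call, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call`, doubling the delay each attempt; re-raise at the end."""
    for attempt in range(max_retries):
        try:
            return call()
        except RuntimeError:  # rate-limited: wait, then retry
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

Usage: wrap the request as `with_backoff(lambda: client.chat.completions.create(...))`.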
🏷️ Categories
Chat
Coding
Reasoning
Vision
STT