⚡ Groq

Speed: ⚡ 212 tok/s avg
Models: 16
Price: FREE
Status: ✅ Online
Best For: Llama, Qwen, Speed

💰 Plan & Pricing

✅ Free Tier — Unlimited
All 16 models are completely free, with rate limits that normal usage rarely hits. LPU hardware delivers exceptionally fast inference; this is arguably the best free tier available.

🔑 API Key

gsk_vH...QWrV

๐ŸŒ Endpoint

https://api.groq.com/openai/v1/chat/completions

📦 Models (16 total — all FREE)

| Model | Speed | Context | Category | Notes |
| --- | --- | --- | --- | --- |
| llama-4-maverick | ⚡ 170 tok/s | 128K | Chat | 🔥 Llama 4 Maverick |
| meta-llama/llama-4-scout-17b-16e-instruct | ⚡ 225 tok/s | 128K | Vision | 🔥 Llama 4 Scout Vision |
| openai/gpt-oss-120b | ⚡ 215 tok/s | 128K | Chat | 🔥 GPT-OSS |
| groq/compound | ⚡ 289 tok/s | 128K | Agentic | 🔥 Compound agentic |
| qwen/qwen3-32b | ⚡ 340 tok/s | 128K | Chat/Coding | 🔥 Qwen3 fast |
| llama-3.3-70b-versatile | ⚡ 212 tok/s | 128K | Chat | 🔥 Best free chat |
| qwen2.5-coder-32b | ⚡ 212 tok/s | 128K | Coding | Qwen2.5 Coder |
| deepseek-r1-distill-llama-70b | ⚡ 180 tok/s | 128K | Reasoning | DeepSeek R1 |
| llama-3.1-8b-instant | ⚡ 300+ tok/s | 128K | Chat | Ultra-fast small model |
| mixtral-8x7b-32768 | ⚡ 190 tok/s | 32K | Chat | Mixtral MoE |
| gemma2-9b-it | ⚡ 250 tok/s | 8K | Chat | Gemma 2 |
| meta-llama/llama-3.2-90b-vision | ⚡ 140 tok/s | 128K | Vision | Llama 3.2 Vision |
| llama-guard-3-8b | — | — | Safety | Content safety classifier |
| whisper-large-v3 | — | — | STT | 🥇 Best free STT |
| whisper-large-v3-turbo | — | — | STT | Faster Whisper |
| distil-whisper-large-v3-en | — | — | STT | Distilled, English-only |
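For programmatic model selection, the table can be mirrored as a small lookup. An illustrative sketch (model ids and ordering taken from the rows above, but simplified to a fastest-first subset per task; this is not an official Groq catalog API):

```python
# Illustrative lookup built from the table above: free Groq models grouped
# by task, ordered fastest-first by the advertised tok/s figures.
# qwen/qwen3-32b appears under both Chat and Coding, as in the table.
FREE_MODELS = {
    "Chat": ["qwen/qwen3-32b", "llama-3.1-8b-instant", "llama-3.3-70b-versatile"],
    "Coding": ["qwen/qwen3-32b", "qwen2.5-coder-32b"],
    "Reasoning": ["deepseek-r1-distill-llama-70b"],
    "Vision": ["meta-llama/llama-4-scout-17b-16e-instruct", "meta-llama/llama-3.2-90b-vision"],
    "STT": ["whisper-large-v3", "whisper-large-v3-turbo", "distil-whisper-large-v3-en"],
}

def pick_model(category: str) -> str:
    """Return the fastest listed free model for a task category."""
    return FREE_MODELS[category][0]
```

A dict like this keeps model choice in one place, so swapping the default chat model later is a one-line change.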

💻 cURL Example

curl -X POST https://api.groq.com/openai/v1/chat/completions \
  -H "Authorization: Bearer gsk_vH...QWrV" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.3-70b-versatile",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

๐Ÿ Python Example

from openai import OpenAI

client = OpenAI(
    api_key="gsk_vH...QWrV",
    base_url="https://api.groq.com/openai/v1"
)

response = client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": "Hello!"}]
)
print(response.choices[0].message.content)
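Because the endpoint speaks plain OpenAI-compatible HTTPS, the same request can also be assembled with only the standard library. A sketch (the `GROQ_API_KEY` environment variable is an assumption for this example; nothing is sent until the `__main__` block runs):

```python
# Stdlib-only equivalent of the SDK call above -- a sketch, not an
# official client. GROQ_API_KEY is assumed to hold a real key.
import json
import os
import urllib.request

ENDPOINT = "https://api.groq.com/openai/v1/chat/completions"

def build_request(model: str, user_message: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible chat-completions request."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

if __name__ == "__main__":
    req = build_request("llama-3.3-70b-versatile", "Hello!", os.environ["GROQ_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    print(body["choices"][0]["message"]["content"])
```

Separating request construction from sending makes the payload easy to inspect or log before it goes over the wire.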

โš ๏ธ Pitfalls & Notes

💡 LPU Hardware Advantage — Groq's custom LPU (Language Processing Unit) chips deliver 2–4× faster inference than GPU-based providers, especially for Llama and Qwen models.
💡 OpenAI Compatible — Uses the standard OpenAI API format, so any OpenAI SDK works directly.
⚠️ Rate Limits — The free tier has per-minute request limits but no total token cap; normal usage rarely hits them.
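If a burst of traffic does trip the per-minute limits, retrying with exponential backoff is the usual fix. A minimal sketch (`RateLimitError` here is a stand-in for whatever 429 exception your client raises, not a Groq-specific class):

```python
# Generic retry-with-backoff sketch for per-minute rate limits.
import random
import time

class RateLimitError(Exception):
    """Stand-in for an HTTP 429 response from the API."""

def with_backoff(call, max_retries=5, base_delay=1.0):
    """Invoke call(), retrying with exponential backoff plus jitter on rate limits."""
    for attempt in range(max_retries):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Sleep base, 2*base, 4*base, ... plus up to `base` of jitter.
            time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay)
```

In practice you would wrap the call from the Python example, e.g. `with_backoff(lambda: client.chat.completions.create(...))`, catching the SDK's own rate-limit exception instead of the stand-in class.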

๐Ÿท๏ธ Categories

Chat Coding Reasoning Vision STT