Infermatic

💰 Plan & Pricing

ℹ️

$20/Month Flat Rate
Unlimited requests. TTS (Kokoro-82M) and embeddings (multilingual-e5-base) are included in the flat rate, so the bill does not vary with usage.

🔑 API Key

sk-TZv...Q5Jp

🌐 Endpoint

https://api.totalgpt.ai/v1/chat/completions

📦 Models (20 listed: Chat, Coding, Reasoning, Vision, TTS, Embeddings)

Model	Speed	Category	Notes
gemini-2.5-flash	⚡ 48 tok/s	Chat	Fast Gemini
gpt-4.1-mini	⚡ 52 tok/s	Chat	GPT-4.1 Mini — fast & cheap
gpt-4.1	⚡ 40 tok/s	Chat	GPT-4.1
gpt-4o	⚡ 38 tok/s	Chat	GPT-4o
claude-sonnet-4-5	⚡ 38 tok/s	Chat	Claude Sonnet 4.5
claude-3.5-sonnet	⚡ 36 tok/s	Chat	Claude 3.5 Sonnet
deepseek-r1	⚡ 35 tok/s	Reasoning	DeepSeek R1
qwen3-235b	⚡ 34 tok/s	Chat/Coding	Qwen3 235B
deepseek-v3	⚡ 37 tok/s	Chat	DeepSeek V3
llama-4-maverick	⚡ 36 tok/s	Chat	Llama 4 Maverick
deepseek-v4	⚡ 33 tok/s	Chat	DeepSeek V4
llama-3.3-70b-versatile	⚡ 35 tok/s	Chat	Llama 3.3 70B
mixtral-8x7b-32768	⚡ 30 tok/s	Chat	Mixtral MoE
gemma-3-27b	⚡ 32 tok/s	Chat	Gemma 3 27B
qwen3-32b	⚡ 38 tok/s	Chat	Qwen3 32B
glm-5	⚡ 30 tok/s	Chat	GLM-5
moonshot-vision	⚡ 25 tok/s	Vision	Vision model
kokoro-82m	🔊 N/A	TTS	Kokoro text-to-speech
chat-tts	🔊 N/A	TTS	ChatTTS voice synthesis
multilingual-e5-base	📐 N/A	Embedding	Multilingual embeddings
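
As a rough illustration of how the table above might be used, the sketch below copies a few of its rows into Python, picks the fastest listed model for a category, and estimates generation time from the tok/s figure. The `MODELS` subset and the helper names `fastest` and `seconds_for` are hypothetical, and the speeds are the listed figures, not measurements.

```python
# Hypothetical subset of the table above: (model id, listed tok/s, category).
# TTS and embedding rows are omitted because their speed is listed as N/A.
MODELS = [
    ("gemini-2.5-flash", 48, "Chat"),
    ("gpt-4.1-mini", 52, "Chat"),
    ("claude-sonnet-4-5", 38, "Chat"),
    ("deepseek-r1", 35, "Reasoning"),
    ("qwen3-235b", 34, "Chat/Coding"),
    ("moonshot-vision", 25, "Vision"),
]


def fastest(category):
    """Return the listed model with the highest tok/s whose category includes `category`."""
    # "Chat/Coding" counts for both "Chat" and "Coding".
    candidates = [m for m in MODELS if category in m[2].split("/")]
    if not candidates:
        raise ValueError(f"no listed model in category {category!r}")
    return max(candidates, key=lambda m: m[1])[0]


def seconds_for(model, tokens):
    """Estimate generation time for `tokens` output tokens at the listed speed."""
    speed = {name: tok_s for name, tok_s, _ in MODELS}[model]
    return tokens / speed
```

With this subset, `fastest("Chat")` selects `gpt-4.1-mini`, and `seconds_for("gpt-4.1-mini", 520)` gives 10 seconds at the listed 52 tok/s.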

💻 cURL Example

curl -X POST https://api.totalgpt.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-TZv...Q5Jp" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

🐍 Python Example

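# Requires the openai package (pip install openai).
from openai import OpenAI

# base_url is the /v1 root; the client appends /chat/completions itself.
client = OpenAI(
    api_key="sk-TZv...Q5Jp",
    base_url="https://api.totalgpt.ai/v1"
)

# Send a single-turn chat request to one of the models in the table above.
response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}]
)
# The reply text is in the first choice.
print(response.choices[0].message.content)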
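
The same request can also be sketched with only the standard library, reading the key from an environment variable instead of hard-coding it. The variable name `INFERMATIC_API_KEY` and the helpers `build_chat_request` and `send` are assumptions made for illustration, not part of the provider's documentation; the URL and JSON shape are taken from the cURL example above.

```python
import json
import os
import urllib.request
from typing import Optional

ENDPOINT = "https://api.totalgpt.ai/v1/chat/completions"  # from the Endpoint section


def build_chat_request(prompt: str, model: str = "claude-sonnet-4-5",
                       api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST request mirroring the cURL example."""
    # INFERMATIC_API_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key or os.environ.get("INFERMATIC_API_KEY")
    if not key:
        raise RuntimeError("set INFERMATIC_API_KEY or pass api_key")
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {key}",
                 "Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's text (performs network I/O)."""
    with urllib.request.urlopen(request, timeout=60) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Keeping request construction separate from `send` means the payload can be inspected or logged before anything goes over the network.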

⚠️ Pitfalls & Notes

ℹ️

Includes TTS (Kokoro-82M): text-to-speech is available at no extra cost, which is rare among providers.

ℹ️

Includes Embeddings (multilingual-e5-base): the embedding model is covered by the flat rate, which suits retrieval-augmented generation (RAG) pipelines.
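
To illustrate the RAG use case, the sketch below builds an embeddings request body and ranks documents by cosine similarity to a query vector. It assumes the embeddings are served from an OpenAI-style `/v1/embeddings` route on the same host, which this page does not confirm; `EMBEDDINGS_URL`, `embedding_payload`, `cosine`, and `rank` are hypothetical names. E5-family models are normally given inputs prefixed with `query: ` or `passage: `; whether this provider adds the prefix itself is unknown, so the sketch adds it explicitly.

```python
import math
from typing import List, Sequence

# Assumption: an OpenAI-style embeddings route next to the chat endpoint.
EMBEDDINGS_URL = "https://api.totalgpt.ai/v1/embeddings"


def embedding_payload(texts: Sequence[str], kind: str = "passage") -> dict:
    """JSON body for an embeddings request, with the E5 'query: ' / 'passage: ' prefix."""
    if kind not in ("query", "passage"):
        raise ValueError("kind must be 'query' or 'passage'")
    return {"model": "multilingual-e5-base",
            "input": [f"{kind}: {t}" for t in texts]}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]) -> List[int]:
    """Return document indices sorted from most to least similar to the query."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```

In a retrieval step, the top few indices from `rank` would pick the passages to paste into the chat prompt.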

ℹ️

Flat Rate = No Surprise Bills: $20/month for unlimited requests keeps costs predictable regardless of usage.
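
Whether a flat rate pays off depends on volume. The sketch below works through the break-even arithmetic against a per-token provider; the $2.00 per million tokens used in the note after it is a made-up comparison price, not a quote from any provider, and `break_even_tokens` and `metered_cost` are hypothetical helpers.

```python
FLAT_RATE_USD = 20.0  # monthly flat rate from the Plan & Pricing section


def break_even_tokens(price_per_million_usd: float,
                      flat_rate_usd: float = FLAT_RATE_USD) -> float:
    """Monthly token volume at which per-token billing costs the same as the flat rate."""
    if price_per_million_usd <= 0:
        raise ValueError("price must be positive")
    return flat_rate_usd / price_per_million_usd * 1_000_000


def metered_cost(tokens: int, price_per_million_usd: float) -> float:
    """What `tokens` tokens would cost under per-token billing."""
    return tokens / 1_000_000 * price_per_million_usd
```

At the hypothetical $2.00 per million tokens, `break_even_tokens(2.0)` is 10 million tokens a month. Below that volume metered billing would be cheaper, and above it the flat rate comes out ahead.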

🏷️ Categories

Chat Coding TTS Embeddings
