Infermatic
Flat-rate inference: $20/mo unlimited, includes TTS & embeddings
Avg Speed: 38 tok/s
Models: 19
Price: $20/mo flat
Best For: TTS & Embeddings
Plan & Pricing
$20/Month Flat Rate
Unlimited requests. Includes TTS (Kokoro-82M) and embeddings (multilingual-e5-base), so no surprise bills.
API Key
Endpoint
https://api.totalgpt.ai/v1/chat/completions
Models (19 total: Chat, Coding, TTS, Embeddings)
| Model | Speed | Category | Notes |
|---|---|---|---|
| gemini-2.5-flash | 48 tok/s | Chat | Fast Gemini |
| gpt-4.1-mini | 52 tok/s | Chat | GPT-4.1 Mini, fast and cheap |
| gpt-4.1 | 40 tok/s | Chat | GPT-4.1 |
| gpt-4o | 38 tok/s | Chat | GPT-4o |
| claude-sonnet-4-5 | 38 tok/s | Chat | Claude Sonnet 4.5 |
| claude-3.5-sonnet | 36 tok/s | Chat | Claude 3.5 Sonnet |
| deepseek-r1 | 35 tok/s | Reasoning | DeepSeek R1 |
| qwen3-235b | 34 tok/s | Chat/Coding | Qwen3 235B |
| deepseek-v3 | 37 tok/s | Chat | DeepSeek V3 |
| llama-4-maverick | 36 tok/s | Chat | Llama 4 Maverick |
| deepseek-v4 | 33 tok/s | Chat | DeepSeek V4 |
| llama-3.3-70b-versatile | 35 tok/s | Chat | Llama 3.3 70B |
| mixtral-8x7b-32768 | 30 tok/s | Chat | Mixtral MoE |
| gemma-3-27b | 32 tok/s | Chat | Gemma 3 27B |
| qwen3-32b | 38 tok/s | Chat | Qwen3 32B |
| glm-5 | 30 tok/s | Chat | GLM-5 |
| moonshot-vision | 25 tok/s | Vision | Vision model |
| kokoro-82m | N/A | TTS | Kokoro text-to-speech |
| chat-tts | N/A | TTS | ChatTTS voice synthesis |
| multilingual-e5-base | N/A | Embedding | Multilingual embeddings |
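To choose a model programmatically, a few rows of the table can be mirrored in a small lookup. This is only a sketch: the dictionary copies listed speeds (not measurements), and the helper name `fastest_model` is hypothetical, not part of any Infermatic API.

```python
# Subset of the model table: name -> (listed tokens per second, category).
MODELS = {
    "gpt-4.1-mini": (52, "Chat"),
    "gemini-2.5-flash": (48, "Chat"),
    "gpt-4.1": (40, "Chat"),
    "claude-sonnet-4-5": (38, "Chat"),
    "deepseek-r1": (35, "Reasoning"),
    "qwen3-235b": (34, "Chat/Coding"),
    "moonshot-vision": (25, "Vision"),
}


def fastest_model(category: str) -> str:
    """Return the fastest listed model whose category includes `category`."""
    candidates = {
        name: speed
        for name, (speed, cat) in MODELS.items()
        if category in cat.split("/")  # "Chat/Coding" counts for both
    }
    if not candidates:
        raise ValueError(f"no model listed for category {category!r}")
    return max(candidates, key=candidates.get)


print(fastest_model("Chat"))    # gpt-4.1-mini
print(fastest_model("Coding"))  # qwen3-235b
```

Listed speeds are averages at best, so it is worth re-checking them against real latency before relying on the ranking.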
cURL Example
curl -X POST https://api.totalgpt.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-TZv...Q5Jp" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-sonnet-4-5",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
Python Example
from openai import OpenAI

# Point the OpenAI client at the Infermatic endpoint via base_url.
client = OpenAI(
    api_key="sk-TZv...Q5Jp",
    base_url="https://api.totalgpt.ai/v1",
)

# Send a single-turn chat request and print the reply text.
response = client.chat.completions.create(
    model="claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
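Hard-coded keys are easy to leak, so one option is to read the key from the environment. The sketch below builds the same POST request as the cURL example using only the standard library, without sending it. The variable name `INFERMATIC_API_KEY` and the helper `build_chat_request` are assumptions made for illustration.

```python
import json
import os
import urllib.request

ENDPOINT = "https://api.totalgpt.ai/v1/chat/completions"  # endpoint listed above


def build_chat_request(prompt: str, model: str = "claude-sonnet-4-5") -> urllib.request.Request:
    """Build (but do not send) a chat completion request."""
    # INFERMATIC_API_KEY is a hypothetical variable name chosen for this sketch.
    api_key = os.environ.get("INFERMATIC_API_KEY", "")
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("Hello!")
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would send it; the response body can then be parsed with `json.load`.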
Pitfalls & Notes
Includes TTS (Kokoro-82M): rare among providers. Text-to-speech is available at no extra cost.
Includes Embeddings (multilingual-e5-base): embedding models are included in the flat rate, which is useful for RAG pipelines.
Flat Rate = No Surprise Bills: $20/mo for unlimited requests means predictable costs regardless of usage.
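For the RAG use case mentioned above, the retrieval step usually comes down to ranking documents by cosine similarity to the query embedding. The sketch below shows that ranking with toy vectors so it runs offline. The `embed` helper ASSUMES the API exposes an OpenAI-style `/v1/embeddings` route, which this page does not confirm, and the helper names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query_vec, doc_vecs):
    """Return document indices sorted from most to least similar."""
    scores = [cosine(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)


def embed(texts, api_key):
    """Fetch embeddings. ASSUMPTION: an OpenAI-compatible embeddings route exists."""
    from openai import OpenAI  # imported lazily so the ranking code runs without it

    client = OpenAI(api_key=api_key, base_url="https://api.totalgpt.ai/v1")
    resp = client.embeddings.create(model="multilingual-e5-base", input=texts)
    return [item.embedding for item in resp.data]


# Toy vectors stand in for real embeddings so the ranking step runs offline.
docs = [[1.0, 0.0], [0.6, 0.8], [0.0, 1.0]]
print(rank([0.0, 1.0], docs))  # [2, 1, 0]
```

E5-family models are typically used with `query: ` and `passage: ` prefixes on the input text, so it may be worth checking whether the hosted model expects them.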
Categories
Chat
Coding
TTS
Embeddings