
TL;DR
| Model | Vendor | Endpoints |
|---|---|---|
| gpt-5.5 | OpenAI | /v1/chat/completions, /v1/responses |
| claude-opus-4-7 | Anthropic | /v1/messages, /v1/chat/completions |
| deepseek-v4-pro / deepseek-v4-flash | DeepSeek | /v1/chat/completions |
Existing code does not need to change — just swap the model field. Exact model IDs, capability comparisons, and live pricing are on the Models page and the Pricing page.
Why all three at once
All three vendors shipped new releases in April. Rather than three separate posts, we did one unified onboarding + verification pass:
- One read covers all three vendors
- All three share Tokensmart's same plumbing — auth, billing, logs, caching — so the experience is consistent regardless of vendor
- Our multi-gateway router automatically selects the right upstream channel based on the model name; you never need to think about routing
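To illustrate name-based routing, here is a hypothetical sketch in which a prefix table maps the requested model name to an upstream vendor. The ROUTES table and the route_for_model helper are illustrative assumptions, not Tokensmart's actual router.

```python
# Hypothetical prefix table: model-name prefix -> upstream vendor.
# Illustrative only; not Tokensmart's real routing configuration.
ROUTES = {
    "gpt-": "openai",
    "claude-": "anthropic",
    "deepseek-": "deepseek",
}


def route_for_model(model: str) -> str:
    """Return the upstream vendor for a model name, preferring the longest matching prefix."""
    matches = [prefix for prefix in ROUTES if model.startswith(prefix)]
    if not matches:
        raise ValueError(f"no upstream channel for model {model!r}")
    return ROUTES[max(matches, key=len)]


for name in ["gpt-5.5", "claude-opus-4-7", "deepseek-v4-flash"]:
    print(name, "->", route_for_model(name))
```

The point is that the model field alone carries enough information to pick a channel, which is why client code never has to mention routing.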
GPT-5.5
OpenAI's GPT-5.5 emphasises longer context and stronger reasoning, which helps most with complex code and multi-step workflows.
Same call pattern as before:
```python
from openai import OpenAI

client = OpenAI(
    api_key="pk_live_...",
    base_url="https://api.tokensmart.ai/v1",
)
resp = client.chat.completions.create(
    model="gpt-5.5",  # change just this line
    messages=[{"role": "user", "content": "Explain two-phase commit in distributed transactions"}],
)
print(resp.choices[0].message.content)
```
If you were already using gpt-5, swap the model from "gpt-5" to "gpt-5.5" and you are done — no SDK upgrade, no endpoint switch.
Claude Opus 4.7
Anthropic shipped the Opus 4.7 flagship first; Sonnet / Haiku 4.7 are not out upstream yet, and we will roll them in as soon as Anthropic releases them. Opus 4.7 keeps the 1M-token long context from 4.6 while improving on code generation, tool-use accuracy, and complex instruction following.
The older claude-sonnet-4-6 and claude-opus-4-6 remain available — if you want to stay on the older models, you do not need to change anything.
With the native Anthropic SDK:
```python
import anthropic

client = anthropic.Anthropic(
    api_key="pk_live_...",
    base_url="https://api.tokensmart.ai",  # no /v1: the SDK appends /v1/messages itself
)
msg = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(msg.content[0].text)
```
With the OpenAI SDK (our /v1/chat/completions endpoint also accepts Claude):
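```python
from openai import OpenAI

client = OpenAI(api_key="pk_live_...", base_url="https://api.tokensmart.ai/v1")
resp = client.chat.completions.create(
    model="claude-opus-4-7",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```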
Prompt caching is automatic: long system prompts are marked with Anthropic's cache_control for you, so calls after the first can save up to 90% on the cached input tokens. No configuration needed.
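To see where a figure like "up to 90%" comes from, the sketch below estimates input-token cost for repeated calls that share a long, cached prompt prefix. It assumes Anthropic's published cache multipliers (cache writes billed at 1.25× the base input price for the default 5-minute cache, cache reads at 0.1×). The base price is a placeholder parameter, not a Tokensmart rate, and input_cost is a hypothetical helper.

```python
# Assumed multipliers from Anthropic's published prompt-caching pricing
# (default 5-minute cache); the base price passed in is a placeholder.
CACHE_WRITE_MULT = 1.25  # first call writes the prefix to the cache
CACHE_READ_MULT = 0.10   # later calls read the cached prefix


def input_cost(calls: int, prefix_tokens: int, suffix_tokens: int,
               base_price_per_mtok: float, cached: bool) -> float:
    """Total input cost of `calls` requests that share a `prefix_tokens`-long prompt prefix."""
    if calls < 1:
        raise ValueError("calls must be >= 1")
    per_tok = base_price_per_mtok / 1_000_000
    suffix = calls * suffix_tokens * per_tok  # the per-request part is never cached
    if not cached:
        return calls * prefix_tokens * per_tok + suffix
    first = prefix_tokens * per_tok * CACHE_WRITE_MULT
    rest = (calls - 1) * prefix_tokens * per_tok * CACHE_READ_MULT
    return first + rest + suffix


plain = input_cost(100, 20_000, 200, base_price_per_mtok=1.0, cached=False)
with_cache = input_cost(100, 20_000, 200, base_price_per_mtok=1.0, cached=True)
print(f"input-cost saving over 100 calls: {1 - with_cache / plain:.1%}")
```

The saving approaches, but stays below, 90% as the number of calls grows, because the first call pays the write premium and the uncached suffix is billed in full. The estimate also assumes every call lands within the cache lifetime.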
DeepSeek V4
The latest DeepSeek V4 ships in two tiers:
- deepseek-v4-pro: flagship, best general capability, ideal for code, math reasoning, and complex instruction following
- deepseek-v4-flash: economy tier, faster responses and lower per-token price, ideal for high-concurrency chat and simpler tasks
Both undercut overseas flagships by roughly an order of magnitude — see the Pricing page for current rates.
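```python
from openai import OpenAI

client = OpenAI(api_key="pk_live_...", base_url="https://api.tokensmart.ai/v1")
resp = client.chat.completions.create(
    model="deepseek-v4-pro",  # flagship
    # or model="deepseek-v4-flash"  # fast & economical
    messages=[{"role": "user", "content": "Write a quicksort in Python"}],
)
print(resp.choices[0].message.content)
```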
DeepSeek goes through the same unified OpenAI-compatible endpoint — calling it looks identical to calling GPT.
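Because every model sits behind the same OpenAI-compatible endpoint, comparing them is just a loop over model IDs. The sketch below takes the client as a parameter so it can be exercised without a network call; ask_all is a hypothetical helper name.

```python
from typing import Any, Iterable

MODELS = ("gpt-5.5", "claude-opus-4-7", "deepseek-v4-pro", "deepseek-v4-flash")


def ask_all(client: Any, prompt: str, models: Iterable[str] = MODELS) -> dict[str, str]:
    """Send the same prompt to each model through one OpenAI-compatible client."""
    answers = {}
    for model in models:
        resp = client.chat.completions.create(
            model=model,  # the only field that changes between vendors
            messages=[{"role": "user", "content": prompt}],
        )
        answers[model] = resp.choices[0].message.content
    return answers
```

In real use, pass the client from the GPT-5.5 example, i.e. OpenAI(api_key="pk_live_...", base_url="https://api.tokensmart.ai/v1"), and the returned dict maps each model ID to its reply.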
Verifying all three are live
Fastest sanity check:
```bash
# List models available on your account
curl https://api.tokensmart.ai/v1/models \
  -H "Authorization: Bearer pk_live_..."
```
If the data[].id values in the response include gpt-5.5, claude-opus-4-7, deepseek-v4-pro, and deepseek-v4-flash, you are good to go.
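If you would rather script the check, the function below scans a parsed /v1/models response for the four new IDs. It assumes the response follows the OpenAI list shape, {"object": "list", "data": [{"id": ...}, ...]}; missing_models is a hypothetical helper, and the sample payload is made up for illustration.

```python
import json

EXPECTED = frozenset({"gpt-5.5", "claude-opus-4-7", "deepseek-v4-pro", "deepseek-v4-flash"})


def missing_models(models_response: dict, expected: frozenset[str] = EXPECTED) -> set[str]:
    """Return the expected model IDs absent from a /v1/models response body."""
    available = {item["id"] for item in models_response.get("data", [])}
    return set(expected) - available


# Made-up sample body in the OpenAI list format, e.g. saved from the curl call.
sample = json.loads("""
{"object": "list",
 "data": [{"id": "gpt-5.5"}, {"id": "claude-opus-4-7"}, {"id": "deepseek-v4-pro"}]}
""")
print(missing_models(sample) or "all models live")
```

With the OpenAI SDK you can build the same ID set by iterating over client.models.list() and reading each model's .id.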
Billing, logs, caching — all unified
- All three vendors are charged by actual token usage, identical to your prior model calls
- Every call appears on the API Logs page with model name, token counts, latency, and per-call cost
- Anthropic prompt cache and OpenAI prompt cache both pass cache_read_tokens through transparently, with no double billing
- Balance view unchanged, available at the top of the Dashboard
Pricing policy
We continue our transparent pricing policy:
- Vendor official prices are public on the Pricing page
- Tokensmart's effective price = official × per-model discount multiplier
- What you see is what you pay — no hidden margin, no tiered gates, no minimum top-up
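The pricing formula above works out per call as follows. The official prices and the 0.8 multiplier are placeholder numbers, not actual rates for any model (check the Pricing page), and call_cost is a hypothetical helper; it assumes input and output prices are quoted per million tokens and that the same multiplier applies to both.

```python
def call_cost(prompt_tokens: int, completion_tokens: int,
              input_price_per_mtok: float, output_price_per_mtok: float,
              multiplier: float) -> float:
    """Effective cost of one call: official token price x per-model discount multiplier."""
    official = (prompt_tokens * input_price_per_mtok
                + completion_tokens * output_price_per_mtok) / 1_000_000
    return official * multiplier


# Placeholder numbers: 2.0 / 8.0 per million input/output tokens, 0.8 multiplier.
cost = call_cost(12_000, 800, input_price_per_mtok=2.0,
                 output_price_per_mtok=8.0, multiplier=0.8)
print(f"{cost:.6f}")
```

The token counts come straight from the usage fields shown per call on the API Logs page, so the same arithmetic can be used to reconcile a log entry against the listed prices.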
What's next
Coming up:
- Feasibility assessment for video models (Sora / Runway etc.)
- Multimodal chat (image + text input) end-to-end
- More Chinese models (Zhipu GLM-5, latest Moonshot)
If you have specific use cases or vendors you want prioritised, let us know in the enterprise WeChat group or at support@tokensmart.ai.