New Models: GPT-5.5, Claude Opus 4.7, and DeepSeek V4 Now Available

TL;DR

| Model | Vendor | Endpoints |
|---|---|---|
| `gpt-5.5` | OpenAI | `/v1/chat/completions`, `/v1/responses` |
| `claude-opus-4-7` | Anthropic | `/v1/messages`, `/v1/chat/completions` |
| `deepseek-v4-pro` / `deepseek-v4-flash` | DeepSeek | `/v1/chat/completions` |

Existing code does not need to change: just swap the `model` field. Exact model IDs, capability comparisons, and live pricing are listed on the Models page and the Pricing page.

Why all three at once

All three vendors shipped new releases in April. Rather than publishing three separate posts, we ran a single onboarding and verification pass covering all of them:

One post covers all three vendors
All three run on the same Tokensmart plumbing (auth, billing, logs, caching), so the experience is consistent regardless of vendor
Our multi-gateway router picks the right upstream channel automatically from the model name, so you never need to think about routing
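The sketch below makes name-based routing concrete with a toy example. It is purely illustrative: the prefix table and the `pick_vendor` function are hypothetical and do not describe how Tokensmart's router is actually implemented.

```python
# Hypothetical prefix -> vendor table; NOT Tokensmart's real routing config.
ROUTES = {
    "gpt-": "openai",
    "claude-": "anthropic",
    "deepseek-": "deepseek",
}


def pick_vendor(model: str) -> str:
    """Return the upstream vendor for a model ID by longest matching prefix."""
    matches = [prefix for prefix in ROUTES if model.startswith(prefix)]
    if not matches:
        raise ValueError(f"no route for model {model!r}")
    # Longest prefix wins, so a more specific entry could override a generic one.
    return ROUTES[max(matches, key=len)]


for model in ["gpt-5.5", "claude-opus-4-7", "deepseek-v4-flash"]:
    print(model, "->", pick_vendor(model))
```

From the caller's side only the `model` string changes. The vendor decision happens behind the endpoint.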

GPT-5.5

OpenAI's GPT-5.5 emphasises longer context and stronger reasoning, which helps most with complex code and multi-step workflows.

Same call pattern as before:

```python
from openai import OpenAI

client = OpenAI(
    api_key="pk_live_...",
    base_url="https://api.tokensmart.ai/v1",
)

resp = client.chat.completions.create(
    model="gpt-5.5",   # change just this line
    messages=[{"role": "user", "content": "Explain two-phase commit in distributed transactions"}],
)
print(resp.choices[0].message.content)
```

If you are already using `gpt-5`, change the model from `"gpt-5"` to `"gpt-5.5"` and you are done: no SDK upgrade, no endpoint switch.
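The TL;DR table also lists `/v1/responses` for `gpt-5.5`. Below is a dependency-free sketch of calling it and extracting the text. It assumes Tokensmart's `/v1/responses` accepts the standard OpenAI Responses request body (`model` plus `input`) and returns the standard response shape. The `TOKENSMART_API_KEY` environment variable name is an assumption for illustration.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.tokensmart.ai/v1"


def extract_output_text(response: dict) -> str:
    """Join the text of every `output_text` part in assistant message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") != "message":
            continue  # skip e.g. reasoning or tool-call items
        for content in item.get("content", []):
            if content.get("type") == "output_text":
                parts.append(content.get("text", ""))
    return "".join(parts)


def ask_responses(prompt: str, model: str = "gpt-5.5") -> str:
    """POST a minimal Responses API request and return the generated text."""
    body = json.dumps({"model": model, "input": prompt}).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/responses",
        data=body,
        headers={
            # Assumed env var name; store your pk_live_ key however you prefer.
            "Authorization": f"Bearer {os.environ['TOKENSMART_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return extract_output_text(json.load(resp))
```

Usage: `print(ask_responses("Explain two-phase commit in distributed transactions"))`. With the official `openai` Python SDK, `client.responses.create(model="gpt-5.5", input=...)` followed by the `resp.output_text` convenience property should give the same result.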

Claude Opus 4.7

Anthropic shipped the Opus 4.7 flagship first. Sonnet 4.7 and Haiku 4.7 have not been released upstream yet; we will add them as soon as Anthropic releases them. Opus 4.7 keeps the 1M-token context window of 4.6 while improving code generation, tool-use accuracy, and complex instruction following.

The older `claude-sonnet-4-6` and `claude-opus-4-6` models remain available. If you want to stay on them, you do not need to change anything.

With the native Anthropic SDK:

```python
import anthropic

client = anthropic.Anthropic(
    api_key="pk_live_...",
    base_url="https://api.tokensmart.ai",  # no /v1 here: the SDK adds /v1/messages itself
)

msg = client.messages.create(
    model="claude-opus-4-7",
    max_tokens=2048,
    messages=[{"role": "user", "content": "Hello!"}],
)
print(msg.content[0].text)
```

With the OpenAI SDK (our `/v1/chat/completions` endpoint also accepts Claude models):

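```python
from openai import OpenAI

client = OpenAI(api_key="pk_live_...", base_url="https://api.tokensmart.ai/v1")
resp = client.chat.completions.create(
    model="claude-opus-4-7",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
```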

Prompt caching is automatic. Long system prompts are cached via Anthropic's `cache_control` with no configuration on your side, saving up to 90% on the cached input tokens after the first call.
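To see where a figure like "up to 90%" comes from, here is a back-of-the-envelope sketch. It assumes Anthropic's published prompt-caching multipliers for the default 5-minute cache: cache writes at 1.25× the base input price and cache reads at 0.1×. Cost is measured in base-price token units. Tokensmart's actual billed rates are on the Pricing page.

```python
# Assumed multipliers relative to the base input-token price (Anthropic's
# 5-minute prompt cache); check the Pricing page for actual billed rates.
CACHE_WRITE_MULT = 1.25
CACHE_READ_MULT = 0.10


def input_cost(prefix_tokens: int, suffix_tokens: int, calls: int, cached: bool) -> float:
    """Input cost, in base-price token units, of `calls` (>= 1) requests that share
    a `prefix_tokens`-long system prompt plus `suffix_tokens` of fresh input each."""
    if not cached:
        return float(calls * (prefix_tokens + suffix_tokens))
    first = prefix_tokens * CACHE_WRITE_MULT + suffix_tokens  # first call writes the cache
    later = (calls - 1) * (prefix_tokens * CACHE_READ_MULT + suffix_tokens)  # later calls read it
    return first + later


# A 10,000-token system prompt reused across 10 calls, 500 fresh tokens each.
plain = input_cost(10_000, 500, 10, cached=False)
with_cache = input_cost(10_000, 500, 10, cached=True)
print(f"saving: {1 - with_cache / plain:.1%}")
```

With these inputs the script prints a saving of 74.8%. The saving on the cached prefix itself approaches 90% as the number of calls grows, because each read costs 0.1× instead of 1×.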

DeepSeek V4

The latest DeepSeek V4 ships in two tiers:

`deepseek-v4-pro`: the flagship, with the best general capability; ideal for code, math reasoning, and complex instruction following
`deepseek-v4-flash`: the economy tier, with faster responses and a lower per-token price; ideal for high-concurrency chat and simpler tasks

Both are priced roughly an order of magnitude below the overseas flagship models; see the Pricing page for current rates.

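```python
from openai import OpenAI

client = OpenAI(api_key="pk_live_...", base_url="https://api.tokensmart.ai/v1")
resp = client.chat.completions.create(
    model="deepseek-v4-pro",         # flagship
    # or model="deepseek-v4-flash"   # fast & economical
    messages=[{"role": "user", "content": "Write a quicksort in Python"}],
)
print(resp.choices[0].message.content)
```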

DeepSeek goes through the same unified OpenAI-compatible endpoint, so calling it looks identical to calling GPT.
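Because only the `model` string differs, one helper can send the same prompt to every model in this post. This minimal sketch assumes the `openai` Python package is installed and that your key is in a `TOKENSMART_API_KEY` environment variable (an illustrative name). `compare_models` is a hypothetical helper, not part of any SDK.

```python
import os

# The four model IDs announced in this post.
NEW_MODELS = ("gpt-5.5", "claude-opus-4-7", "deepseek-v4-pro", "deepseek-v4-flash")


def build_chat_kwargs(model: str, prompt: str) -> dict:
    """Arguments for chat.completions.create; only `model` changes between vendors."""
    return {"model": model, "messages": [{"role": "user", "content": prompt}]}


def compare_models(prompt: str, models: tuple = NEW_MODELS) -> dict:
    """Send the same prompt to each model and return {model_id: reply_text}."""
    from openai import OpenAI  # lazy import: requires the `openai` package at call time

    client = OpenAI(
        api_key=os.environ["TOKENSMART_API_KEY"],  # assumed env var name
        base_url="https://api.tokensmart.ai/v1",
    )
    replies = {}
    for model in models:
        resp = client.chat.completions.create(**build_chat_kwargs(model, prompt))
        replies[model] = resp.choices[0].message.content
    return replies
```

Calling `compare_models("Write a quicksort in Python")` returns one reply per model ID, which is handy for a quick side-by-side before switching a production workload.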

Verifying all three are live

Fastest sanity check:

```shell
# List the models available on your account
curl https://api.tokensmart.ai/v1/models \
  -H "Authorization: Bearer pk_live_..."
```

If the `data[].id` values include `gpt-5.5`, `claude-opus-4-7`, `deepseek-v4-pro`, and `deepseek-v4-flash`, you are good to go.
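If you would rather script that check (for example in CI), here is a stdlib-only sketch. It assumes the endpoint returns an OpenAI-style list payload: a `data` array of objects that each carry an `id`, as the `data[].id` check implies. `missing_models` and `fetch_models` are hypothetical helper names.

```python
import json
import urllib.request

EXPECTED = {"gpt-5.5", "claude-opus-4-7", "deepseek-v4-pro", "deepseek-v4-flash"}


def missing_models(payload: dict, expected=EXPECTED) -> set:
    """Return the expected model IDs that are absent from a /v1/models payload."""
    available = {entry.get("id") for entry in payload.get("data", [])}
    return set(expected) - available


def fetch_models(api_key: str) -> dict:
    """GET /v1/models with a Bearer token and return the parsed JSON body."""
    req = urllib.request.Request(
        "https://api.tokensmart.ai/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage: `missing_models(fetch_models("pk_live_..."))` returns an empty set when all four models are live. Otherwise it lists exactly which IDs are missing.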

Billing, logs, caching — all unified

All three vendors are billed by actual token usage, exactly like your existing model calls
Every call appears on the API Logs page with model name, token counts, latency, and per-call cost
Cached-token counts (`cache_read_tokens`) from both Anthropic and OpenAI prompt caching are passed through transparently, so cached tokens are never double-billed
The balance view is unchanged and still sits at the top of the Dashboard

Pricing policy

We continue our transparent pricing policy:

Vendor official prices are public on the Pricing page
Tokensmart's effective price = official × per-model discount multiplier
What you see is what you pay: no hidden margin, no tiered gates, no minimum top-up
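The pricing formula above comes down to one multiplication per call. The worked sketch below uses made-up numbers: the official prices and the 0.8 multiplier are hypothetical placeholders, not real Tokensmart or vendor rates, so take actual values from the Pricing page. Prices are expressed per million tokens, the way vendors commonly quote them, and the sketch ignores cached-token discounts.

```python
from dataclasses import dataclass


@dataclass
class ModelPrice:
    """Hypothetical official prices in USD per 1M tokens, plus a discount multiplier."""
    input_per_m: float
    output_per_m: float
    multiplier: float  # Tokensmart effective price = official x multiplier


def call_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one call at the effective (discounted) price."""
    official = (input_tokens * price.input_per_m + output_tokens * price.output_per_m) / 1_000_000
    return official * price.multiplier


# Placeholder numbers only: $2 / $8 per 1M tokens with a 0.8 multiplier.
example = ModelPrice(input_per_m=2.0, output_per_m=8.0, multiplier=0.8)
print(f"${call_cost(example, input_tokens=1_200, output_tokens=400):.6f}")
```

Because the multiplier is applied to the official price rather than added on top, the per-call figure on the API Logs page should be reproducible from the token counts alone.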

What's next

Coming up:

A feasibility assessment for video models (Sora, Runway, etc.)
End-to-end multimodal chat (image and text input)
More Chinese models (Zhipu GLM-5, the latest Moonshot models)

If you have specific use cases or vendors you want prioritised, let us know in the enterprise WeChat group or at support@tokensmart.ai.