Crossing 100 Billion Tokens: A Milestone Recap and the Road Ahead

Tokensmart usage crosses 100 billion tokens

In one sentence

Tokensmart has now processed more than 100 billion tokens in total. Behind that number sit tens of thousands of code-generation sessions, millions of conversation turns, and a lot of late-night debugging — yours and ours.

This number belongs to every one of you. Thanks for coming along for the ride.

What 100 billion tokens looks like

Spread across OpenAI / Anthropic / Google / DeepSeek / Qwen / Kimi / GLM and a dozen other vendors
Spanning text chat, code generation, long-context analysis, image generation, tool use, and more
Serving everyone from solo developers and indie products to full teams and enterprises
All on transparent billing, full request logs, and pay-as-you-actually-use — zero hidden margin

A few milestones we shipped along the way

Date	Event
2026-04	Platform launch + transparent pricing model (`official price × discount rate`) goes live
2026-04	Image generation launches (`/v1/images/generations`, OpenAI-compatible)
2026-04	GPT-5.5, Claude Opus 4.7, DeepSeek V4 all onboarded together
2026-05	OpenAI and Anthropic (Claude) protocols fully unified (any SDK × any model)
2026-05	Cumulative usage passes 100 billion tokens ✨

What is next on the platform

A milestone is just a marker. The list of things we still want to build is longer than the list of things we have shipped. Here it is, by category.

1. Models: broader, fresher, more reliable

Fast onboarding of new Chinese models — GLM-5, Kimi K2, the next Qwen release: live the moment upstream ships
Video generation feasibility — assessing Sora, Runway, Kling and similar; we will share results as soon as we have them
Deeper multimodal coverage — image + text + tool calls in a single conversation, unified billing and logging
Transparent deprecation — any model removal gets a 30-day notice; no silent swaps

2. Protocol layer: more accurate, more complete

Protocol conversion fidelity — continuing to polish edge cases (deeply nested tool_use, unusual stop_reason, multi-turn tool chains)
Responses API everywhere — enabled on the sub2api family today; expanding to more upstream vendors
Full protocol parity: streaming, caching and tool use behave identically whether you call the OpenAI or the Anthropic protocol
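To give a feel for what protocol conversion involves, here is a minimal Python sketch of one direction: turning a text-only OpenAI Chat Completions request body into an Anthropic Messages request body. It is an illustration under stated assumptions, not Tokensmart's converter. The function name `openai_chat_to_anthropic` and the 1024-token default are hypothetical, tool calls and images are deliberately out of scope (those are exactly the edge cases listed above), and clamping `temperature` is just one possible policy.

```python
from typing import Any


def openai_chat_to_anthropic(body: dict[str, Any], default_max_tokens: int = 1024) -> dict[str, Any]:
    """Translate a text-only OpenAI Chat Completions body into an Anthropic
    Messages body (hypothetical helper, illustration only)."""
    system_parts: list[str] = []
    messages: list[dict[str, Any]] = []
    for msg in body["messages"]:
        role, content = msg["role"], msg["content"]
        if not isinstance(content, str):
            raise ValueError("this sketch only handles plain-text content")
        if role in ("system", "developer"):
            # Anthropic takes system instructions as a top-level field, not as a message.
            system_parts.append(content)
        elif role in ("user", "assistant"):
            messages.append({"role": role, "content": content})
        else:
            # tool messages would need tool_use / tool_result blocks; out of scope here.
            raise ValueError(f"unsupported role in this sketch: {role}")

    out: dict[str, Any] = {
        "model": body["model"],
        "messages": messages,
        # max_tokens is required by the Anthropic Messages API but optional for OpenAI.
        "max_tokens": body.get("max_tokens") or default_max_tokens,
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)
    if body.get("stop") is not None:
        # OpenAI accepts a string or a list; Anthropic expects a list named stop_sequences.
        stop = body["stop"]
        out["stop_sequences"] = [stop] if isinstance(stop, str) else list(stop)
    if body.get("temperature") is not None:
        # OpenAI allows 0-2, Anthropic 0-1; this sketch simply clamps.
        out["temperature"] = min(float(body["temperature"]), 1.0)
    for key in ("top_p", "stream"):
        if key in body:
            out[key] = body[key]
    return out
```

Even this small case shows why fidelity work never quite ends: fields that look equivalent differ in name, shape, requiredness or valid range, and every such difference needs an explicit decision.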

3. Console: finer-grained, more useful

Finer billing views — aggregate by API key, by model, by project; export CSV for reconciliation
Teams and projects — sub-accounts under a primary account, per-account limits and logs (in progress)
Usage alerts + auto-pause — set a threshold, automatically pause a key when it is hit, no accidental overspend
Friendlier key management — names, tags, model allowlists, IP allowlists, all visual
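The alert-and-pause behavior described above can be pictured with a small Python sketch. It only illustrates the intended semantics; the feature is still being built, and `KeyBudget`, its fields and the two-threshold design are hypothetical, not the console's actual settings.

```python
from dataclasses import dataclass


@dataclass
class KeyBudget:
    """Hypothetical per-key spend guard: warn at one threshold, pause at another."""
    name: str
    alert_at_usd: float
    pause_at_usd: float
    spent_usd: float = 0.0
    alerted: bool = False
    paused: bool = False

    def allow(self) -> bool:
        # Checked before a request is forwarded upstream.
        return not self.paused

    def record(self, cost_usd: float) -> list[str]:
        """Add the cost of a finished request and return any events it triggered."""
        self.spent_usd += cost_usd
        events: list[str] = []
        if not self.alerted and self.spent_usd >= self.alert_at_usd:
            self.alerted = True
            events.append("alert")
        if not self.paused and self.spent_usd >= self.pause_at_usd:
            self.paused = True
            events.append("paused")
        return events
```

For a key with `alert_at_usd=8.0` and `pause_at_usd=10.0`, requests costing 5, 4 and 2 USD produce no event, then an alert, then a pause, leaving 11 USD spent. That overshoot is the main design trade-off in a guard like this: cost is only known once a request finishes, so a stricter version would also estimate cost before forwarding.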

4. Infrastructure: more stable, lower latency

Multi-gateway elastic routing: continuing to harden failover so that when a single upstream goes down, traffic is rerouted within seconds and no requests are dropped
TTFT (time to first token) and P99 latency optimization: we will keep publishing measured latency and success-rate data for core models
Observability surfaced to you — the success-rate and TTFT sparklines on the Models page (currently placeholders) get hooked up to real telemetry
Wider geographic reach — Cloudflare edge already covers the globe; next is polishing access from mainland China
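The failover idea in the routing item above can be sketched in a few lines of Python. This is a generic priority-order failover with a per-upstream cooldown, shown as an assumption about how such routing can work rather than a description of Tokensmart's gateway; `FailoverRouter`, the 5-second cooldown and the string-in/string-out upstream callables are all hypothetical.

```python
import time
from typing import Callable


class FailoverRouter:
    """Hypothetical router: try upstreams in priority order and skip any that
    failed within the last `cooldown_s` seconds."""

    def __init__(
        self,
        upstreams: dict[str, Callable[[str], str]],
        cooldown_s: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.upstreams = upstreams  # dict insertion order = priority order
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.down_until: dict[str, float] = {}

    def send(self, request: str) -> tuple[str, str]:
        """Return (upstream_name, response) from the first healthy upstream that succeeds."""
        now = self.clock()
        errors: list[str] = []
        for name, call in self.upstreams.items():
            if self.down_until.get(name, float("-inf")) > now:
                continue  # still cooling down after a recent failure
            try:
                return name, call(request)
            except Exception as exc:
                # Mark this upstream unhealthy so the next requests skip it immediately.
                self.down_until[name] = now + self.cooldown_s
                errors.append(f"{name}: {exc}")
        raise RuntimeError("no upstream succeeded: " + ("; ".join(errors) or "all cooling down"))
```

The cooldown is what keeps rerouting fast: after the first failure, later requests go straight to the next upstream instead of waiting on the broken one again. Note that replaying a request like this is only safe before any response has reached the client; a stream that fails midway needs separate handling.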

5. Pricing: still transparent, still optimizing cost

Transparent pricing stays — platform_price = official_price × rate, each model's rate published on the pricing page
Upstream cost wins go to you — when we negotiate better upstream rates, the discount lands on your invoice, not in our pocket
Cache billing 100% transparent — cache_read_tokens always shown separately, never double-charged
Pre-paid + post-paid mix — solo developers stay on pay-as-you-go; enterprises will get monthly invoicing (in progress)
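For readers who want to reconcile an invoice line by hand, here is a minimal Python sketch of the pricing rule above, platform_price = official_price × rate, with cache reads billed once at their own price. The prices and the 0.8 rate are made-up illustrative numbers, and `ModelPrice` and `request_cost` are hypothetical names. It also assumes OpenAI-style usage reporting, where the prompt token count already includes cached tokens; Anthropic-style usage reports cache reads separately from input_tokens, in which case nothing should be subtracted.

```python
from dataclasses import dataclass


@dataclass
class ModelPrice:
    # Official upstream prices in USD per 1M tokens (illustrative numbers only).
    input_per_m: float
    cache_read_per_m: float
    output_per_m: float
    rate: float  # the model's platform rate as published on the pricing page


def request_cost(price: ModelPrice, prompt_tokens: int, cache_read_tokens: int,
                 completion_tokens: int) -> float:
    """Platform charge in USD for one request.

    Assumes prompt_tokens already includes cache_read_tokens, so cached tokens
    are removed from the full-price input count and each token is billed once.
    """
    if not 0 <= cache_read_tokens <= prompt_tokens:
        raise ValueError("cache_read_tokens must be between 0 and prompt_tokens")
    uncached_input = prompt_tokens - cache_read_tokens
    official = (
        uncached_input * price.input_per_m
        + cache_read_tokens * price.cache_read_per_m
        + completion_tokens * price.output_per_m
    ) / 1_000_000
    return official * price.rate  # platform_price = official_price × rate
```

With `ModelPrice(2.0, 0.5, 8.0, 0.8)`, a request with 10,000 prompt tokens (6,000 of them cache reads) and 1,000 completion tokens comes to 0.019 USD at official prices and about 0.0152 USD after the rate is applied.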

6. Developer experience: docs, examples, SDK-friendly

More code samples — full runnable examples in Python, Node, Go, Java
Tool integrations stay fresh — Cherry Studio, Cursor, Claude Code, Codex CLI guides kept up to date
Better debugging — direct console access to log detail, error replay, upstream error-code translation

Beyond the number

100 billion tokens is not a number we generated — it is a number you generated, across thousands of API keys, in your own IDEs, terminals and production systems.

Every error report, every piece of feedback, every "could this be a little better?" reached us and shaped what we built. The next 100 billion will get here faster than this one did.

Write to us

The roadmap stays open. What do you want to see first? Which models, which features, which integrations? Tell us in the enterprise WeChat group, on the contact page, or via support@tokensmart.ai.

Thanks again to everyone who has been on this ride 🙌