
In one sentence
Tokensmart has now processed more than 100 billion tokens in total. Behind that number sit tens of thousands of code-generation sessions, millions of conversation turns, and a lot of late-night debugging — yours and ours.
This number belongs to every one of you. Thanks for being on the ride.
What 100 billion tokens looks like
- Spread across OpenAI / Anthropic / Google / DeepSeek / Qwen / Kimi / GLM and a dozen other vendors
- Spanning text chat, code generation, long-context analysis, image generation, tool use, and more
- Serving everyone from solo developers and indie products to full teams and enterprises
- All on transparent billing, full request logs, and pay-as-you-actually-use — zero hidden margin
A few milestones we shipped along the way
| Date | Event |
|---|---|
| 2026-04 | Platform launch + transparent pricing model (official price × discount rate) goes live |
| 2026-04 | Image generation launches (/v1/images/generations, OpenAI-compatible) |
| 2026-04 | GPT-5.5, Claude Opus 4.7, DeepSeek V4 all onboarded together |
| 2026-05 | OpenAI & Claude dual-protocol fully unified (any SDK × any model) |
| 2026-05 | 100 billion tokens served, cumulative ✨ |
What is next on the platform
A milestone is just a marker. The list of things we still want to build is longer than the list of things we have shipped. Here it is, by category.
1. Models: broader, fresher, more reliable
- Fast onboarding of new Chinese models — GLM-5, Kimi K2, the next Qwen release: live the moment upstream ships
- Video generation feasibility — assessing Sora, Runway, Kling and similar; we will share results as soon as we have them
- Deeper multimodal coverage — image + text + tool calls in a single conversation, unified billing and logging
- Transparent deprecation — any model removal gets a 30-day notice; no silent swaps
2. Protocol layer: more accurate, more complete
- Protocol conversion fidelity — continuing to polish edge cases (deeply nested `tool_use`, unusual `stop_reason`, multi-turn tool chains)
- Responses API everywhere — enabled on the sub2api family today; expanding to more upstream vendors
- Streaming, caching, and tool use identical on both sides — full feature parity between the OpenAI and Anthropic protocols
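
To make the conversion edge cases above concrete, here is a minimal sketch of two pieces of an Anthropic-to-OpenAI response translation: mapping `stop_reason` to `finish_reason`, and turning a `tool_use` content block into a `tool_calls` entry. It is an illustration, not the gateway's actual converter. The function names are hypothetical, and the fallback for unrecognized stop reasons is an assumption.

```python
import json

# Anthropic stop_reason -> OpenAI finish_reason for the common cases.
STOP_REASON_TO_FINISH_REASON = {
    "end_turn": "stop",
    "stop_sequence": "stop",
    "max_tokens": "length",
    "tool_use": "tool_calls",
}


def to_finish_reason(stop_reason):
    # Assumption: unknown or missing values fall back to "stop".
    # A real gateway might log them or pass them through instead.
    return STOP_REASON_TO_FINISH_REASON.get(stop_reason, "stop")


def tool_use_to_tool_call(block: dict) -> dict:
    """Convert one Anthropic tool_use content block to an OpenAI tool call."""
    if block.get("type") != "tool_use":
        raise ValueError("expected a tool_use block")
    return {
        "id": block["id"],
        "type": "function",
        "function": {
            "name": block["name"],
            # OpenAI carries arguments as a JSON string, so nested input
            # is serialized rather than flattened.
            "arguments": json.dumps(block["input"]),
        },
    }
```

Serializing `input` as a whole keeps deeply nested arguments intact, which is where lossy conversions tend to show up.
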
3. Console: finer-grained, more useful
- Finer billing views — aggregate by API key, by model, by project; export CSV for reconciliation
- Teams and projects — sub-accounts under a primary account, per-account limits and logs (in progress)
- Usage alerts + auto-pause — set a threshold, automatically pause a key when it is hit, no accidental overspend
- Friendlier key management — names, tags, model allowlists, IP allowlists, all visual
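
As a rough illustration of the billing views and usage alerts listed above, the sketch below aggregates usage records by key and model, exports the totals as CSV, and lists the keys that have reached a spend threshold. The record fields (`api_key`, `model`, `cost`) and the function names are hypothetical; the console's real export format may differ.

```python
import csv
import io
from collections import defaultdict


def aggregate_cost(records, by=("api_key", "model")):
    """Sum the cost field, grouped by the given record fields."""
    totals = defaultdict(float)
    for rec in records:
        totals[tuple(rec[field] for field in by)] += rec["cost"]
    return dict(totals)


def to_csv(totals, by=("api_key", "model")):
    """Render aggregated totals as CSV text for reconciliation."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow([*by, "cost"])
    for key in sorted(totals):
        writer.writerow([*key, f"{totals[key]:.4f}"])
    return buf.getvalue()


def keys_to_pause(records, threshold):
    """Return API keys whose total spend has reached the threshold."""
    per_key = aggregate_cost(records, by=("api_key",))
    return sorted(key[0] for key, total in per_key.items() if total >= threshold)
```
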
4. Infrastructure: more stable, lower latency
- Multi-gateway elastic routing — continuing to harden failover so that a single-upstream outage triggers rerouting within seconds, with no dropped requests
- TTFT / P99 latency optimization — and we will keep publishing the real latency and success-rate data for core models
- Observability surfaced to you — the success-rate and TTFT sparklines on the Models page (currently placeholders) get hooked up to real telemetry
- Wider geographic reach — Cloudflare edge already covers the globe; next is polishing access from mainland China
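
The failover idea above can be sketched as trying upstreams in order and moving to the next one when a call fails. This is a simplified illustration and not the platform's routing code: the names are hypothetical, and it assumes a request is only retried when nothing has been streamed back to the client yet.

```python
class AllUpstreamsFailed(Exception):
    """Raised when every upstream in the list has failed."""

    def __init__(self, errors):
        super().__init__(f"{len(errors)} upstream(s) failed")
        self.errors = errors


def call_with_failover(upstreams):
    """Try each (name, callable) pair in order; return the first success.

    Returns a (name, result) tuple so the caller can log which upstream
    actually served the request.
    """
    errors = []
    for name, call in upstreams:
        try:
            return name, call()
        except Exception as exc:  # in practice, catch only retryable errors
            errors.append((name, exc))
    raise AllUpstreamsFailed(errors)
```
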
5. Pricing: still transparent, still optimizing cost
- Transparent pricing stays — `platform_price = official_price × rate`, each model's rate published on the pricing page
- Upstream cost wins go to you — when we negotiate better upstream rates, the discount lands on your invoice, not in our pocket
- Cache billing 100% transparent — `cache_read_tokens` always shown separately, never double-charged
- Pre-paid + post-paid mix — solo developers stay on pay-as-you-go; enterprises will get monthly invoicing (in progress)
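
The pricing rule above can be worked through in a few lines. The sketch applies `platform_price = official_price × rate` per token type and keeps `cache_read_tokens` on its own line. Several things here are assumptions for illustration: prices are quoted per million tokens, `input_tokens` does not already include cache reads, and the numbers in the usage note are made up, not real rates.

```python
from decimal import Decimal

MTOK = Decimal(1_000_000)


def platform_price(official_price: Decimal, rate: Decimal) -> Decimal:
    """platform_price = official_price x rate"""
    return official_price * rate


def itemize(usage: dict, official_per_mtok: dict, rate: Decimal) -> dict:
    """Return one cost line per token type, plus a total.

    Cache reads get their own line and are not folded into input_tokens,
    so they are charged exactly once.
    """
    lines = {}
    for kind in ("input_tokens", "output_tokens", "cache_read_tokens"):
        unit = platform_price(official_per_mtok[kind], rate)
        lines[kind] = unit * Decimal(usage.get(kind, 0)) / MTOK
    lines["total"] = sum(lines.values(), Decimal(0))
    return lines
```

For example, with illustrative official prices of 2, 10 and 0.2 per million tokens and a rate of 0.5, a request with 1,000,000 input tokens, 500,000 output tokens and 2,000,000 cache-read tokens itemizes to 1.0 + 2.5 + 0.2 = 3.7.
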
6. Developer experience: docs, examples, SDK-friendly
- More code samples — full runnable examples in Python, Node, Go, Java
- Tool integrations stay fresh — Cherry Studio, Cursor, Claude Code, Codex CLI guides kept up to date
- Better debugging — direct console access to log detail, error replay, upstream error-code translation
Beyond the number
100 billion tokens is not a number we generated — it is a number you generated, across thousands of API keys, in your own IDEs, terminals and production systems.
Every error report, every piece of feedback, every "can this be a little better" landed. The next 100 billion will get here faster than this one did.
Write to us
The roadmap stays open. What do you want to see first? Which models, which features, which integrations? Tell us in the enterprise WeChat group, on the contact page, or via support@tokensmart.ai.
Thanks again to everyone who has been on this ride 🙌