Back to Blog
·Tokensmart Team·4 min read

Crossing 100 Billion Tokens: A Milestone Recap and the Road Ahead

milestoneroadmap

Tokensmart usage crosses 100 billion tokens

In one sentence

Tokensmart has now processed more than 100 billion tokens in total. Behind that number sit tens of thousands of code-generation sessions, millions of conversation turns, and a lot of late-night debugging — yours and ours.

This number belongs to every one of you. Thanks for being on the ride.

What 100 billion tokens looks like

  • Spread across OpenAI / Anthropic / Google / DeepSeek / Qwen / Kimi / GLM and a dozen other vendors
  • Spanning text chat, code generation, long-context analysis, image generation, tool use, and more
  • Serving everyone from solo developers and indie products to full teams and enterprises
  • All on transparent billing, full request logs, and pay-as-you-actually-use — zero hidden margin

A few milestones we shipped along the way

DateEvent
2026-04Platform launch + transparent pricing model (official price × discount rate) goes live
2026-04Image generation launches (/v1/images/generations, OpenAI-compatible)
2026-04GPT-5.5, Claude Opus 4.7, DeepSeek V4 all onboarded together
2026-05OpenAI & Claude dual-protocol fully unified (any SDK × any model)
2026-05100 billion tokens served, cumulative

What is next on the platform

A milestone is just a marker. The list of things we still want to build is longer than the list of things we have shipped. Here it is, by category.

1. Models: broader, fresher, more reliable

  • Fast onboarding of new Chinese models — GLM-5, Kimi K2, the next Qwen release: live the moment upstream ships
  • Video generation feasibility — assessing Sora, Runway, Kling and similar; we will share results as soon as we have them
  • Deeper multimodal coverage — image + text + tool calls in a single conversation, unified billing and logging
  • Transparent deprecation — any model removal gets a 30-day notice; no silent swaps

2. Protocol layer: more accurate, more complete

  • Protocol conversion fidelity — continuing to polish edge cases (deeply nested tool_use, unusual stop_reason, multi-turn tool chains)
  • Responses API everywhere — enabled on the sub2api family today; expanding to more upstream vendors
  • Streaming, caching, tool use — identical on both sides — full feature parity between OpenAI and Anthropic protocols

3. Console: finer-grained, more useful

  • Finer billing views — aggregate by API key, by model, by project; export CSV for reconciliation
  • Teams and projects — sub-accounts under a primary account, per-account limits and logs (in progress)
  • Usage alerts + auto-pause — set a threshold, automatically pause a key when it is hit, no accidental overspend
  • Friendlier key management — names, tags, model allowlists, IP allowlists, all visual

4. Infrastructure: more stable, lower latency

  • Multi-gateway elastic routing — continuing to harden failover; single-upstream outage → second-scale rerouting, no dropped requests
  • TTFT / P99 latency optimization — and we will keep publishing the real latency and success-rate data for core models
  • Observability surfaced to you — the success-rate and TTFT sparklines on the Models page (currently placeholders) get hooked up to real telemetry
  • Wider geographic reach — Cloudflare edge already covers the globe; next is polishing access from mainland China

5. Pricing: still transparent, still optimizing cost

  • Transparent pricing staysplatform_price = official_price × rate, each model's rate published on the pricing page
  • Upstream cost wins go to you — when we negotiate better upstream rates, the discount lands on your invoice, not in our pocket
  • Cache billing 100% transparentcache_read_tokens always shown separately, never double-charged
  • Pre-paid + post-paid mix — solo developers stay on pay-as-you-go; enterprises will get monthly invoicing (in progress)

6. Developer experience: docs, examples, SDK-friendly

  • More code samples — full runnable examples in Python, Node, Go, Java
  • Tool integrations stay fresh — Cherry Studio, Cursor, Claude Code, Codex CLI guides kept up to date
  • Better debugging — direct console access to log detail, error replay, upstream error-code translation

Beyond the number

100 billion tokens is not a number we generated — it is a number you generated, across thousands of API keys, in your own IDEs, terminals and production systems.

Every error report, every piece of feedback, every "can this be a little better" landed. The next 100 billion will get here faster than this one did.

Write to us

The roadmap stays open. What do you want to see first? Which models, which features, which integrations? Tell us in the enterprise WeChat group, on the contact page, or via support@tokensmart.ai.

Thanks again to everyone who has been on this ride 🙌