Back to Blog
·Tokensmart Team·2 min read

Understanding Claude Cache Token Pricing, Clearly

pricingtechnicalcache

Why caching matters

Imagine you are building a customer service bot. Every conversation carries an 8000-token system prompt — company info, FAQs, product catalog.

Without caching:

  • Each conversation input: 8000 tokens × ¥0.015 / 1K = ¥0.12
  • 10,000 conversations per day: ¥1200 spent purely on the system prompt

That is clearly too expensive. Anthropic's prompt caching is built for exactly this scenario.

The three token types

With caching enabled, Anthropic splits your request into three kinds of tokens:

TypeMeaningRate (vs normal input)
Regular inputUncached portion, usually the latest user question1x (normal price)
cache_creationFirst-time write into the cache, usually the system prompt1.25x (slightly more expensive)
cache_readSubsequent hits pulled from the cache0.1x (one-tenth of normal)

The billing formula

Tokensmart charges you exactly as Anthropic prices:

total_cost =
  regular_input × input_price +
  cache_read × cache_read_price +
  cache_creation × cache_creation_price +
  output × output_price

Back to the example

System prompt = 8000 tokens, user question = 100 tokens, response = 300 tokens:

First conversation (cold cache):

  • cache_creation: 8000 × ¥0.01875 / 1K = ¥0.15
  • regular_input: 100 × ¥0.015 / 1K = ¥0.0015
  • output: 300 × ¥0.075 / 1K = ¥0.0225

Conversations 2 through 10,000 (cache hit):

  • cache_read: 8000 × ¥0.0015 / 1K = ¥0.012 ← one-tenth of normal
  • regular_input: 100 × ¥0.015 / 1K = ¥0.0015
  • output: 300 × ¥0.075 / 1K = ¥0.0225

Daily total for the system prompt drops from ¥1200 to around ¥120 — a flat 10x savings.

Gotchas

  1. 5-minute TTL: Anthropic's cache lives for 5 minutes of idle time. Only high-frequency traffic benefits
  2. 1024-token minimum: System prompts shorter than 1024 tokens will not be cached
  3. Must opt in explicitly: Add cache_control: { type: "ephemeral" } to your request, otherwise caching is off

How to see it in Tokensmart logs

Open API logs. Every row has dedicated cache_read and cache_creation columns, and the hover tooltip shows the token × rate × subtotal breakdown.