Understanding Claude Cache Token Pricing, Clearly

Why caching matters

Imagine you are building a customer service bot. Every conversation carries an 8000-token system prompt — company info, FAQs, product catalog.

Without caching:

That is clearly too expensive. Anthropic's prompt caching is built for exactly this scenario.

With caching enabled, Anthropic splits your request into three kinds of tokens:

Type	Meaning	Rate (vs normal input)
Regular input	Uncached portion, usually the latest user question	1x (normal price)
cache_creation	First-time write into the cache, usually the system prompt	1.25x (slightly more expensive)
cache_read	Subsequent hits pulled from the cache	0.1x (one-tenth of normal)

Tokensmart charges you exactly as Anthropic prices:

total_cost =
  regular_input × input_price +
  cache_read × cache_read_price +
  cache_creation × cache_creation_price +
  output × output_price

System prompt = 8000 tokens, user question = 100 tokens, response = 300 tokens:

First conversation (cold cache):

Conversations 2 through 10,000 (cache hit):

Daily total for the system prompt drops from ¥1200 to around ¥120 — a flat 10x savings.

5-minute TTL: Anthropic's cache lives for 5 minutes of idle time. Only high-frequency traffic benefits
1024-token minimum: System prompts shorter than 1024 tokens will not be cached
Must opt in explicitly: Add cache_control: { type: "ephemeral" } to your request, otherwise caching is off

Open API logs. Every row has dedicated cache_read and cache_creation columns, and the hover tooltip shows the token × rate × subtotal breakdown.