Picklyone - Your Intelligent AI Assistant

为什么需要缓存

假设你在做一个客服机器人,每次对话都要带一段 8000 token 的系统提示词 —— 包含公司信息、FAQ、产品目录。

如果按常规计价:

这显然太贵了。Anthropic 的 prompt caching 就是为这种场景设计的。

启用缓存后,Anthropic 会把你的请求拆成三种 token:

类型	含义	费率(相对于普通 input)
Regular input	没缓存的部分,通常是用户最新的问题	1x(普通价格)
cache_creation	首次写入缓存的 token,通常是 system prompt	1.25x(比普通稍贵)
cache_read	后续命中缓存读出来的 token	0.1x(只要普通价格的 1/10)

Picklyone 的账单完全按 Anthropic 的定价计算:

总成本 =
  regular_input × input_price +
  cache_read × cache_read_price +
  cache_creation × cache_creation_price +
  output × output_price

假设 system prompt 是 8000 token,每次对话的用户问题是 100 token,回复是 300 token:

第一次对话(缓存冷启动):

第二到第 10000 次(缓存命中):

一天下来,system prompt 的总成本从 ¥1200 降到 ¥120 出头,直接省了 10 倍。

打开 API 日志,每一行都有独立的 cache_read 和 cache_creation 列,鼠标悬停还能看到 token × 单价 × 小计的拆解。