DeepSeek Guide
DeepSeek models on Tokensmart use the OpenAI-compatible protocol — call them via /v1/chat/completions. You do not need a separate DeepSeek account — a single Tokensmart API key works.
Available models
| Model | Thinking mode | Context window | Best for |
|---|---|---|---|
| deepseek-v4-pro | Hybrid (on by default) | 1M tokens | Flagship: strongest reasoning, complex code, math, multi-step logic |
| deepseek-v4-flash | Hybrid (on by default) | 1M tokens | Economy: faster and cheaper, ideal for high-concurrency and simpler tasks |
See live pricing at /pricing. Pure pay-per-token, no monthly fee.
Quickstart
curl
curl https://api.tokensmart.ai/v1/chat/completions \
-H "Authorization: Bearer pk_live_xxx" \
-H "Content-Type: application/json" \
-d '{
"model": "deepseek-v4-flash",
"messages": [{"role": "user", "content": "Who are you?"}],
"stream": false
}'
Python (OpenAI SDK)
from openai import OpenAI
client = OpenAI(
api_key="pk_live_xxx",
base_url="https://api.tokensmart.ai/v1",
)
resp = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[{"role": "user", "content": "Who are you?"}],
)
print(resp.choices[0].message.content)
Node.js (OpenAI SDK)
import OpenAI from "openai";
const client = new OpenAI({
apiKey: "pk_live_xxx",
baseURL: "https://api.tokensmart.ai/v1",
});
const resp = await client.chat.completions.create({
model: "deepseek-v4-flash",
messages: [{ role: "user", content: "Who are you?" }],
});
console.log(resp.choices[0].message.content);
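You do not need the OpenAI SDK. The route speaks the plain OpenAI HTTP protocol, so Python's standard library is enough. The sketch below mirrors the curl example. `build_chat_request` and `send` are hypothetical helper names, the response is assumed to follow the OpenAI chat-completion schema, and HTTP error handling is omitted.

```python
import json
import urllib.request

# Endpoint from this guide; pass your own pk_live_... key at call time.
BASE_URL = "https://api.tokensmart.ai/v1"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request) -> str:
    """Send the request and return the reply text.

    Assumption: the response JSON uses the OpenAI chat-completion shape.
    """
    with urllib.request.urlopen(req, timeout=120) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Usage: `print(send(build_chat_request("pk_live_xxx", "deepseek-v4-flash", "Who are you?")))`. Keeping request construction separate from sending makes the request easy to inspect or log before it goes out.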
Thinking-mode controls (DeepSeek-specific fields)
DeepSeek models accept several fields that are not part of the OpenAI standard. Tokensmart's DeepSeek route supports all of the fields below:
| Field | Type | Models | Description |
|---|---|---|---|
| enable_thinking | boolean | v4-pro / v4-flash | Toggle thinking on/off. Hybrid models default to true |
| thinking_budget | integer | v4-pro / v4-flash | Max tokens for thinking (default 32768) |
| reasoning_effort | "high" or "max" | v4-pro / v4-flash | Reasoning depth (default "high"). OpenAI-standard field, works platform-wide |
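With the Python SDK, these fields fall into two groups. The DeepSeek-specific ones travel in extra_body, while reasoning_effort is a regular keyword argument. The sketch below shows a hypothetical helper that assembles both. It assumes the accepted values are exactly those listed in the table, and it rejects anything else locally instead of sending it to the API.

```python
from typing import Any, Optional

# Assumption: the only accepted efforts are the ones listed in the table above.
ALLOWED_EFFORTS = {"high", "max"}


def deepseek_request_kwargs(
    enable_thinking: Optional[bool] = None,
    thinking_budget: Optional[int] = None,
    reasoning_effort: Optional[str] = None,
) -> dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create(...).

    Non-standard DeepSeek fields go into extra_body. reasoning_effort is an
    OpenAI-standard parameter, so it is passed at the top level. Fields left
    as None are omitted, so the route's own defaults apply.
    """
    extra_body: dict[str, Any] = {}
    if enable_thinking is not None:
        extra_body["enable_thinking"] = bool(enable_thinking)
    if thinking_budget is not None:
        if not isinstance(thinking_budget, int) or thinking_budget <= 0:
            raise ValueError("thinking_budget must be a positive integer")
        extra_body["thinking_budget"] = thinking_budget

    kwargs: dict[str, Any] = {}
    if extra_body:
        kwargs["extra_body"] = extra_body
    if reasoning_effort is not None:
        if reasoning_effort not in ALLOWED_EFFORTS:
            raise ValueError(f"reasoning_effort must be one of {sorted(ALLOWED_EFFORTS)}")
        kwargs["reasoning_effort"] = reasoning_effort
    return kwargs
```

Spread the result into the call, for example `client.chat.completions.create(model="deepseek-v4-flash", messages=msgs, **deepseek_request_kwargs(enable_thinking=False))`.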
Disable thinking to save tokens
Hybrid models think by default, so even a trivial prompt can spend dozens to hundreds of output tokens on internal reasoning. Reasoning tokens are billed at the output rate, so this overhead adds up quickly across many calls. For simple tasks, turn thinking off explicitly:
Python (OpenAI SDK uses extra_body for non-standard fields):
resp = client.chat.completions.create(
model="deepseek-v4-flash",
messages=[{"role": "user", "content": "1+1=?"}],
extra_body={"enable_thinking": False},
)
Node.js (top-level field works directly):
const resp = await client.chat.completions.create({
model: "deepseek-v4-flash",
messages: [{ role: "user", content: "1+1=?" }],
enable_thinking: false,
});
curl (top-level JSON):
-d '{"model":"deepseek-v4-flash","messages":[...],"enable_thinking":false}'
For reasoning-heavy work, leave thinking on and use deepseek-v4-pro with reasoning_effort instead.
Crank up reasoning depth
reasoning_effort makes deepseek-v4-pro think harder on tough problems:
resp = client.chat.completions.create(
model="deepseek-v4-pro",
messages=[{"role": "user", "content": "Prove Fermat's little theorem"}],
reasoning_effort="max", # high (default) | max
)
Setting max makes the model think longer than high. Expect more output (reasoning) tokens and a higher per-call cost.
Streaming + reasoning content
In streaming mode, hybrid and thinking-only models with thinking enabled emit reasoning_content deltas first (the chain of thought), followed by content deltas (the final answer). The two fields stream independently, so read each one separately:
stream = client.chat.completions.create(
model="deepseek-v4-pro",
messages=[{"role": "user", "content": "Why is the sky blue?"}],
stream=True,
stream_options={"include_usage": True},
)
for chunk in stream:
    if not chunk.choices:
        # With include_usage, the final chunk has no choices and carries usage
        print("\nusage:", chunk.usage)
        continue
    delta = chunk.choices[0].delta
    # reasoning_content is a non-standard field, so read it defensively
    reasoning = getattr(delta, "reasoning_content", None)
    if reasoning:
        print(reasoning, end="", flush=True)  # thinking
    if delta.content:
        print(delta.content, end="", flush=True)  # answer
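If you need the chain of thought and the final answer as separate strings instead of printing them inline, accumulate each field separately. The sketch below assumes chunk objects shaped like the OpenAI SDK's streaming chunks. `collect_stream` and `StreamResult` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class StreamResult:
    reasoning: str  # concatenated reasoning_content deltas (chain of thought)
    answer: str  # concatenated content deltas (final answer)
    usage: Optional[Any]  # usage from the final chunk, if include_usage was set


def collect_stream(chunks: Iterable[Any]) -> StreamResult:
    """Accumulate reasoning and answer text separately from streamed chunks."""
    reasoning_parts: list[str] = []
    answer_parts: list[str] = []
    usage = None
    for chunk in chunks:
        if not chunk.choices:
            # The usage-only final chunk has an empty choices list.
            usage = getattr(chunk, "usage", None)
            continue
        delta = chunk.choices[0].delta
        # Assumption: reasoning_content may be absent on some deltas.
        reasoning = getattr(delta, "reasoning_content", None)
        if reasoning:
            reasoning_parts.append(reasoning)
        content = getattr(delta, "content", None)
        if content:
            answer_parts.append(content)
    return StreamResult("".join(reasoning_parts), "".join(answer_parts), usage)
```

Call it in place of the print loop, as in `result = collect_stream(stream)`. A stream can only be iterated once, so do not run both on the same stream.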
Billing
- Input / output billed per token — see /pricing
- Reasoning tokens charged at output rate — thinking-on requests cost significantly more
- Context cache (v4 series) uses separate cache_read / cache_creation rates
- Per-call cost shown in /api-logs at 6-decimal USD precision
Our billing mirrors DeepSeek's official pricing 1:1, with no markup over the published rate.
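To budget ahead of time, you can estimate cost from a response's usage block. The sketch below is a back-of-the-envelope estimate. It assumes completion_tokens already includes reasoning tokens, as in the OpenAI usage convention. The rates are placeholders, not Tokensmart's actual prices, and context-cache rates are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-million-token prices in USD. Placeholder values: check /pricing."""
    input_per_m: float
    output_per_m: float


def estimate_cost_usd(prompt_tokens: int, completion_tokens: int, rates: Rates) -> float:
    """Estimate one call's cost, rounded to 6 decimals like /api-logs.

    Assumption: completion_tokens includes reasoning tokens, so reasoning is
    billed at the output rate. cache_read / cache_creation rates are ignored.
    """
    cost = (prompt_tokens * rates.input_per_m + completion_tokens * rates.output_per_m) / 1_000_000
    return round(cost, 6)
```

Consider the same 1,000-token prompt at these placeholder rates (`Rates(input_per_m=0.5, output_per_m=2.0)`). It costs $0.0015 with 500 completion tokens and $0.0045 with 2,000, where the extra 1,500 could be reasoning.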
FAQ
Q: Can I use the official DeepSeek Python SDK?
A: DeepSeek's API is OpenAI-compatible, and DeepSeek's own documentation calls it through the OpenAI SDK, so any OpenAI-compatible client works. Point base_url at https://api.tokensmart.ai/v1 and use your pk_live_xxx key.
Q: Why are tokens still high after enable_thinking: false?
A: Enforcement varies by model. v4-pro strictly disables thinking; v4-flash sometimes still thinks on certain prompts. Inspect usage.completion_tokens_details.reasoning_tokens in the response to see the actual reasoning-token count.
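Here is a small sketch for running this check in code. It assumes the response follows the OpenAI usage schema named above. The helper name is hypothetical, and it treats a missing field as zero, which may hide a route that simply does not report the count.

```python
from typing import Any


def reasoning_tokens(response: Any) -> int:
    """Return usage.completion_tokens_details.reasoning_tokens, or 0 if absent."""
    usage = getattr(response, "usage", None)
    details = getattr(usage, "completion_tokens_details", None)
    # Assumption: a missing details object means no reasoning tokens were reported.
    return getattr(details, "reasoning_tokens", None) or 0
```

For example, send a request with `extra_body={"enable_thinking": False}` and log `reasoning_tokens(resp)` to see whether v4-flash still thought on that prompt.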
Q: v4-pro is slow / timing out at high reasoning effort — what gives?
A: With reasoning_effort: max, tough problems can take 1-2 minutes. If your prompt triggers longer thinking, split the task or drop back to the default high effort.
Q: Are there rate limits on the DeepSeek route?
A: The limit is the same as the rest of the platform: 120 RPM (requests per minute) per account. Email support@tokensmart.ai to request a higher quota.
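One way to stay under the limit from a single process is a client-side throttle, sketched below. It assumes the limit is enforced as a rolling 60-second window. The platform may count differently, and this throttle does not coordinate across processes, so you should still handle HTTP 429 responses (the OpenAI SDK raises `openai.RateLimitError`). The class name is hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `limit` calls in any rolling `window` seconds.

    Defaults follow the 120 RPM account limit. Assumption: the server uses a
    rolling window. Clock and sleep are injectable for testing.
    """

    def __init__(
        self,
        limit: int = 120,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()  # timestamps of recent calls

    def acquire(self) -> None:
        """Block until another call fits in the window, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have aged out of the window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.limit:
                self.calls.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.window - (now - self.calls[0]))
```

Usage: create one `limiter = SlidingWindowLimiter()` per process and call `limiter.acquire()` immediately before each `client.chat.completions.create(...)`.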
Q: Can I call DeepSeek via the Anthropic-style /v1/messages endpoint?
A: No. /v1/messages only supports Anthropic-format models (Claude series). DeepSeek uses the OpenAI protocol, so all DeepSeek traffic must go through /v1/chat/completions.