DeepSeek Guide

DeepSeek models on Tokensmart use the OpenAI-compatible protocol, so you call them through /v1/chat/completions. You do not need a separate DeepSeek account; a single Tokensmart API key works.

Available models

Model	Thinking mode	Context window	Best for
`deepseek-v4-pro`	Hybrid (on by default)	1M tokens	Flagship: strongest reasoning, complex code, math, multi-step logic
`deepseek-v4-flash`	Hybrid (on by default)	1M tokens	Economy: faster and cheaper, ideal for high-concurrency and simpler tasks

See live pricing at /pricing. Billing is pure pay-per-token, with no monthly fee.

Quickstart

curl

curl https://api.tokensmart.ai/v1/chat/completions \
  -H "Authorization: Bearer pk_live_xxx" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-v4-flash",
    "messages": [{"role": "user", "content": "Who are you?"}],
    "stream": false
  }'

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    api_key="pk_live_xxx",
    base_url="https://api.tokensmart.ai/v1",
)

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "Who are you?"}],
)
print(resp.choices[0].message.content)

Node.js (OpenAI SDK)

import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "pk_live_xxx",
  baseURL: "https://api.tokensmart.ai/v1",
});

const resp = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "Who are you?" }],
});
console.log(resp.choices[0].message.content);
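If you cannot install the OpenAI SDK, the curl request above can also be sent from Python's standard library. This is a minimal sketch, not an official client: `build_chat_request` and `send_chat_request` are hypothetical helper names, the 120-second timeout is an arbitrary choice, and the `TOKENSMART_API_KEY` environment variable in the usage comment is only a suggested convention.

```python
import json
import urllib.request

API_URL = "https://api.tokensmart.ai/v1/chat/completions"


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (without sending) the same POST request as the curl example."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(req: urllib.request.Request, timeout: float = 120.0) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]


# Usage (makes a real network call; the env var name is hypothetical):
#   import os
#   req = build_chat_request(os.environ["TOKENSMART_API_KEY"], "deepseek-v4-flash", "Who are you?")
#   print(send_chat_request(req))
```

Non-2xx responses surface as `urllib.error.HTTPError`, which carries the status code and the error body.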

Thinking-mode controls (DeepSeek-specific fields)

DeepSeek accepts several fields outside the OpenAI standard, and Tokensmart's DeepSeek route supports all of them, plus the standard reasoning_effort:

Field	Type	Models	Description
`enable_thinking`	boolean	v4-pro / v4-flash	Turns thinking on or off. Hybrid models default to `true`
`thinking_budget`	integer	v4-pro / v4-flash	Maximum tokens spent on thinking (default 32768)
`reasoning_effort`	`"high"` or `"max"`	v4-pro / v4-flash	Reasoning depth (default `"high"`). OpenAI-standard, works platform-wide
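In the Python SDK, `reasoning_effort` is an ordinary keyword argument, while `enable_thinking` and `thinking_budget` have to go in `extra_body`. The sketch below centralizes that mapping. `deepseek_kwargs` is a hypothetical helper (not part of any SDK), and its validation simply mirrors the table above.

```python
from typing import Any, Dict, List, Optional

VALID_EFFORTS = {"high", "max"}  # values listed in the field table


def deepseek_kwargs(
    model: str,
    messages: List[Dict[str, str]],
    enable_thinking: Optional[bool] = None,
    thinking_budget: Optional[int] = None,
    reasoning_effort: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for client.chat.completions.create().

    Fields left as None are omitted, so the route's defaults apply.
    """
    kwargs: Dict[str, Any] = {"model": model, "messages": messages}
    extra_body: Dict[str, Any] = {}

    # Non-standard DeepSeek fields travel in extra_body.
    if enable_thinking is not None:
        extra_body["enable_thinking"] = enable_thinking
    if thinking_budget is not None:
        if thinking_budget <= 0:
            raise ValueError("thinking_budget must be a positive integer")
        extra_body["thinking_budget"] = thinking_budget

    # OpenAI-standard field: the SDK accepts it as a normal argument.
    if reasoning_effort is not None:
        if reasoning_effort not in VALID_EFFORTS:
            raise ValueError(f"reasoning_effort must be one of {sorted(VALID_EFFORTS)}")
        kwargs["reasoning_effort"] = reasoning_effort

    if extra_body:
        kwargs["extra_body"] = extra_body
    return kwargs
```

For example, `client.chat.completions.create(**deepseek_kwargs("deepseek-v4-flash", messages, enable_thinking=False))` produces the same request as passing `extra_body={"enable_thinking": False}` by hand.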

Disable thinking to save tokens

Hybrid models think by default, and even on trivial prompts they spend dozens to hundreds of output tokens on internal reasoning. Reasoning tokens are billed at the output rate, so the cost adds up. For simple tasks, turn thinking off explicitly:

Python (OpenAI SDK uses extra_body for non-standard fields):

resp = client.chat.completions.create(
    model="deepseek-v4-flash",
    messages=[{"role": "user", "content": "1+1=?"}],
    extra_body={"enable_thinking": False},
)

Node.js (top-level field works directly):

const resp = await client.chat.completions.create({
  model: "deepseek-v4-flash",
  messages: [{ role: "user", content: "1+1=?" }],
  enable_thinking: false,
});

curl (top-level JSON):

-d '{"model":"deepseek-v4-flash","messages":[...],"enable_thinking":false}'

Conversely, for reasoning-heavy work, use deepseek-v4-pro with reasoning_effort.

Crank up reasoning depth

reasoning_effort makes deepseek-v4-pro think harder on tough problems:

resp = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Prove Fermat's little theorem"}],
    reasoning_effort="max",  # high (default) | max
)

max thinks more deeply than high, which produces more output tokens and a higher per-call cost.

Streaming + reasoning content

In streaming mode, hybrid and thinking-only models emit reasoning_content (the chain of thought) first, then content (the final answer). The two fields stream independently:

stream = client.chat.completions.create(
    model="deepseek-v4-pro",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    if not chunk.choices:
        # Last chunk carries usage
        print("\nusage:", chunk.usage)
        continue

    delta = chunk.choices[0].delta
    if getattr(delta, "reasoning_content", None):
        print(delta.reasoning_content, end="", flush=True)  # thinking
    if delta.content:
        print(delta.content, end="", flush=True)  # answer
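If you need the chain of thought and the final answer as two separate strings (for instance, to log the reasoning while showing users only the answer), the loop above can be folded into an accumulator. This is a sketch assuming chunks shaped like the OpenAI SDK's streaming objects; `collect_stream` and `StreamResult` are hypothetical names. Because it only reads attributes, it also works on mock chunks in unit tests.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List, Optional


@dataclass
class StreamResult:
    reasoning: str        # concatenated reasoning_content deltas
    answer: str           # concatenated content deltas
    usage: Optional[Any]  # usage from the final chunk, if include_usage was set


def collect_stream(chunks: Iterable[Any]) -> StreamResult:
    reasoning_parts: List[str] = []
    answer_parts: List[str] = []
    usage = None
    for chunk in chunks:
        if not chunk.choices:
            # With include_usage, the last chunk has no choices, only usage.
            usage = getattr(chunk, "usage", None)
            continue
        delta = chunk.choices[0].delta
        # reasoning_content is a DeepSeek extension, so read it defensively.
        thinking = getattr(delta, "reasoning_content", None)
        if thinking:
            reasoning_parts.append(thinking)
        text = getattr(delta, "content", None)
        if text:
            answer_parts.append(text)
    return StreamResult("".join(reasoning_parts), "".join(answer_parts), usage)


# Usage: result = collect_stream(stream)  # `stream` created as in the example above
```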

Billing

Input and output tokens are billed per token; see /pricing for rates.
Reasoning tokens are charged at the output rate, so requests with thinking on cost significantly more.
Context caching (v4 series) is billed at separate cache_read / cache_creation rates.
Per-call cost appears in /api-logs in USD, to 6 decimal places.

Our billing mirrors DeepSeek's official pricing 1:1, with zero markup over the published rates.
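The rules above can be turned into a rough per-call estimate. In this sketch every price is a hypothetical placeholder (take the real per-million-token rates from /pricing), the parameter names are illustrative rather than Tokensmart's actual usage fields, and `output_tokens` is expected to already include reasoning tokens, as `completion_tokens` does in OpenAI-style usage objects. Rounding to 6 decimals matches the /api-logs display, but the rounding mode is an assumption.

```python
from dataclasses import dataclass
from decimal import ROUND_HALF_UP, Decimal

PER_MILLION = Decimal(1_000_000)


@dataclass(frozen=True)
class Rates:
    """USD per 1M tokens. Fill in real values from /pricing."""
    input: Decimal
    output: Decimal
    cache_read: Decimal
    cache_creation: Decimal


def estimate_cost(
    rates: Rates,
    input_tokens: int,
    output_tokens: int,            # should already include reasoning tokens
    cache_read_tokens: int = 0,
    cache_creation_tokens: int = 0,
) -> Decimal:
    """Estimate one call's cost in USD, rounded to 6 decimal places."""
    total = (
        input_tokens * rates.input
        + output_tokens * rates.output
        + cache_read_tokens * rates.cache_read
        + cache_creation_tokens * rates.cache_creation
    ) / PER_MILLION
    # Rounding mode is an assumption; /api-logs only tells us the precision.
    return total.quantize(Decimal("0.000001"), rounding=ROUND_HALF_UP)


# Example with made-up rates (not real prices):
#   rates = Rates(Decimal("0.50"), Decimal("2.00"), Decimal("0.10"), Decimal("0.50"))
#   estimate_cost(rates, input_tokens=1000, output_tokens=500, cache_read_tokens=2000)
#   → Decimal('0.001700')
```

Decimal is used instead of float so that sums of many small per-token charges do not pick up binary rounding error.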

FAQ

Q: Can I use the official DeepSeek Python SDK? A: Yes. DeepSeek's API is OpenAI-compatible (DeepSeek's own docs use the OpenAI SDK), so point base_url at https://api.tokensmart.ai/v1 and use your pk_live_xxx key.

Q: Why is token usage still high after setting enable_thinking: false? A: Enforcement varies by model: v4-pro strictly disables thinking, while v4-flash sometimes still thinks on certain prompts. Check usage.completion_tokens_details.reasoning_tokens in the response for the actual reasoning-token count.
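A defensive way to read that count, as a sketch: `reasoning_tokens_used` is a hypothetical helper, and it treats a missing `usage` or `completion_tokens_details` as zero reasoning tokens rather than failing.

```python
from typing import Any


def reasoning_tokens_used(resp: Any) -> int:
    """Return usage.completion_tokens_details.reasoning_tokens, or 0 if absent.

    Each level may be missing or None depending on the model and SDK version,
    so every lookup falls back to None instead of raising AttributeError.
    """
    usage = getattr(resp, "usage", None)
    details = getattr(usage, "completion_tokens_details", None)
    return getattr(details, "reasoning_tokens", None) or 0


# Usage, reusing the client from the quickstart:
#   resp = client.chat.completions.create(
#       model="deepseek-v4-flash",
#       messages=[{"role": "user", "content": "1+1=?"}],
#       extra_body={"enable_thinking": False},
#   )
#   print("reasoning tokens:", reasoning_tokens_used(resp))
```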

Q: v4-pro is slow or timing out at high reasoning effort. What gives? A: With reasoning_effort: max, tough problems can take 1-2 minutes. If your prompt triggers long thinking, split the task or drop back to the default high effort.
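One way to automate "drop back to the default high effort" is to retry a timed-out max-effort call at high effort. A sketch under assumptions: `call_with_effort_fallback` is a hypothetical helper, and `call` is any function you supply that takes an effort level and raises one of `timeout_errors` on timeout. With the OpenAI Python SDK that exception is `openai.APITimeoutError`; the default here is the built-in `TimeoutError` so the sketch runs without the SDK.

```python
from typing import Callable, Optional, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_effort_fallback(
    call: Callable[[str], T],
    efforts: Sequence[str] = ("max", "high"),
    timeout_errors: Tuple[Type[BaseException], ...] = (TimeoutError,),
) -> Tuple[str, T]:
    """Try each reasoning_effort level in order and return (effort_used, result).

    Only exceptions listed in timeout_errors trigger the next level; anything
    else propagates at once. If every level times out, the last error is re-raised.
    """
    if not efforts:
        raise ValueError("efforts must not be empty")
    last_error: Optional[BaseException] = None
    for effort in efforts:
        try:
            return effort, call(effort)
        except timeout_errors as exc:
            last_error = exc
    assert last_error is not None
    raise last_error


# Usage with the OpenAI SDK (the 150 s timeout is an arbitrary example value):
#   import openai
#   effort, resp = call_with_effort_fallback(
#       lambda e: client.with_options(timeout=150.0).chat.completions.create(
#           model="deepseek-v4-pro",
#           messages=[{"role": "user", "content": "Prove Fermat's little theorem"}],
#           reasoning_effort=e,
#       ),
#       timeout_errors=(openai.APITimeoutError,),
#   )
```

The OpenAI Python SDK already retries timed-out requests (2 retries by default) before raising, so a slow max-effort attempt can take several timeout periods; `client.with_options(max_retries=0)` makes the fallback kick in after the first timeout.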

Q: Are there rate limits on the DeepSeek route? A: Yes, the same as the rest of the platform: 120 requests per minute (RPM) per account. Email support@tokensmart.ai for a higher quota.
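To stay under the 120 RPM limit from a batch job, you can throttle on the client side. A minimal sliding-window sketch: `RateLimiter` is a hypothetical class, it only counts calls made through the same instance in one process (other processes or machines using the same account are invisible to it), it is not thread-safe without an external lock, and the clock and sleep functions are injectable so it can be tested without real waiting.

```python
import collections
import time
from typing import Callable, Deque


class RateLimiter:
    """Allow at most `limit` calls in any rolling `window`-second period."""

    def __init__(
        self,
        limit: int = 120,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.limit = limit
        self.window = window
        self._clock = clock
        self._sleep = sleep
        self._stamps: Deque[float] = collections.deque()

    def acquire(self) -> None:
        """Block until one more call fits in the window, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have left the rolling window.
            while self._stamps and now - self._stamps[0] >= self.window:
                self._stamps.popleft()
            if len(self._stamps) < self.limit:
                self._stamps.append(now)
                return
            # Wait until the oldest recorded call expires, then re-check.
            self._sleep(self._stamps[0] + self.window - now)


# Usage:
#   limiter = RateLimiter()  # 120 calls per rolling 60 s
#   for prompt in prompts:
#       limiter.acquire()
#       client.chat.completions.create(
#           model="deepseek-v4-flash",
#           messages=[{"role": "user", "content": prompt}],
#       )
```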

Q: Can I call DeepSeek via the Anthropic-style /v1/messages endpoint? A: No. /v1/messages only supports Anthropic-format models (Claude series). DeepSeek uses the OpenAI protocol, so all DeepSeek traffic must go through /v1/chat/completions.