DeepSeek calculator

Separate cache hits before you trust the DeepSeek bill.

Estimate DeepSeek V4 Flash and V4 Pro costs from cache-hit input, cache-miss input, output tokens, thinking mode, and daily request volume. The split matters because cached input is far cheaper than fresh input: at the V4 Flash prices listed below, a cache hit costs $0.0028 per 1M tokens versus $0.14 per 1M for a cache miss, a 50x difference.

Inputs

Numeric workload values are saved locally in this browser for easier comparison. Prompt text is not saved.

Model, Output planning, Calls per day

Cache-hit input tokens, Cache-miss input tokens, Output tokens

Days per month

DeepSeek reports cache hits as prompt_cache_hit_tokens and cache misses as prompt_cache_miss_tokens. Use API usage mode when you have those usage fields.

Result

$19.82 estimated monthly cost

Per request: <$0.01
Daily: $0.66
Total input: 18,000 tokens
Monthly calls: 30,000
Cache hit input: <$0.01
Cache miss input: <$0.01
Output: <$0.01
Annual: $238

High confidence: based on DeepSeek usage fields such as prompt_cache_hit_tokens and prompt_cache_miss_tokens.

Pricing checked 2026-05-09. DeepSeek V4 Flash: cache hit $0.0028/1M, cache miss $0.14/1M, output $0.28/1M tokens.

Thinking output should be included in output token planning because DeepSeek bills by generated tokens.

Official model. The compatibility aliases deepseek-chat and deepseek-reasoner map to V4 Flash non-thinking and thinking modes, respectively.

Cost insight: the largest cost driver is output tokens. Cutting output tokens by 50% would save about $5.04 per month.
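The cost insight above can be checked with a short sketch. The page shows only 18,000 total input tokens and 30,000 monthly calls; the 16,000 / 2,000 hit/miss split and the 1,200 output tokens per request are assumptions chosen because they are consistent with the displayed figures. The prices are the V4 Flash values listed on this page.

```python
# USD per 1M tokens, as listed on this page for DeepSeek V4 Flash.
PRICES = {"cache_hit": 0.0028, "cache_miss": 0.14, "output": 0.28}

# Assumed per-request token counts (not shown on the page; chosen to
# sum to the displayed 18,000 input tokens and match the $19.82 result).
TOKENS = {"cache_hit": 16_000, "cache_miss": 2_000, "output": 1_200}

MONTHLY_CALLS = 30_000

# Monthly cost of each component in USD.
monthly = {
    name: TOKENS[name] * PRICES[name] / 1_000_000 * MONTHLY_CALLS
    for name in PRICES
}

# The largest component is the cost driver.
driver = max(monthly, key=monthly.get)

# Halving output tokens halves the output component.
output_savings = monthly["output"] * 0.5

for name, cost in monthly.items():
    print(f"{name}: ${cost:.2f}/month")
print(f"driver: {driver}, 50% output cut saves ${output_savings:.2f}/month")
```

Under these assumptions, output is the largest component and a 50% cut saves about $5.04 per month, which matches the figure shown.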

Estimated planning result only. Prices can change, and provider bills may include taxes, minimums, feature-specific charges, or usage adjustments. Verify production spend in the official provider dashboard.


How to use this calculator

Split cache hits and misses
Use DeepSeek usage fields when available, because cache-hit and cache-miss prompt tokens can be priced very differently.
Include expected output
Estimate answer tokens, including reasoning output when your workload uses thinking-style responses.
Use the monthly projection
Review the cost driver, copy the summary, and compare with other LLM providers before scaling volume.

DeepSeek cost guide

Treat cache-hit and cache-miss prompt tokens as different budgets.

DeepSeek cost planning is strongest when you separate cache-hit input, cache-miss input, and generated output instead of flattening everything into one token number.

How the estimate works

Cache-hit input

Prompt tokens served from DeepSeek context cache can use a lower cache-hit price, so they should stay separate in the estimate.

Cache-miss input

Fresh prompt tokens use cache-miss pricing and often explain why a first request costs more than repeated requests.

Output and thinking

Generated output includes normal answers and thinking-style tokens when your workload uses reasoning responses.
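The three components above combine into one estimate. The sketch below is a minimal illustration rather than the calculator's actual code: the function name estimate and its parameters are hypothetical, the prices are the V4 Flash values listed on this page, and the example workload (16,000 cache-hit, 2,000 cache-miss, 1,200 output tokens, 1,000 calls per day for 30 days) is an assumption consistent with the sample result shown earlier.

```python
# USD per 1M tokens, as listed on this page for DeepSeek V4 Flash.
PRICE_HIT = 0.0028
PRICE_MISS = 0.14
PRICE_OUTPUT = 0.28


def estimate(hit_tokens, miss_tokens, output_tokens, calls_per_day, days_per_month):
    """Return per-request, daily, monthly, and annual cost in USD.

    output_tokens should already include thinking tokens, because
    generated reasoning counts toward billed output.
    """
    per_request = (
        hit_tokens * PRICE_HIT
        + miss_tokens * PRICE_MISS
        + output_tokens * PRICE_OUTPUT
    ) / 1_000_000
    daily = per_request * calls_per_day
    monthly = daily * days_per_month
    return {
        "per_request": per_request,
        "daily": daily,
        "monthly": monthly,
        "annual": monthly * 12,
    }


# Assumed workload; only the 18,000 input total and 30,000 monthly
# calls appear in the sample result on this page.
result = estimate(16_000, 2_000, 1_200, calls_per_day=1_000, days_per_month=30)
print(f"daily ${result['daily']:.2f}")      # daily $0.66
print(f"monthly ${result['monthly']:.2f}")  # monthly $19.82
print(f"annual ${result['annual']:.0f}")    # annual $238
```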

Example workloads

Cached chatbot

Model stable instructions as cache-hit tokens after warmup, then add each user message as cache-miss input.
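A rough sketch of that warmup effect follows. It assumes a simplified model in which the whole stable prefix misses on the first request and hits on every later one; real cache behavior may be more granular. The function name and the 16,000 / 2,000 token sizes are hypothetical, and the prices are the V4 Flash values listed on this page.

```python
# USD per 1M tokens, as listed on this page for DeepSeek V4 Flash.
PRICE_HIT = 0.0028
PRICE_MISS = 0.14


def input_cost(prefix_tokens, message_tokens, warmed_up):
    """Input cost in USD for one request.

    Simplifying assumption: before warmup the stable prefix is billed as
    cache-miss input; after warmup it is billed entirely as cache-hit input.
    The user message is always fresh, so it is always a cache miss.
    """
    hit = prefix_tokens if warmed_up else 0
    miss = message_tokens + (0 if warmed_up else prefix_tokens)
    return (hit * PRICE_HIT + miss * PRICE_MISS) / 1_000_000


# Hypothetical chatbot: 16,000-token instructions, 2,000-token user message.
cold = input_cost(16_000, 2_000, warmed_up=False)
warm = input_cost(16_000, 2_000, warmed_up=True)

print(f"first request input:  ${cold:.6f}")
print(f"later requests input: ${warm:.6f}")
print(f"first request costs {cold / warm:.1f}x more on input")
```

Output tokens are left out here because they cost the same before and after warmup.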

Reasoning answer

Increase output tokens when the task needs chain-of-thought-style thinking or longer analytical responses.

High-volume API

Compare V4 Flash and V4 Pro at the same cache hit rate before assuming one tier is always cheaper.

Cost optimization tips

Read prompt_cache_hit_tokens and prompt_cache_miss_tokens from usage data when available.
Design stable prefixes so repeated context has a better chance of hitting cache.
Budget thinking output separately for tasks that generate longer reasoning traces.
Recheck pricing dates because promotional or model-tier pricing can change.
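To act on the first tip, a measured cache hit rate can be computed from logged usage data. The sketch below assumes each logged record is a dict carrying the two usage fields named above; the records list is made-up sample data, and cache_hit_rate is a hypothetical helper, not part of any DeepSeek SDK.

```python
def cache_hit_rate(usage_records):
    """Share of prompt tokens served from cache across all records.

    Missing fields are treated as 0. Returns 0.0 when there are no
    prompt tokens at all, to avoid dividing by zero.
    """
    hit = sum(u.get("prompt_cache_hit_tokens", 0) for u in usage_records)
    miss = sum(u.get("prompt_cache_miss_tokens", 0) for u in usage_records)
    total = hit + miss
    return hit / total if total else 0.0


# Made-up sample: one cold request followed by two warm ones.
records = [
    {"prompt_cache_hit_tokens": 0, "prompt_cache_miss_tokens": 18_000},
    {"prompt_cache_hit_tokens": 16_000, "prompt_cache_miss_tokens": 2_000},
    {"prompt_cache_hit_tokens": 16_000, "prompt_cache_miss_tokens": 2_000},
]

rate = cache_hit_rate(records)
print(f"measured cache hit rate: {rate:.1%}")
```

Weighting by tokens rather than averaging per-request rates keeps large prompts from being under-counted.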

Common mistakes

Combining cache-hit and cache-miss tokens into one input total.
Leaving thinking output out of the output-token estimate.
Assuming the cache hit rate is stable before measuring real traffic.
Comparing DeepSeek against other providers without using the same workload numbers.
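The third mistake is easy to quantify. The sketch below holds a workload fixed and varies only the cache hit rate, using the V4 Flash prices listed on this page. The workload (18,000 input tokens, 1,200 output tokens, 30,000 calls per month) and the function name are assumptions for illustration.

```python
# USD per 1M tokens, as listed on this page for DeepSeek V4 Flash.
PRICE_HIT = 0.0028
PRICE_MISS = 0.14
PRICE_OUTPUT = 0.28


def monthly_cost(hit_rate, total_input=18_000, output=1_200, calls=30_000):
    """Monthly cost in USD when hit_rate (0.0 to 1.0) of input tokens hit cache."""
    hit = total_input * hit_rate
    miss = total_input - hit
    per_request = (
        hit * PRICE_HIT + miss * PRICE_MISS + output * PRICE_OUTPUT
    ) / 1_000_000
    return per_request * calls


# Same workload, three different assumed hit rates.
for rate in (0.0, 0.5, 0.9):
    print(f"hit rate {rate:.0%}: ${monthly_cost(rate):.2f}/month")
```

With these assumed numbers the monthly estimate ranges from roughly $19 at a 90% hit rate to roughly $86 with no cache hits, so a hit rate guessed before measuring real traffic can move the budget several-fold.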

References

Built around DeepSeek's cache-aware billing model.

DeepSeek Models & Pricing
DeepSeek Token & Token Usage
DeepSeek Context Caching
DeepSeek Thinking Mode

DeepSeek context caching is enabled by default and reports cache status through usage fields. This static page does not call DeepSeek APIs or ask for API keys; it is a transparent planning calculator for pre-production budgeting.

FAQ

DeepSeek API cost quick answers

What makes DeepSeek API cost estimation different?

DeepSeek reports cache-hit and cache-miss prompt tokens separately, and those token categories can have very different prices.

Should thinking tokens be included in DeepSeek output cost?

Yes. Thinking output should be included in output token planning because generated reasoning tokens can contribute to billed output.

Does this calculator send my prompt to DeepSeek?

No. It estimates costs in the browser and does not ask for an API key or send prompt text to DeepSeek.