Sonusahani.com

Kimi API Pricing Calculator

Calculate Moonshot AI (Kimi) 2026 costs dynamically. Supports K2.5 Multimodal, Turbo architectures, and automated context caching logic.

1. Select Kimi Model

Kimi's most intelligent flagship. Native multimodal (Text/Image/Video), thinking & non-thinking modes. Context: 256K

2. Usage Projection (Per 1M Tokens)

- Input (cache miss): $0.60 / 1M tokens
- Input (cache hit): $0.10 / 1M tokens
- Output: $3.00 / 1M tokens

Kimi K2 series uses automated prefix caching; hit rates apply only to reused context.

API Estimate (sample: 1,000 input tokens, 500 output tokens, no cache hits)

- New prompt cost: $0.000600
- Cached context savings: $0.000000
- Output tokens: $0.001500
- Estimated total: $0.0021

Official Moonshot AI rates for 2026. Input pricing varies by cache state. V1 models do not support cache hit discounts.
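The estimate above can be reproduced with a short script. This is a sketch of the calculator's arithmetic using the rates listed in the pricing section ($0.60/1M cache-miss input, $0.10/1M cache-hit input, $3.00/1M output); the function name and structure are illustrative, not part of Moonshot's API.

```python
# Per-token rates derived from the per-1M prices above.
RATE_INPUT_MISS = 0.60 / 1_000_000   # USD per input token (new prompt)
RATE_INPUT_HIT = 0.10 / 1_000_000    # USD per input token (cached prefix)
RATE_OUTPUT = 3.00 / 1_000_000       # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int,
                  cache_hit_tokens: int = 0) -> float:
    """Estimate a single request's cost in USD."""
    new_tokens = input_tokens - cache_hit_tokens
    cost = (new_tokens * RATE_INPUT_MISS
            + cache_hit_tokens * RATE_INPUT_HIT
            + output_tokens * RATE_OUTPUT)
    return round(cost, 6)

# 1,000 input tokens + 500 output tokens, no cache hits:
print(estimate_cost(1_000, 500))  # 0.0021
```

With a fully cached prompt, the same 1,000 input tokens bill at the hit rate instead, dropping the input portion from $0.0006 to $0.0001.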

Kimi AI Concurrency Tiers

Moonshot AI limits concurrency and throughput based on your cumulative recharge level. Reach Tier 3+ for high-scale production apps.

| Level  | Total Recharge | Concurrency | RPM | Tokens/Min (TPM) |
|--------|----------------|-------------|-----|------------------|
| Tier 0 | $1             | 1           | 3   | 500K             |
| Tier 1 | $10            | 50          | 200 | 2M               |
| Tier 2 | $20            | 100         | 500 | 3M               |
| Tier 3 | $100           | 200         | 5k  | 3M               |
| Tier 4 | $1000          | 400         | 5k  | 4M               |
| Tier 5 | $3000          | 1k          | 10k | 5M               |
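In code, finding your tier is a threshold lookup on cumulative recharge. This is an illustrative helper mirroring the table above, not an official SDK call:

```python
# Tiers from the table above, highest threshold first:
# (min cumulative recharge USD, tier name, concurrency, RPM, TPM)
TIERS = [
    (3000, "Tier 5", 1_000, 10_000, 5_000_000),
    (1000, "Tier 4", 400, 5_000, 4_000_000),
    (100,  "Tier 3", 200, 5_000, 3_000_000),
    (20,   "Tier 2", 100, 500, 3_000_000),
    (10,   "Tier 1", 50, 200, 2_000_000),
    (1,    "Tier 0", 1, 3, 500_000),
]

def tier_for(recharge_usd: float):
    """Return (tier, concurrency, rpm, tpm) for a cumulative recharge."""
    for threshold, tier, conc, rpm, tpm in TIERS:
        if recharge_usd >= threshold:
            return tier, conc, rpm, tpm
    return None  # below the $1 minimum recharge

print(tier_for(150))  # ('Tier 3', 200, 5000, 3000000)
```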

Deep Reasoning & Turbo Tech

Kimi K2.5 Multimodal

K2.5 is Kimi's most intelligent model yet. It uses a **native multimodal architecture**, meaning it processes text, audio, and visual data in a single stream rather than transcribing them first. Supports **256K context** windows across all modalities.

1 Trillion Parameter MoE

The K2 family uses a Mixture-of-Experts (MoE) configuration with **1 trillion total parameters**, activating only 32 billion at a time. This allows for flagship-level reasoning with the efficiency of a smaller model.
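The efficiency claim is easy to quantify from the figures above: only a small fraction of the total parameters is active for any given token.

```python
# Back-of-envelope from the text: 1T total parameters, 32B active per token.
total_params = 1_000_000_000_000
active_params = 32_000_000_000

print(f"{active_params / total_params:.1%} of parameters active per token")  # 3.2%
```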

Thinking & Reasoning

Models like **kimi-k2-thinking** are specifically tuned for slow, iterative chain-of-thought processing. They excel at code reviews, mathematical proofs, and complex agentic planning where deep reasoning beats fast generation.

Turbo Speed Architectures

Turbo versions of the K2 architecture are optimized for throughput. They achieve **60-100 tokens per second**, making them ideal for end-user chat interfaces where low latency is critical without sacrificing context length.
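The 60–100 tokens-per-second figure translates directly into perceived chat latency. A rough estimate, treating decode speed as constant:

```python
# Time to stream a full response of n_tokens at a given decode speed.
def stream_seconds(n_tokens: int, tokens_per_sec: float) -> float:
    return n_tokens / tokens_per_sec

# A typical 300-token chat reply, at the low and high ends of the range:
print(stream_seconds(300, 60))   # 5.0 seconds
print(stream_seconds(300, 100))  # 3.0 seconds
```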

Kimi API FAQ

How does the automatic context caching work?

Kimi K2 models automatically identify reused prompt prefixes. If a prefix is found in the cache, those tokens are billed at the **Cache Hit** rate (up to 80% cheaper). This happens seamlessly without requiring manual cache headers.
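The billing effect of prefix reuse can be sketched as follows. The prefix matching here is a toy longest-common-prefix check over characters (standing in for tokens), not Moonshot's server-side cache logic; the rates come from the pricing section above.

```python
def cached_prefix_len(prev_prompt: str, new_prompt: str) -> int:
    """Length of the shared prefix (chars as a stand-in for tokens)."""
    n = 0
    for a, b in zip(prev_prompt, new_prompt):
        if a != b:
            break
        n += 1
    return n

def input_cost(prompt_len: int, hit_len: int) -> float:
    """Input cost in USD: cached units at the hit rate, the rest at miss rate."""
    miss_rate, hit_rate = 0.60 / 1_000_000, 0.10 / 1_000_000
    return hit_len * hit_rate + (prompt_len - hit_len) * miss_rate

system = "You are a helpful assistant. " * 100  # long reused context
q1 = system + "Question 1"
q2 = system + "Question 2"
hit = cached_prefix_len(q1, q2)
print(f"reused prefix: {hit} units, input cost: ${input_cost(len(q2), hit):.6f}")
```

Because the long system prompt is shared between both requests, nearly all of the second request's input bills at the cheaper hit rate.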

Does Moonshot V1 support caching?

No. The 2026 pricing schedule applies context caching discounts only to the **Kimi K2** and **Kimi K2.5** lines. Older V1 models charge a flat input rate regardless of context reuse.

What happened to the Free Tier?

As of 2026, Moonshot requires a **$1 minimum recharge** to start using the API. However, completing a $5 recharge grants a $5 voucher, effectively giving new developers an entry point for experimentation.