
Calculate Moonshot AI (Kimi) 2026 costs dynamically. Supports K2.5 Multimodal, Turbo architectures, and automated context caching logic.
Kimi's most intelligent flagship. Native multimodal (Text/Image/Video), thinking & non-thinking modes. Context: 256K
Kimi K2 series uses automated prefix caching. Hit rates apply to reused context.
Official Moonshot AI rates for 2026. Input pricing varies by cache state. V1 models do not support cache hit discounts.
Moonshot AI limits concurrency and throughput based on your cumulative recharge level. Reach Tier 3+ for high-scale production apps.
| Level | Total Recharge | Concurrency | RPM | Tokens/Min (TPM) |
|---|---|---|---|---|
| Tier 0 | $1 | 1 | 3 | 500K |
| Tier 1 | $10 | 50 | 200 | 2M |
| Tier 2 | $20 | 100 | 500 | 3M |
| Tier 3 | $100 | 200 | 5K | 3M |
| Tier 4 | $1000 | 400 | 5K | 4M |
| Tier 5 | $3000 | 1K | 10K | 5M |
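
To make the tier table easier to reason about, here is a minimal Python sketch that looks up the tier for a given cumulative recharge and finds the lowest tier that covers a workload. The limits are copied from the table. The names `Tier`, `tier_for_recharge`, and `min_tier_for_workload` are hypothetical helpers rather than part of any Moonshot SDK, and the sketch assumes each "Total Recharge" value is an inclusive lower bound.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    min_recharge_usd: int  # assumed to be an inclusive lower bound
    concurrency: int
    rpm: int               # requests per minute
    tpm: int               # tokens per minute


# Values copied from the rate-limit table, ordered from lowest to highest tier.
TIERS = [
    Tier("Tier 0", 1, 1, 3, 500_000),
    Tier("Tier 1", 10, 50, 200, 2_000_000),
    Tier("Tier 2", 20, 100, 500, 3_000_000),
    Tier("Tier 3", 100, 200, 5_000, 3_000_000),
    Tier("Tier 4", 1_000, 400, 5_000, 4_000_000),
    Tier("Tier 5", 3_000, 1_000, 10_000, 5_000_000),
]


def tier_for_recharge(total_usd: float) -> Optional[Tier]:
    """Return the highest tier whose recharge threshold is met, or None."""
    eligible = [t for t in TIERS if total_usd >= t.min_recharge_usd]
    return eligible[-1] if eligible else None


def min_tier_for_workload(concurrency: int, rpm: int, tpm: int) -> Optional[Tier]:
    """Return the lowest tier whose limits cover the workload, or None."""
    for tier in TIERS:
        if tier.concurrency >= concurrency and tier.rpm >= rpm and tier.tpm >= tpm:
            return tier
    return None
```

For example, `min_tier_for_workload(150, 1000, 2_500_000)` returns Tier 3, because Tier 2 allows only 100 concurrent requests and 500 RPM.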
K2.5 is Kimi's most intelligent model yet. It uses a **native multimodal architecture**, meaning it understands text, audio, and visual data in a single stream rather than transcribing it first. It supports a **256K context** window across all modalities.
The K2 family uses a Mixture-of-Experts (MoE) configuration with **1 trillion total parameters**, activating only 32 billion at a time. This allows for flagship-level reasoning with the efficiency of a smaller model.
Models like **kimi-k2-thinking** are specifically tuned for slow, iterative chain-of-thought processing. They excel at code reviews, mathematical proofs, and complex agentic planning where deep reasoning beats fast generation.
Turbo versions of the K2 architecture are optimized for throughput. They achieve **60-100 tokens per second**, making them ideal for end-user chat interfaces that need low latency without sacrificing context length.
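
As a rough illustration of what the quoted throughput means for response time, the sketch below converts an output length into a best-case and worst-case generation time. The function name `generation_time_range` is hypothetical. It covers decoding time only: time-to-first-token, network latency, and queueing are ignored, so real latency will be higher.

```python
def generation_time_range(output_tokens: int,
                          slow_tps: float = 60.0,
                          fast_tps: float = 100.0) -> tuple[float, float]:
    """Return (best_case_seconds, worst_case_seconds) for generating output_tokens.

    The defaults use the 60-100 tokens/second range quoted for Turbo models.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    if slow_tps <= 0 or fast_tps < slow_tps:
        raise ValueError("throughput bounds must satisfy 0 < slow_tps <= fast_tps")
    return output_tokens / fast_tps, output_tokens / slow_tps


if __name__ == "__main__":
    best, worst = generation_time_range(600)
    print(f"600 tokens: {best:.1f}s to {worst:.1f}s")
```

Under these assumptions, a 600-token reply takes roughly 6 to 10 seconds to stream in full.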
Kimi K2 models automatically identify reused prompt prefixes. If a prefix is found in the cache, those tokens are billed at the **Cache Hit** rate (up to 80% cheaper). This happens seamlessly without requiring manual cache headers.
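
The billing logic described here can be sketched as a blended input cost. The rates in this example are placeholders, not Moonshot's published prices: `PLACEHOLDER_MISS_RATE` is an arbitrary $1.00 per million tokens, and `PLACEHOLDER_HIT_RATE` applies the maximum 80% discount mentioned above. The function `estimate_input_cost` is a hypothetical helper, so substitute the real per-model rates before relying on the result.

```python
# Placeholder rates in USD per 1M input tokens (NOT official prices).
PLACEHOLDER_MISS_RATE = 1.00
PLACEHOLDER_HIT_RATE = 0.20  # assumes the full "up to 80% cheaper" discount


def estimate_input_cost(total_input_tokens: int,
                        cache_hit_ratio: float,
                        miss_rate_per_m: float = PLACEHOLDER_MISS_RATE,
                        hit_rate_per_m: float = PLACEHOLDER_HIT_RATE) -> float:
    """Estimate input cost in USD when a share of tokens is served from cache.

    cache_hit_ratio is the fraction (0.0 to 1.0) of input tokens that fall
    inside a cached prefix and are billed at the cache-hit rate.
    """
    if total_input_tokens < 0:
        raise ValueError("total_input_tokens must be non-negative")
    if not 0.0 <= cache_hit_ratio <= 1.0:
        raise ValueError("cache_hit_ratio must be between 0.0 and 1.0")
    hit_tokens = total_input_tokens * cache_hit_ratio
    miss_tokens = total_input_tokens - hit_tokens
    return (hit_tokens * hit_rate_per_m + miss_tokens * miss_rate_per_m) / 1_000_000


if __name__ == "__main__":
    cost = estimate_input_cost(10_000_000, cache_hit_ratio=0.5)
    print(f"Estimated input cost: ${cost:.2f}")
```

With these placeholder rates, 10 million input tokens at a 50% hit ratio cost about $6.00, compared with $10.00 with no cache hits.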
No. The 2026 pricing schedule only applies context caching discounts to the **Kimi K2** and **Kimi K2.5** lines. Older V1 models charge a flat input rate regardless of context reuse.
As of 2026, Moonshot requires a **$1 minimum recharge** to start using the API. However, completing a $5 recharge grants a $5 voucher, which gives new developers $10 of credit for experimentation.