LongCat Chat: 560B Parameter Open Weight Model

Table of Contents
- LongCat Chat: Revolutionary 560B Parameter Open Weight Model
- What is LongCat Chat?
- Core Architecture
- Key Innovations
- Table Overview
- Key Features
- Scalable Architectural Design for Computational Efficiency
- Effective Model Scaling Strategy
- Multi-Stage Training Pipeline for Agentic Capability
- Evaluation Results
- General Domains
- Instruction Following
- Mathematical Reasoning
- Coding Capabilities
- Agentic Tool Use
- Safety Performance
- How to Use
- Chat Template
- First-Turn Conversation
- Multi-Turn Conversation
- Tool Calling Support
- Quick Start Guide
- FAQs
- Final Thoughts
LongCat Chat: Revolutionary 560B Parameter Open Weight Model
We introduce LongCat-Flash-Chat, a powerful and efficient language model with 560 billion total parameters, featuring an innovative Mixture-of-Experts (MoE) architecture. The model incorporates a dynamic computation mechanism that activates 18.6B~31.3B parameters (averaging ~27B) based on contextual demands, optimizing both computational efficiency and performance.
To achieve advanced training and inference efficiency, we employ a shortcut-connected architecture that expands the computation-communication overlap window, enabling cost-effective inference at over 100 tokens per second (TPS). Our comprehensive training and scaling strategies ensure stable, efficient training, while tailored data strategies enhance model performance.
Now we release LongCat-Flash-Chat, a non-thinking foundation model that delivers highly competitive performance among leading models, with exceptional strengths in agentic tasks.
What is LongCat Chat?
LongCat-Flash-Chat is a state-of-the-art language model developed with a focus on efficiency and performance. It represents a breakthrough in large-scale language modeling through its innovative architecture and training methodologies.
Core Architecture
- Total Parameters: 560 billion parameters
- Activated Parameters: 18.6B to 31.3B (average ~27B) based on context
- Architecture Type: Mixture-of-Experts (MoE)
- Context Length: 128k tokens
- Inference Speed: Over 100 tokens per second
Key Innovations
- Dynamic Computation: Zero-computation experts mechanism for efficient parameter activation
- Shortcut-connected MoE (ScMoE): Expands computation-communication overlap
- Multi-stage Training: Advanced agentic capabilities through specialized training pipeline
Table Overview
Here's a comprehensive overview of LongCat-Flash-Chat specifications:
| Feature | Details |
|---|---|
| Model Name | LongCat-Flash-Chat |
| Total Parameters | 560 Billion |
| Activated Parameters | 18.6B - 31.3B (avg. ~27B) |
| Architecture | Mixture-of-Experts (MoE) |
| Context Length | 128k tokens |
| Inference Speed | 100+ TPS |
| Training Scale | Tens of thousands of accelerators |
| License | Open weights (see the Hugging Face model card) |
| Specialization | Agentic tasks, reasoning, coding |
| Safety Features | Comprehensive safety evaluation |
Key Features
Scalable Architectural Design for Computational Efficiency
LongCat-Flash is designed and optimized under two key principles: efficient use of computation, and efficient training and inference.
Dynamic Computation Budget:
- Introduces zero-computation experts mechanism in MoE blocks
- Allocates a dynamic computation budget to tokens according to their contextual significance
- Activates 18.6 to 31.3 billion parameters based on contextual demands
- An expert bias adjusted by a PID controller maintains an average of ~27 billion activated parameters per token (see the sketch after this list)
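To make these ideas concrete, here is a minimal PyTorch-style sketch, assuming illustrative module names, shapes, and hyperparameters rather than LongCat-Flash's actual code: identity "zero-computation" experts sit alongside real FFN experts in the router's choice set, and a PID-style bias update steers the average activated compute toward a target.

```python
# Illustrative sketch only; not LongCat-Flash's implementation.
import torch
import torch.nn as nn


class ZeroComputationMoE(nn.Module):
    """Top-k router over real FFN experts plus 'zero-computation' identity
    experts: tokens routed to an identity expert pass through unchanged,
    so per-token compute varies with the routing decision."""

    def __init__(self, dim, n_experts=8, n_zero=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.n_experts = n_experts
        self.router = nn.Linear(dim, n_experts + n_zero, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        # Routing bias nudged by a PID-style controller (update_bias below)
        # so the average activated compute stays near the target budget.
        self.register_buffer("bias", torch.zeros(n_experts + n_zero))

    def forward(self, x):  # x: (num_tokens, dim)
        weights, idx = (self.router(x) + self.bias).softmax(-1).topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in range(self.n_experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[e](x[mask])
            zero = idx[:, slot] >= self.n_experts  # identity experts: no FFN cost
            out[zero] += weights[zero, slot].unsqueeze(-1) * x[zero]
        return out

    @torch.no_grad()
    def update_bias(self, avg_active_params, target, kp=1e-3):
        # Proportional term of the controller: too much activated compute
        # raises the zero-experts' bias, too little lowers it.
        self.bias[self.n_experts:] += kp * (avg_active_params - target)
```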
Communication Optimization:
- Shortcut-connected MoE (ScMoE) design expands computation-communication overlap window
- Customized infrastructure optimizations enable massive scale training
- Supports training on tens of thousands of accelerators
- High-throughput, low-latency inference (a conceptual overlap sketch follows this list)
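Conceptually, the shortcut connection lets dense computation proceed while MoE token dispatch is in flight. The sketch below is a simplified illustration: the `dispatch_plan` object, module names, and the omitted return all-to-all are placeholders, not the real ScMoE implementation.

```python
# Simplified overlap sketch; dispatch_plan.pack/unpack are hypothetical
# stand-ins for the real token dispatch/combine logic.
import torch
import torch.distributed as dist


def scmoe_block(x, dense_branch, local_experts, dispatch_plan):
    # Kick off the all-to-all token dispatch asynchronously.
    send_buf = dispatch_plan.pack(x)
    recv_buf = torch.empty_like(send_buf)
    work = dist.all_to_all_single(recv_buf, send_buf, async_op=True)

    # Run the dense shortcut branch while tokens are on the wire: this is
    # the expanded computation-communication overlap window.
    shortcut = dense_branch(x)

    work.wait()                        # dispatched tokens have arrived
    expert_out = local_experts(recv_buf)

    # Combine step (the return all-to-all) omitted for brevity.
    return shortcut + dispatch_plan.unpack(expert_out)
```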
Effective Model Scaling Strategy
Comprehensive Stability-and-Scaling Framework:
- Hyperparameter Transfer: Hyperparameters are tuned on smaller proxy models and transferred to the full-scale model with theoretical guarantees
- Model-Growth Mechanism: Initializes the full model from a refined half-scale checkpoint for improved performance
- Multi-Pronged Stability Suite:
- Principled router-gradient balancing
- A hidden z-loss to suppress massive activations (illustrated after this list)
- Fine-tuned optimizer configurations
- Deterministic Computation: Guarantees exact reproducibility and enables SDC (Silent Data Corruption) detection
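As a rough illustration only: the exact form of LongCat-Flash's hidden z-loss is not specified here, so the sketch below adapts the widely used router z-loss (a squared log-sum-exp penalty) and applies the analogous penalty to hidden states. Coefficient values are placeholders.

```python
# Illustrative sketch, not LongCat-Flash's actual regularizers. The router
# z-loss follows the standard squared-logsumexp form; the "hidden" variant
# is a guessed analogue that penalizes outsized hidden activations.
import torch


def router_z_loss(router_logits, coef=1e-3):
    # Keeps router logits small and well-scaled.
    return coef * torch.logsumexp(router_logits, dim=-1).pow(2).mean()


def hidden_z_loss(hidden_states, coef=1e-5):
    # Analogous penalty intended to suppress massive activations.
    return coef * torch.logsumexp(hidden_states, dim=-1).pow(2).mean()
```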
Multi-Stage Training Pipeline for Agentic Capability
Advanced Training Methodology:
- Base Model Construction: Two-stage pretraining data fusion strategy for reasoning-intensive domain data
- Mid-Training Enhancement: Strengthens reasoning and coding capabilities and extends the context length to 128k tokens
- Multi-Stage Post-Training: Multi-agent synthesis framework defining task difficulty across three axes:
- Information processing
- Tool-set complexity
- User interaction
Evaluation Results
LongCat-Flash-Chat demonstrates exceptional performance across various benchmarks:
General Domains
- MMLU: 89.71% accuracy
- MMLU-Pro: 82.68% accuracy
- ArenaHard-V2: 86.50% accuracy
- CEval: 90.44% accuracy
- CMMLU: 84.34% accuracy
Instruction Following
- IFEval: 89.65% accuracy
- COLLIE: 57.10% accuracy
- Meeseeks-zh: 43.03% accuracy
Mathematical Reasoning
- MATH500: 96.40% accuracy
- AIME24: 70.42 average score
- AIME25: 61.25 average score
- BeyondAIME: 43.00 average score
Coding Capabilities
- LiveCodeBench: 48.02% pass@1
- Humaneval+: 88.41% pass@1
- MBPP+: 79.63% pass@1
- SWE-Bench-Verified: 60.40% accuracy
- TerminalBench: 39.51% accuracy
Agentic Tool Use
- τ²-Bench (telecom): 73.68 average score
- τ²-Bench (airline): 58.00 average score
- τ²-Bench (retail): 71.27 average score
- AceBench: 76.10% accuracy
- VitaBench: 24.30 average score
Safety Performance
- Harmful: 83.98% safety score
- Criminal: 91.24% safety score
- Misinformation: 81.72% safety score
- Privacy: 93.98% safety score
How to Use
Chat Template
LongCat-Flash uses a specific chat template format for optimal performance:
First-Turn Conversation
Without a system prompt:

```
[Round 0] USER:{query} ASSISTANT:
```

With a system prompt:

```
SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:
```

Multi-Turn Conversation

```
SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:{response}</longcat_s>... [Round N-1] USER:{query} ASSISTANT:{response}</longcat_s> [Round N] USER:{query} ASSISTANT:
```
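Putting the template together in code, here is a hypothetical helper; the function name and message schema are illustrative, so verify the exact separators against the official tokenizer_config.json.

```python
# Hypothetical prompt builder following the template above; not an official
# LongCat utility. Check tokenizer_config.json for the authoritative format.
def build_prompt(messages, system_prompt=None):
    parts = [f"SYSTEM:{system_prompt} "] if system_prompt else []
    round_idx = 0
    for msg in messages:
        if msg["role"] == "user":
            parts.append(f"[Round {round_idx}] USER:{msg['content']} ASSISTANT:")
            round_idx += 1
        elif msg["role"] == "assistant":
            # Completed assistant turns end with the </longcat_s> terminator.
            parts.append(f"{msg['content']}</longcat_s> ")
    return "".join(parts)


print(build_prompt([{"role": "user", "content": "Hello"}]))
# -> [Round 0] USER:Hello ASSISTANT:
```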
Tool Calling Support
LongCat-Flash supports tool calling with the following format:

```
{tool_description}

## Messages

SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:
```

Tool Description Format:

```
## Tools

You have access to the following tools:

### Tool namespace: function

#### Tool name: {func.name}

Description: {func.description}

InputSchema:
{json.dumps(func.parameters, indent=2)}

**Note**: For each function call, return a json object with function name and arguments within <longcat_tool_call></longcat_tool_call> XML tags as follows:
<longcat_tool_call>
{"name": <function-name>, "arguments": <args-dict>}
</longcat_tool_call>
```
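The helpers below are a hypothetical sketch of working with this format: render_tools builds the tool-description text from function schemas, and parse_tool_calls extracts the JSON payloads a response wraps in <longcat_tool_call> tags. Neither is an official LongCat utility.

```python
import json
import re


def render_tools(functions):
    # Render the documented tool-description layout from a list of
    # {"name", "description", "parameters"} dicts.
    lines = ["## Tools", "", "You have access to the following tools:", "",
             "### Tool namespace: function", ""]
    for func in functions:
        lines += [
            f"#### Tool name: {func['name']}",
            "",
            f"Description: {func['description']}",
            "",
            "InputSchema:",
            json.dumps(func["parameters"], indent=2),
            "",
        ]
    return "\n".join(lines)


def parse_tool_calls(response):
    # Pull every JSON object wrapped in <longcat_tool_call> tags.
    pattern = r"<longcat_tool_call>\s*(.*?)\s*</longcat_tool_call>"
    return [json.loads(match) for match in re.findall(pattern, response, re.DOTALL)]
```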
Quick Start Guide
- Access the Model: Download the weights from Hugging Face
- Load the Tokenizer: Use the provided tokenizer_config.json
- Format Input: Follow the chat template format above
- Generate Responses: The model will generate contextually appropriate responses
- Tool Integration: Use the tool calling format for function execution
FAQs
Q1: What makes LongCat-Flash-Chat different from other large language models?
LongCat-Flash-Chat uses a unique Mixture-of-Experts architecture with dynamic computation, activating only 18.6B-31.3B parameters based on context, making it more efficient than traditional dense models.
Q2: How does the dynamic computation mechanism work?
The model uses zero-computation experts and a PID controller to dynamically allocate computation budget to important tokens, maintaining an average of ~27 billion activated parameters per token.
Q3: What is the inference speed of LongCat-Flash-Chat?
The model achieves over 100 tokens per second (TPS) for cost-effective inference, thanks to its shortcut-connected MoE architecture.
Q4: How does LongCat-Flash-Chat perform in agentic tasks?
The model shows exceptional performance in agentic tasks, with high scores in τ²-Bench evaluations and tool use scenarios, making it well suited for complex reasoning and environmental interaction.
Q5: What safety measures are implemented?
LongCat-Flash-Chat includes comprehensive safety evaluation, with high scores in the harmful content (83.98%), criminal activity (91.24%), and misinformation (81.72%) categories.
Q6: How can I integrate LongCat-Flash-Chat into my applications?
The model is available on Hugging Face and supports standard chat templates, tool calling, and multi-turn conversations. Follow the provided chat template format for optimal results.
Q7: What is the context length supported by LongCat-Flash-Chat?
The model supports a context length of up to 128k tokens, making it suitable for long-form conversations and complex reasoning tasks.
Q8: How does the multi-stage training pipeline work?
The training involves base model construction with reasoning-intensive data, mid-training enhancement for reasoning and coding capabilities, and multi-stage post-training with multi-agent synthesis for agentic behaviors.
Final Thoughts
LongCat-Flash-Chat represents a significant advancement in large language model technology, combining massive scale (560B parameters) with intelligent efficiency through dynamic computation. Its exceptional performance in agentic tasks, mathematical reasoning, and coding makes it a powerful tool for complex AI applications.
The model's innovative architecture, comprehensive safety features, and competitive benchmark results position it as a leading solution for advanced language understanding and generation tasks. With its availability on Hugging Face and open licensing, LongCat-Flash-Chat opens new possibilities for researchers and developers working on AI applications.