LongCat Chat: 560B Parameter Open Weight Model

Table of Contents
- LongCat Chat: Revolutionary 560B Parameter Open Weight Model
- What is LongCat Chat?
- Core Architecture
- Key Innovations
- Table Overview
- Key Features
- Scalable Architectural Design for Computational Efficiency
- Effective Model Scaling Strategy
- Multi-Stage Training Pipeline for Agentic Capability
- Evaluation Results
- General Domains
- Instruction Following
- Mathematical Reasoning
- Coding Capabilities
- Agentic Tool Use
- Safety Performance
- How to Use
- Chat Template
- First-Turn Conversation
- Multi-Turn Conversation
- Tool Calling Support
- Quick Start Guide
- FAQs
- Final Thoughts
LongCat Chat: Revolutionary 560B Parameter Open Weight Model
We introduce LongCat-Flash-Chat, a powerful and efficient language model with 560 billion total parameters, built on an innovative Mixture-of-Experts (MoE) architecture. A dynamic computation mechanism activates 18.6B to 31.3B parameters per token (averaging ~27B) based on contextual demands, optimizing both computational efficiency and performance.
To achieve advanced training and inference efficiency, the model employs a shortcut-connected architecture that expands the computation-communication overlap window, enabling cost-effective inference at over 100 tokens per second (TPS). Comprehensive training and scaling strategies ensure stable, efficient training, while tailored data strategies enhance model performance.
We are now releasing LongCat-Flash-Chat, a non-thinking foundation model that delivers highly competitive performance among leading models, with exceptional strength in agentic tasks.
What is LongCat Chat?
LongCat-Flash-Chat is a state-of-the-art language model developed with a focus on efficiency and performance. It represents a breakthrough in large-scale language modeling through its innovative architecture and training methodologies.
Core Architecture
- Total Parameters: 560 billion parameters
- Activated Parameters: 18.6B to 31.3B (average ~27B) based on context
- Architecture Type: Mixture-of-Experts (MoE)
- Context Length: 128k tokens
- Inference Speed: Over 100 tokens per second
Key Innovations
- Dynamic Computation: A zero-computation experts mechanism for efficient parameter activation
- Shortcut-connected MoE (ScMoE): Expands the computation-communication overlap window
- Multi-stage Training: Advanced agentic capabilities through a specialized training pipeline
Table Overview
Here's a comprehensive overview of LongCat-Flash-Chat specifications:
| Feature | Details |
|---|---|
| Model Name | LongCat-Flash-Chat |
| Total Parameters | 560 billion |
| Activated Parameters | 18.6B - 31.3B (avg. ~27B) |
| Architecture | Mixture-of-Experts (MoE) |
| Context Length | 128k tokens |
| Inference Speed | 100+ TPS |
| Training Scale | Tens of thousands of accelerators |
| Availability | Open weights on Hugging Face |
| Specialization | Agentic tasks, reasoning, coding |
| Safety Features | Comprehensive safety evaluation |
Key Features
Scalable Architectural Design for Computational Efficiency
LongCat-Flash is designed and optimized around two key principles: efficient computation utilization, and efficient training and inference.
Dynamic Computation Budget:
- Introduces a zero-computation experts mechanism in MoE blocks
- Allocates a dynamic computation budget to important tokens based on their significance
- Activates 18.6 to 31.3 billion parameters per token depending on contextual demands
- An expert bias adjusted by a PID controller maintains an average of ~27 billion activated parameters per token (see the sketch after this list)
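To make the PID-controlled bias concrete, here is a minimal, hypothetical sketch. The class name, gains, target value, and update rule are illustrative assumptions; the source only states that a PID-adjusted expert bias keeps the average activation near ~27B parameters per token.

```python
# Hypothetical PI-style controller over a router bias; names, gains, and the
# target ratio are illustrative, not the released model's actual values.
class ExpertBiasController:
    """Nudge the logit bias of zero-computation experts so the fraction of
    tokens routed to real (compute-bearing) experts tracks a target."""

    def __init__(self, target_ratio: float, kp: float = 0.01, ki: float = 0.001):
        self.target_ratio = target_ratio  # desired fraction routed to real experts
        self.kp, self.ki = kp, ki         # proportional and integral gains
        self.integral = 0.0
        self.bias = 0.0                   # added to zero-computation experts' logits

    def update(self, observed_ratio: float) -> float:
        error = observed_ratio - self.target_ratio
        self.integral += error
        # If too many tokens hit real experts, raise the bias so the
        # zero-computation experts win routing more often (and vice versa).
        self.bias += self.kp * error + self.ki * self.integral
        return self.bias

controller = ExpertBiasController(target_ratio=0.86)  # ~27B of 31.3B, illustrative
print(controller.update(observed_ratio=0.90))         # bias rises to shed load
```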
Communication Optimization:
- The Shortcut-connected MoE (ScMoE) design expands the computation-communication overlap window (a toy sketch follows this list)
- Customized infrastructure optimizations enable training at massive scale
- Supports training on tens of thousands of accelerators
- Delivers high-throughput, low-latency inference
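To illustrate the shortcut idea, the single-device toy below places a dense FFN branch, which has no routing dependency, alongside a simplified MoE branch; in a real distributed run, the dense branch's compute can overlap the experts' all-to-all communication. Layer shapes, the routing scheme, and all names are simplified assumptions, not the released architecture.

```python
import torch
import torch.nn as nn

class ToyScMoEBlock(nn.Module):
    """Toy shortcut-connected MoE block (single device, simplified routing)."""

    def __init__(self, d_model: int = 64, n_experts: int = 4, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)])
        # Dense shortcut branch: independent of routing, hence overlappable
        # with expert dispatch in a distributed setting.
        self.shortcut_ffn = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(),
            nn.Linear(4 * d_model, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # In a distributed run, the all-to-all token dispatch would start
        # here while self.shortcut_ffn(x) computes concurrently.
        weights, idx = self.router(x).softmax(dim=-1).topk(self.top_k, dim=-1)
        moe_out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            expert_out = expert(x)  # toy: run every expert, mask afterwards
            for slot in range(self.top_k):
                mask = (idx[..., slot] == e).unsqueeze(-1).float()
                moe_out = moe_out + mask * weights[..., slot:slot + 1] * expert_out
        return x + moe_out + self.shortcut_ffn(x)

print(ToyScMoEBlock()(torch.randn(2, 8, 64)).shape)  # torch.Size([2, 8, 64])
```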
Effective Model Scaling Strategy
Comprehensive Stability-and-Scaling Framework:
- Hyperparameter Transfer Strategy: Tunes hyperparameters on smaller proxy models and transfers them to the full-scale model with theoretical guarantees
- Model-Growth Mechanism: Initializes the full model from a refined half-scale checkpoint for improved performance
- Multi-Pronged Stability Suite:
- Principled router-gradient balancing
- Hidden z-loss to suppress massive activations (a minimal sketch follows this list)
- Fine-tuned optimizer configurations
- Deterministic Computation: Guarantees exact reproducibility and enables SDC (Silent Data Corruption) detection
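As a rough illustration of the hidden z-loss, here is a minimal sketch. The exact formulation and coefficient LongCat-Flash uses are not given here, so both are assumptions modeled on the z-loss commonly applied to logits.

```python
import torch

def hidden_z_loss(hidden: torch.Tensor, coeff: float = 1e-4) -> torch.Tensor:
    """Assumed formulation: squared log-sum-exp over the hidden dimension.

    The penalty grows rapidly when any channel's activation becomes
    "massive", so adding it to the training loss suppresses such outliers.
    """
    z = torch.logsumexp(hidden.float(), dim=-1)  # [batch, seq]
    return coeff * (z ** 2).mean()

print(hidden_z_loss(torch.randn(2, 8, 64)))  # small scalar regularizer
```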
Multi-Stage Training Pipeline for Agentic Capability
Advanced Training Methodology:
- Base Model Construction: A two-stage pretraining data fusion strategy concentrates reasoning-intensive domain data
- Mid-Training Enhancement: Strengthens reasoning and coding capabilities while extending context length to 128k tokens
- Multi-Stage Post-Training: A multi-agent synthesis framework defines task difficulty across three axes:
- Information processing
- Tool-set complexity
- User interaction
Evaluation Results
LongCat-Flash-Chat demonstrates exceptional performance across various benchmarks:
General Domains
- MMLU: 89.71% accuracy
- MMLU-Pro: 82.68% accuracy
- ArenaHard-V2: 86.50% accuracy
- CEval: 90.44% accuracy
- CMMLU: 84.34% accuracy
Instruction Following
- IFEval: 89.65% accuracy
- COLLIE: 57.10% accuracy
- Meeseeks-zh: 43.03% accuracy
Mathematical Reasoning
- MATH500: 96.40% accuracy
- AIME24: 70.42 average score
- AIME25: 61.25 average score
- BeyondAIME: 43.00 average score
Coding Capabilities
- LiveCodeBench: 48.02% pass@1
- HumanEval+: 88.41% pass@1
- MBPP+: 79.63% pass@1
- SWE-Bench-Verified: 60.40% accuracy
- TerminalBench: 39.51% accuracy
Agentic Tool Use
- τ²-Bench (telecom): 73.68 average score
- τ²-Bench (airline): 58.00 average score
- τ²-Bench (retail): 71.27 average score
- AceBench: 76.10% accuracy
- VitaBench: 24.30 average score
Safety Performance
- Harmful: 83.98% safety score
- Criminal: 91.24% safety score
- Misinformation: 81.72% safety score
- Privacy: 93.98% safety score
How to Use
Chat Template
LongCat-Flash uses a specific chat template format for optimal performance:
First-Turn Conversation
[Round 0] USER:{query} ASSISTANT:
With system prompt:
SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:
Multi-Turn Conversation
SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:{response}</longcat_s>... [Round N-1] USER:{query} ASSISTANT:{response}</longcat_s> [Round N] USER:{query} ASSISTANT:
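A small helper can render a message history into this template, as a sketch based only on the format shown above. The build_prompt function is hypothetical, not official tooling; in practice, prefer the chat template bundled with the tokenizer.

```python
def build_prompt(turns, system_prompt=None):
    """Render (user, assistant) pairs into the chat template shown above.

    The final pair may use assistant=None for the reply to be generated.
    """
    parts = [f"SYSTEM:{system_prompt} "] if system_prompt else []
    for i, (user, assistant) in enumerate(turns):
        parts.append(f"[Round {i}] USER:{user} ASSISTANT:")
        if assistant is not None:
            parts.append(f"{assistant}</longcat_s> ")
    return "".join(parts)

print(build_prompt([("Hi there", "Hello!"), ("What is MoE?", None)],
                   system_prompt="You are a helpful assistant."))
```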
Tool Calling Support
LongCat-Flash supports tool calling. The overall prompt places the tool description block ahead of the message history, in the following format:
{tool_description}
## Messages
SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:
Tool Description Format:
## Tools
You have access to the following tools:
### Tool namespace: function
#### Tool name: {func.name}
Description: {func.description}
InputSchema:
{json.dumps(func.parameters, indent=2)}
**Note**: For each function call, return a JSON object with the function name and arguments within <longcat_tool_call></longcat_tool_call> XML tags, as follows:
<longcat_tool_call>
{"name": <function-name>, "arguments": <args-dict>}
</longcat_tool_call>
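To ground this, the sketch below renders the tool description block in the format above and extracts tool calls from a model reply. The get_weather tool and both helpers are made-up illustrations, not official utilities.

```python
import json
import re

def render_tool_block(funcs):
    # Render the tool description in the format shown above.
    lines = ["## Tools", "You have access to the following tools:", "",
             "### Tool namespace: function", ""]
    for func in funcs:
        lines += [f"#### Tool name: {func['name']}",
                  f"Description: {func['description']}",
                  "InputSchema:",
                  json.dumps(func["parameters"], indent=2), ""]
    return "\n".join(lines)

def parse_tool_calls(text):
    # Extract JSON tool calls from <longcat_tool_call> tags in model output.
    pattern = r"<longcat_tool_call>\s*(.*?)\s*</longcat_tool_call>"
    return [json.loads(match) for match in re.findall(pattern, text, re.DOTALL)]

weather = {"name": "get_weather",  # hypothetical example tool
           "description": "Look up the current weather for a city.",
           "parameters": {"type": "object",
                          "properties": {"city": {"type": "string"}},
                          "required": ["city"]}}
print(render_tool_block([weather]))
print(parse_tool_calls('<longcat_tool_call>{"name": "get_weather", '
                       '"arguments": {"city": "Paris"}}</longcat_tool_call>'))
```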
Quick Start Guide
- Access the Model: Available on Hugging Face (a minimal loading sketch follows this list)
- Load the Tokenizer: Use the provided tokenizer_config.json
- Format Input: Follow the chat template format
- Generate Responses: The model generates contextually appropriate responses
- Tool Integration: Use the tool calling format for function execution
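The transformers-based sketch below shows the intended API shape. The repository id meituan-longcat/LongCat-Flash-Chat is an assumption to verify on Hugging Face, and serving a 560B-parameter model realistically requires a multi-GPU deployment.

```python
# Sketch only: assumes the repo id below and sufficient GPU memory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meituan-longcat/LongCat-Flash-Chat"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto", trust_remote_code=True)

messages = [{"role": "user", "content": "Explain Mixture-of-Experts briefly."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:],
                       skip_special_tokens=True))
```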
FAQs
Q1: What makes LongCat-Flash-Chat different from other large language models? LongCat-Flash-Chat uses a unique Mixture-of-Experts architecture with dynamic computation, activating only 18.6B-31.3B parameters based on context, making it more efficient than traditional models.
Q2: How does the dynamic computation mechanism work? The model uses zero-computation experts and a PID controller to dynamically allocate its computation budget to important tokens, maintaining an average of ~27 billion activated parameters per token.
Q3: What is the inference speed of LongCat-Flash-Chat? The model achieves over 100 tokens per second (TPS) for cost-effective inference, thanks to its shortcut-connected MoE architecture.
Q4: How does LongCat-Flash-Chat perform in agentic tasks? The model shows exceptional performance in agentic tasks, with high scores in τ²-Bench evaluations and tool use scenarios, making it ideal for complex reasoning and environmental interaction.
Q5: What safety measures are implemented? LongCat-Flash-Chat includes comprehensive safety evaluation with high scores in harmful content detection (83.98%), criminal activity prevention (91.24%), and misinformation handling (81.72%).
Q6: How can I integrate LongCat-Flash-Chat into my applications? The model is available on Hugging Face and supports standard chat templates, tool calling, and multi-turn conversations. Follow the provided chat template format for optimal results.
Q7: What is the context length supported by LongCat-Flash-Chat? The model supports up to 128k tokens context length, making it suitable for long-form conversations and complex reasoning tasks.
Q8: How does the multi-stage training pipeline work? The training involves base model construction with reasoning-intensive data, mid-training enhancement for coding capabilities, and multi-stage post-training with multi-agent synthesis for agentic behaviors.
Final Thoughts
LongCat-Flash-Chat represents a significant advancement in large language model technology, combining massive scale (560B parameters) with intelligent efficiency through dynamic computation. Its exceptional performance in agentic tasks, mathematical reasoning, and coding makes it a powerful tool for complex AI applications.
The model's innovative architecture, comprehensive safety features, and competitive benchmark results position it as a leading solution for advanced language understanding and generation tasks. With its availability on Hugging Face and open licensing, LongCat-Flash-Chat opens new possibilities for researchers and developers working on AI applications.