
LongCat Chat: A 560B-Parameter Open-Weight Model

LongCat-Flash-Chat is a powerful and efficient language model with 560 billion total parameters, built on a Mixture-of-Experts (MoE) architecture. A dynamic computation mechanism activates between 18.6B and 31.3B parameters per token (averaging ~27B) depending on contextual demands, balancing computational efficiency and performance.

To improve training and inference efficiency, the model uses a shortcut-connected architecture that expands the computation-communication overlap window, enabling inference at over 100 tokens per second (TPS) at competitive cost. Comprehensive training and scaling strategies keep training stable and efficient, while tailored data strategies further improve model quality.

LongCat-Flash-Chat is released as a non-thinking foundation model that delivers highly competitive performance among leading models, with particular strengths in agentic tasks.


What is LongCat Chat?

LongCat-Flash-Chat is a state-of-the-art language model developed with a focus on efficiency and performance. It represents a breakthrough in large-scale language modeling through its innovative architecture and training methodologies.

Core Architecture

  • Total Parameters: 560 billion parameters
  • Activated Parameters: 18.6B to 31.3B (average ~27B) based on context
  • Architecture Type: Mixture-of-Experts (MoE)
  • Context Length: 128k tokens
  • Inference Speed: Over 100 tokens per second
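Only a small fraction of the total parameter count is active for any single token. A quick back-of-the-envelope calculation (plain Python, using only the figures listed above) makes the sparsity concrete:

```python
# Sparsity of LongCat-Flash-Chat, using the figures quoted above.
total_params = 560e9                      # total parameters
activated = {"min": 18.6e9, "avg": 27e9, "max": 31.3e9}

for label, active in activated.items():
    print(f"{label}: {active / 1e9:.1f}B active -> {active / total_params:.1%} of total")

# min: 18.6B active -> 3.3% of total
# avg: 27.0B active -> 4.8% of total
# max: 31.3B active -> 5.6% of total
```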

Key Innovations

  • Dynamic Computation: Zero-computation experts mechanism for efficient parameter activation
  • Shortcut-connected MoE (ScMoE): Expands computation-communication overlap
  • Multi-stage Training: Advanced agentic capabilities through specialized training pipeline

Table Overview

Here's a comprehensive overview of LongCat-Flash-Chat specifications:

| Feature | Details |
| --- | --- |
| Model Name | LongCat-Flash-Chat |
| Total Parameters | 560 billion |
| Activated Parameters | 18.6B - 31.3B (avg. ~27B) |
| Architecture | Mixture-of-Experts (MoE) |
| Context Length | 128k tokens |
| Inference Speed | 100+ TPS |
| Training Scale | Tens of thousands of accelerators |
| Availability | Open weights on Hugging Face |
| Specialization | Agentic tasks, reasoning, coding |
| Safety Features | Comprehensive safety evaluation |

Key Features

Scalable Architectural Design for Computational Efficiency

LongCat-Flash is designed and optimized under two key principles: efficient computation utilization and efficient training and inference.

Dynamic Computation Budget:

  • Introduces a zero-computation experts mechanism in the MoE blocks
  • Allocates a dynamic computation budget per token based on its significance
  • Activates 18.6 to 31.3 billion parameters depending on contextual demands
  • A PID controller adjusts expert bias to maintain an average of ~27 billion activated parameters per token (see the sketch below)
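The release does not include reference code for this routing mechanism, so the following is only a minimal sketch (PyTorch; all sizes, gains, and names are illustrative assumptions). It shows the general idea: the router scores both real and zero-computation (identity) experts, and a PID-style controller, here reduced to its proportional and integral terms, nudges a bias so that the average number of real experts chosen per token tracks a target.

```python
import torch

# Illustrative sketch only: real experts do work, zero-computation experts pass
# the token through unchanged, so routing a token to them costs no extra FLOPs.
NUM_REAL, NUM_ZERO, TOP_K = 8, 4, 2      # hypothetical expert counts and top-k
TARGET_REAL_PER_TOKEN = 1.5              # hypothetical target load of real experts
KP, KI = 0.01, 0.001                     # illustrative controller gains

bias = torch.zeros(NUM_REAL + NUM_ZERO)  # router bias adjusted by the controller
integral = 0.0                           # accumulated error (integral term)

def route(router_logits: torch.Tensor) -> torch.Tensor:
    """Pick top-k experts per token from biased router scores."""
    scores = torch.softmax(router_logits + bias, dim=-1)
    return scores.topk(TOP_K, dim=-1).indices            # [tokens, TOP_K]

def update_bias(expert_ids: torch.Tensor) -> None:
    """Nudge the bias on zero-computation experts so the average number of real
    experts selected per token tracks the target."""
    global integral
    real_per_token = (expert_ids < NUM_REAL).float().sum(dim=-1).mean().item()
    error = real_per_token - TARGET_REAL_PER_TOKEN
    integral += error
    # Too many real experts used -> make zero-computation experts more attractive.
    bias[NUM_REAL:] += KP * error + KI * integral

logits = torch.randn(16, NUM_REAL + NUM_ZERO)             # 16 tokens, random scores
update_bias(route(logits))
```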

Communication Optimization:

  • Shortcut-connected MoE (ScMoE) design expands the computation-communication overlap window
  • Customized infrastructure optimizations enable training at massive scale
  • Supports training on tens of thousands of accelerators
  • High-throughput, low-latency inference

Effective Model Scaling Strategy

Comprehensive Stability-and-Scaling Framework:

  1. Hyperparameter Transfer Strategy: Predicts settings for the large model from smaller proxy models, with theoretical guarantees
  2. Model-Growth Mechanism: Initializes the full model from a refined half-scale checkpoint for improved performance
  3. Multi-Pronged Stability Suite (a z-loss sketch follows this list):
    • Principled router-gradient balancing
    • A hidden z-loss to suppress massive activations
    • Fine-tuned optimizer configurations
  4. Deterministic Computation: Guarantees exact reproducibility and enables detection of Silent Data Corruption (SDC)
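The exact form of the hidden z-loss is specified in the LongCat technical report; the snippet below is only a hedged sketch that transplants the familiar z-loss formulation (a squared log-sum-exp penalty) onto hidden states, with an assumed coefficient, to illustrate how such a term discourages very large activation values.

```python
import torch

def hidden_z_loss(hidden: torch.Tensor, coeff: float = 1e-4) -> torch.Tensor:
    """Sketch of a z-loss-style penalty applied to hidden activations.

    logsumexp grows with the largest entries of `hidden`, so squaring it and
    adding it to the training loss pushes activations away from extreme values.
    The coefficient and exact formulation here are assumptions, not the values
    used for LongCat-Flash.
    """
    z = torch.logsumexp(hidden, dim=-1)        # [batch, seq]
    return coeff * (z ** 2).mean()

h = torch.randn(2, 8, 4096) * 10               # toy hidden states with large values
print(hidden_z_loss(h))
```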

Multi-Stage Training Pipeline for Agentic Capability

Advanced Training Methodology:

  1. Base Model Construction: Two-stage pretraining data fusion strategy that concentrates reasoning-intensive domain data
  2. Mid-Training Enhancement: Strengthens reasoning and coding capabilities and extends the context length to 128k tokens
  3. Multi-Stage Post-Training: Multi-agent synthesis framework that defines task difficulty across three axes:
    • Information processing
    • Tool-set complexity
    • User interaction

Evaluation Results

LongCat-Flash-Chat demonstrates exceptional performance across various benchmarks:

General Domains

  • MMLU: 89.71% accuracy
  • MMLU-Pro: 82.68% accuracy
  • ArenaHard-V2: 86.50% accuracy
  • CEval: 90.44% accuracy
  • CMMLU: 84.34% accuracy

Instruction Following

  • IFEval: 89.65% accuracy
  • COLLIE: 57.10% accuracy
  • Meeseeks-zh: 43.03% accuracy

Mathematical Reasoning

  • MATH500: 96.40% accuracy
  • AIME24: 70.42 average score
  • AIME25: 61.25 average score
  • BeyondAIME: 43.00 average score

Coding Capabilities

  • LiveCodeBench: 48.02% pass@1
  • HumanEval+: 88.41% pass@1
  • MBPP+: 79.63% pass@1
  • SWE-Bench-Verified: 60.40% resolved
  • TerminalBench: 39.51% accuracy

Agentic Tool Use

  • τ²-Bench (telecom): 73.68 average score
  • τ²-Bench (airline): 58.00 average score
  • τ²-Bench (retail): 71.27 average score
  • AceBench: 76.10% accuracy
  • VitaBench: 24.30 average score

Safety Performance

  • Harmful: 83.98% safety score
  • Criminal: 91.24% safety score
  • Misinformation: 81.72% safety score
  • Privacy: 93.98% safety score

How to Use

Chat Template

LongCat-Flash uses a specific chat template format for optimal performance:

First-Turn Conversation

[Round 0] USER:{query} ASSISTANT:

With system prompt:

SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:

Multi-Turn Conversation

SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:{response}</longcat_s>... [Round N-1] USER:{query} ASSISTANT:{response}</longcat_s> [Round N] USER:{query} ASSISTANT:
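A small helper makes the template easier to apply programmatically. The sketch below is plain Python written directly from the template shown above rather than from any official tooling; in practice, the chat template shipped in tokenizer_config.json should be preferred.

```python
def build_prompt(messages, system_prompt=None):
    """Assemble a LongCat-Flash prompt following the template shown above.

    `messages` is a list of (user_query, assistant_response) pairs; the last
    pair should carry None as the response so the model answers next.
    """
    segments = []
    if system_prompt:
        segments.append(f"SYSTEM:{system_prompt}")
    for i, (query, response) in enumerate(messages):
        turn = f"[Round {i}] USER:{query} ASSISTANT:"
        if response is not None:
            turn += f"{response}</longcat_s>"
        segments.append(turn)
    return " ".join(segments)

# First-turn prompt
print(build_prompt([("What is LongCat-Flash?", None)]))

# Multi-turn prompt with a system prompt
print(build_prompt(
    [("Hi!", "Hello! How can I help?"), ("Summarize MoE in one line.", None)],
    system_prompt="You are a helpful assistant.",
))
```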

Tool Calling Support

LongCat-Flash supports tool calling with the following format:

{tool_description}

## Messages
SYSTEM:{system_prompt} [Round 0] USER:{query} ASSISTANT:

Tool Description Format:

## Tools
You have access to the following tools: 

### Tool namespace: function

#### Tool name: {func.name}

Description: {func.description}

InputSchema: 
{json.dumps(func.parameters, indent=2)}

**Note**: For each function call, return a json object with function name and arguments within <longcat_tool_call></longcat_tool_call> XML tags as follows:
<longcat_tool_call>
{"name": <function-name>, "arguments": <args-dict>}
</longcat_tool_call>
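To show both sides of this exchange in one place, the sketch below renders a tool description in the format above and extracts a tool call from a model response. It is plain Python; the example function, its schema, and the regex-based parsing are illustrative assumptions rather than official tooling.

```python
import json
import re

def render_tool(func_name, description, parameters):
    """Render a single tool in the description format shown above."""
    return (
        "## Tools\n"
        "You have access to the following tools: \n\n"
        "### Tool namespace: function\n\n"
        f"#### Tool name: {func_name}\n\n"
        f"Description: {description}\n\n"
        "InputSchema: \n"
        f"{json.dumps(parameters, indent=2)}"
    )

def parse_tool_calls(model_output):
    """Extract JSON tool calls wrapped in <longcat_tool_call> tags."""
    pattern = r"<longcat_tool_call>\s*(.*?)\s*</longcat_tool_call>"
    return [json.loads(m) for m in re.findall(pattern, model_output, re.DOTALL)]

# Hypothetical tool definition and model response, for illustration only.
print(render_tool(
    "get_weather",
    "Look up the current weather for a city.",
    {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
))

response = (
    "<longcat_tool_call>\n"
    '{"name": "get_weather", "arguments": {"city": "Paris"}}\n'
    "</longcat_tool_call>"
)
print(parse_tool_calls(response))   # [{'name': 'get_weather', 'arguments': {'city': 'Paris'}}]
```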

Quick Start Guide

  1. Access the Model: Available on Hugging Face
  2. Load the Tokenizer: Use the provided tokenizer_config.json
  3. Format Input: Follow the chat template format
  4. Generate Responses: The model will generate contextually appropriate responses
  5. Tool Integration: Use the tool calling format for function execution
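As a concrete sketch of steps 1-4, the snippet below uses the Hugging Face transformers library. The repository id, dtype, and generation settings are assumptions for illustration; check the official model card for the exact id and the recommended serving stack, since a 560B MoE model normally needs a dedicated multi-GPU deployment rather than a single-process load.

```python
# Minimal sketch with Hugging Face transformers; repo id and settings are assumed.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meituan-longcat/LongCat-Flash-Chat"   # assumed Hugging Face repo id

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

# Step 3: the chat template ships with the tokenizer (tokenizer_config.json).
messages = [{"role": "user", "content": "Explain Mixture-of-Experts in two sentences."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Step 4: generate a response and strip the prompt tokens before decoding.
output = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```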

FAQs

Q1: What makes LongCat-Flash-Chat different from other large language models? LongCat-Flash-Chat uses a unique Mixture-of-Experts architecture with dynamic computation, activating only 18.6B-31.3B parameters based on context, making it more efficient than traditional models.

Q2: How does the dynamic computation mechanism work? The model uses zero-computation experts and a PID-controller to dynamically allocate computation budget to important tokens, maintaining an average of ~27 billion activated parameters per token.

Q3: What is the inference speed of LongCat-Flash-Chat? The model achieves over 100 tokens per second (TPS) for cost-effective inference, thanks to its shortcut-connected MoE architecture.

Q4: How does LongCat-Flash-Chat perform in agentic tasks? The model shows exceptional performance in agentic tasks, with high scores in τ²-Bench evaluations and tool use scenarios, making it ideal for complex reasoning and environmental interaction.

Q5: What safety measures are implemented? LongCat-Flash-Chat includes comprehensive safety evaluation with high scores in harmful content detection (83.98%), criminal activity prevention (91.24%), and misinformation handling (81.72%).

Q6: How can I integrate LongCat-Flash-Chat into my applications? The model is available on Hugging Face and supports standard chat templates, tool calling, and multi-turn conversations. Follow the provided chat template format for optimal results.

Q7: What is the context length supported by LongCat-Flash-Chat? The model supports up to 128k tokens context length, making it suitable for long-form conversations and complex reasoning tasks.

Q8: How does the multi-stage training pipeline work? The training involves base model construction with reasoning-intensive data, mid-training enhancement for coding capabilities, and multi-stage post-training with multi-agent synthesis for agentic behaviors.


Final Thoughts

LongCat-Flash-Chat represents a significant advancement in large language model technology, combining massive scale (560B parameters) with intelligent efficiency through dynamic computation. Its exceptional performance in agentic tasks, mathematical reasoning, and coding makes it a powerful tool for complex AI applications.

The model's innovative architecture, comprehensive safety evaluation, and competitive benchmark results position it as a leading open-weight option for advanced language understanding and generation tasks. With its weights openly available on Hugging Face, LongCat-Flash-Chat opens new possibilities for researchers and developers building AI applications.
