
Conversation

molbap
Contributor

@molbap commented Sep 5, 2025

What does this PR do?

As per title, adds support for LongCat-Flash, a 560B MoE from Meituan.

Status:

  • The current modeling_longcat_flash file allows loading the checkpoint without trust_remote_code, using a specific base_model_tp_plan found in the config; `from_pretrained(..., tp_plan="auto")` loads the model properly (a rough sketch of the idea follows this list).
  • The chat template is as provided by the authors.
  • A no-op hook added to deepseek_v3 to abstract LoRA scaling.
  • Tested generations and correctness; all work.
  • A few modular adjustments remain to derive from DeepSeek-V3, estimated at ~300 LOC total.
  • Quality & last touches, adding a new checkpoint to maximize compatibility with transformers.
  • Make CI happy # DOING
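
To make the tp_plan item above concrete, here is a minimal sketch of what a base_model_tp_plan on a config looks like. The module patterns and sharding styles below are illustrative placeholders (borrowed from common attention/MLP naming), not the actual plan shipped with the LongCat-Flash checkpoint:

# Illustrative sketch only: the patterns/styles below are placeholders,
# not the real LongCat-Flash tensor-parallel plan.
from transformers import PretrainedConfig

class ToyConfig(PretrainedConfig):
    model_type = "toy"
    # Maps module-name patterns to TP styles; from_pretrained(..., tp_plan="auto")
    # reads this instead of requiring trust_remote_code.
    base_model_tp_plan = {
        "layers.*.self_attn.q_proj": "colwise",
        "layers.*.self_attn.k_proj": "colwise",
        "layers.*.self_attn.v_proj": "colwise",
        "layers.*.self_attn.o_proj": "rowwise",
        "layers.*.mlp.gate_proj": "colwise",
        "layers.*.mlp.up_proj": "colwise",
        "layers.*.mlp.down_proj": "rowwise",
    }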

Launch snippet:

# launch_longcat.py
from transformers import LongcatFlashForCausalLM, AutoTokenizer
import torch

torch.manual_seed(30)
model_id = "meituan-longcat/LongCat-Flash-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)

chat = [
    {"role": "user", "content": "Hello! What is the capital of France? What can you tell me about it?"},
]

model = LongcatFlashForCausalLM.from_pretrained(
    model_id,
    tp_plan="auto",
    dtype=torch.bfloat16,
    trust_remote_code=False,  # can be removed.
)

inputs = tokenizer.apply_chat_template(
    chat, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=30)
print(tokenizer.batch_decode(outputs))

Note that you will need at least 2×8 H100 GPUs to launch the model with TP. Run the command below on each node, with --node_rank set to 0 on the first node and 1 on the second:

torchrun --nproc_per_node=8 --nnodes=2 --node_rank=<0_or_1> --rdzv-id <an_id> --rdzv-backend c10d --rdzv-endpoint $NODE_ID:$NODE_PORT --log-dir ./logs_longcat launch_longcat.py

And you'll get:

[Round 0] USER:Hello! What is the capital of France? What can you tell me about it? ASSISTANT:Hello! 😊 The capital of France is Paris, one of the most famous and beloved cities in the world. Here’s a quick overview of what makes Paris special:

1. Iconic Landmarks

  • Eiffel Tower – The global symbol of France, built in 1889 for the World's Fair.
  • Notre-Dame Cathedral – A masterpiece of Gothic architecture (currently under restoration after the 2019 fire).
  • Louvre Museum – The world’s largest art museum, home to the Mona Lisa and Venus de Milo.
  • Sacré-Cœur Basilica – A stunning white church atop Montmartre with panoramic views.
  • Arc de Triomphe – Honors French military victories, with the Tomb of the Unknown Soldier beneath it.
  • Champs-Élysées – A glamorous avenue leading to the Arc de Triomphe, lined with shops and cafés.

2. Culture & Arts

  • Paris is the "City of Light" (La Ville Lumière), a nickname from its early adoption of street lighting and its role as a center of enlightenment.
  • It’s a global hub for fashion (haute couture, Paris Fashion Week) and art (Impressionism, Picasso, Dali).
  • Famous literary figures like Hemingway, Fitzgerald, and Sartre lived and wrote here.

3. Food & Cuisine

  • Croissants, baguettes, macarons, and crème brûlée are just a few of its culinary delights.
  • Paris has over 100 Michelin-starred restaurants and countless cozy bistros.
  • The Marché d’Aligre and Rue Mouffetard are great for fresh produce and local flavors.

4. History & Politics

  • Founded in the 3rd century BC by the Parisii tribe, it became a major European city under the Romans.
  • The French Revolution (1789–1799) began here, leading to the fall of the monarchy.
  • Today, it’s the political and economic heart of France, housing the French President’s residence (Élysée Palace) and the National Assembly.


@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@molbap molbap marked this pull request as ready for review September 9, 2025 16:58
@molbap molbap requested a review from ArthurZucker September 10, 2025 15:45
Collaborator

@ArthurZucker left a comment


Nice! 🚀

@molbap
Contributor Author

molbap commented Sep 12, 2025

Added:

  • A small test on a cut-down version of the model. The output is nonsensical (I took only a few experts and layers) but consistent; see the sketch after this list for the general shape of such a test.
  • A larger test on the full model, to be run on CPU. It's super slow, but useful as a reference both to us and to the community. This PR's description also contains what's expected at inference.
  • Isolated the modular code a bit so it has no side effects.
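
As a rough illustration of the cut-down consistency test mentioned above, here is a minimal sketch. The config/model class names follow what this PR adds, but the field names and values (num_layers, hidden_size, etc.) are assumptions for illustration; the real test file defines its own tiny config:

# Hypothetical sketch: a tiny, randomly initialized model whose output is
# nonsensical but must be consistent across forward passes.
import torch
from transformers import LongcatFlashConfig, LongcatFlashForCausalLM

tiny = LongcatFlashConfig(
    num_layers=2,          # real config: 28 (56 effective decoder layers)
    hidden_size=64,        # illustrative, much smaller than the real model
    num_attention_heads=4,
    vocab_size=128,
)
torch.manual_seed(0)
model = LongcatFlashForCausalLM(tiny).eval()

input_ids = torch.randint(0, tiny.vocab_size, (1, 8))
with torch.no_grad():
    out_a = model(input_ids).logits
    out_b = model(input_ids).logits

# Gibberish, but deterministic: two passes over the same input must match.
torch.testing.assert_close(out_a, out_b)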

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/longcat_flash']
quantizations: [] ...

@molbap
Contributor Author

molbap commented Sep 12, 2025

[run-slow] longcat_flash

# Initialize weights and apply final processing
self.post_init()

def forward(
Collaborator


do you have to overwrite this one?

Contributor Author

yes... because of num_hidden_layers vs num_layers: the actual number of layers is 56, the config stores 28 under the name num_layers, and we iterate over num_hidden_layers if I don't overwrite it. A rough sketch of the mismatch follows.
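
To illustrate the mismatch (a hedged sketch, not the PR's actual code; the slicing pattern is the usual one in shared transformers model code, and the 56-vs-28 figures come from the comment above):

# Hypothetical sketch of why forward is overwritten. The shared implementation
# iterates `self.layers[: config.num_hidden_layers]`, but LongCat-Flash stores
# 28 under `num_layers` while the effective decoder depth is 56, so the generic
# slice would run the wrong number of layers.
def forward_generic(self, hidden_states, **kwargs):
    for layer in self.layers[: self.config.num_hidden_layers]:  # generic path
        hidden_states = layer(hidden_states, **kwargs)
    return hidden_states

def forward_longcat(self, hidden_states, **kwargs):
    # Overridden path (illustrative): iterate over all built layers instead of
    # a num_hidden_layers-based slice.
    for layer in self.layers:
        hidden_states = layer(hidden_states, **kwargs)
    return hidden_states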

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, longcat_flash

@molbap
Contributor Author

molbap commented Sep 12, 2025

[run-slow] longcat_flash
