
Conversation

molbap
Contributor

@molbap commented Sep 5, 2025

What does this PR do?

As per title, adds support for LongCat-Flash, a 560B MoE from Meituan.

Status:

  • The current modeling_longcat_flash file allows loading the checkpoint without trust_remote_code, using a specific base_model_tp_plan found in the config; `from_pretrained(..., tp_plan="auto")` loads the model properly (a rough sketch of the idea follows this list).
  • The chat template is as provided by the authors.
  • A no-op hook added to deepseek_v3 to abstract LoRA scaling.
  • Tested generations and correctness; all work.
  • A few modular adjustments remain to derive from DeepSeek-V3, estimated at ~300 LOC total.
  • Quality & last touches, adding a new checkpoint to maximize compatibility with transformers.
  • Make CI happy # DOING
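
To make the tp_plan item above concrete, here is a minimal sketch of what a base_model_tp_plan on a config looks like. The module patterns and sharding styles below are illustrative placeholders (borrowed from common attention/MLP naming), not the actual plan shipped with the LongCat-Flash checkpoint:

# Illustrative sketch only: the patterns/styles below are placeholders,
# not the real LongCat-Flash tensor-parallel plan.
from transformers import PretrainedConfig

class ToyConfig(PretrainedConfig):
    model_type = "toy"
    # Maps module-name patterns to TP styles; from_pretrained(..., tp_plan="auto")
    # reads this instead of requiring trust_remote_code.
    base_model_tp_plan = {
        "layers.*.self_attn.q_proj": "colwise",
        "layers.*.self_attn.k_proj": "colwise",
        "layers.*.self_attn.v_proj": "colwise",
        "layers.*.self_attn.o_proj": "rowwise",
        "layers.*.mlp.gate_proj": "colwise",
        "layers.*.mlp.up_proj": "colwise",
        "layers.*.mlp.down_proj": "rowwise",
    }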

Launch snippet:

# launch_longcat.py
from transformers import LongcatFlashForCausalLM, AutoTokenizer
import torch

torch.manual_seed(30)
model_id = "meituan-longcat/LongCat-Flash-Chat"

tokenizer = AutoTokenizer.from_pretrained(model_id)

chat = [
    {"role": "user", "content": "Hello! What is the capital of France? What can you tell me about it?"},
]

model = LongcatFlashForCausalLM.from_pretrained(
    model_id,
    tp_plan="auto",
    dtype=torch.bfloat16,
    trust_remote_code=False,  # can be removed.
)

inputs = tokenizer.apply_chat_template(
    chat, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=30)
print(tokenizer.batch_decode(outputs))

Note that you will need at least 2×8 H100 GPUs to launch the model with TP. Run the command below on each node, with --node_rank set to 0 on the first node and 1 on the second:

torchrun --nproc_per_node=8 --nnodes=2 --node_rank=<0_or_1> --rdzv-id <an_id> --rdzv-backend c10d --rdzv-endpoint $NODE_ID:$NODE_PORT --log-dir ./logs_longcat launch_longcat.py

And you'll get:

[Round 0] USER:Hello! What is the capital of France? What can you tell me about it? ASSISTANT:Hello! 😊 The capital of France is Paris, one of the most famous and beloved cities in the world. Here’s a quick overview of what makes Paris special:

1. Iconic Landmarks

  • Eiffel Tower – The global symbol of France, built in 1889 for the World's Fair.
  • Notre-Dame Cathedral – A masterpiece of Gothic architecture (currently under restoration after the 2019 fire).
  • Louvre Museum – The world’s largest art museum, home to the Mona Lisa and Venus de Milo.
  • Sacré-Cœur Basilica – A stunning white church atop Montmartre with panoramic views.
  • Arc de Triomphe – Honors French military victories, with the Tomb of the Unknown Soldier beneath it.
  • Champs-Élysées – A glamorous avenue leading to the Arc de Triomphe, lined with shops and cafés.

2. Culture & Arts

  • Paris is the "City of Light" (La Ville Lumière), a nickname from its early adoption of street lighting and its role as a center of enlightenment.
  • It’s a global hub for fashion (haute couture, Paris Fashion Week) and art (Impressionism, Picasso, Dali).
  • Famous literary figures like Hemingway, Fitzgerald, and Sartre lived and wrote here.

3. Food & Cuisine

  • Croissants, baguettes, macarons, and crème brûlée are just a few of its culinary delights.
  • Paris has over 100 Michelin-starred restaurants and countless cozy bistros.
  • The Marché d’Aligre and Rue Mouffetard are great for fresh produce and local flavors.

4. History & Politics

  • Founded in the 3rd century BC by the Parisii tribe, it became a major European city under the Romans.
  • The French Revolution (1789–1799) began here, leading to the fall of the monarchy.
  • Today, it’s the political and economic heart of France, housing the French President’s residence (Élysée Palace) and the National Assembly.


@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@molbap molbap marked this pull request as ready for review September 9, 2025 16:58
@molbap molbap requested a review from ArthurZucker September 10, 2025 15:45
Collaborator

@ArthurZucker left a comment


Nice! 🚀

@molbap
Contributor Author

molbap commented Sep 12, 2025

Added:

  • A small test on a cut-down version of the model. The output is nonsensical (I took only a few experts and layers) but consistent; see the sketch after this list for the general shape of such a test.
  • A larger test on the full model, to be run on CPU. It's super slow, but useful as a reference both to us and to the community. This PR's description also contains what's expected at inference.
  • Isolated the modular code a bit so it has no side effects.
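
As a rough illustration of the cut-down consistency test mentioned above, here is a minimal sketch. The config/model class names follow what this PR adds, but the field names and values (num_layers, hidden_size, etc.) are assumptions for illustration; the real test file defines its own tiny config:

# Hypothetical sketch: a tiny, randomly initialized model whose output is
# nonsensical but must be consistent across forward passes.
import torch
from transformers import LongcatFlashConfig, LongcatFlashForCausalLM

tiny = LongcatFlashConfig(
    num_layers=2,          # real config: 28 (56 effective decoder layers)
    hidden_size=64,        # illustrative, much smaller than the real model
    num_attention_heads=4,
    vocab_size=128,
)
torch.manual_seed(0)
model = LongcatFlashForCausalLM(tiny).eval()

input_ids = torch.randint(0, tiny.vocab_size, (1, 8))
with torch.no_grad():
    out_a = model(input_ids).logits
    out_b = model(input_ids).logits

# Gibberish, but deterministic: two passes over the same input must match.
torch.testing.assert_close(out_a, out_b)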

Contributor

This comment contains run-slow, running the specified jobs:

models: ['models/longcat_flash']
quantizations: [] ...

@molbap
Contributor Author

molbap commented Sep 12, 2025

[run-slow] longcat_flash

# Initialize weights and apply final processing
self.post_init()

def forward(
Collaborator


do you have to overwrite this one?

Contributor Author

yes... because of num_hidden_layers vs num_layers: the actual number of layers is 56, the config stores 28 under the name num_layers, and we iterate over num_hidden_layers if I don't overwrite it. A rough sketch of the mismatch follows.
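
To illustrate the mismatch (a hedged sketch, not the PR's actual code; the slicing pattern is the usual one in shared transformers model code, and the 56-vs-28 figures come from the comment above):

# Hypothetical sketch of why forward is overwritten. The shared implementation
# iterates `self.layers[: config.num_hidden_layers]`, but LongCat-Flash stores
# 28 under `num_layers` while the effective decoder depth is 56, so the generic
# slice would run the wrong number of layers.
def forward_generic(self, hidden_states, **kwargs):
    for layer in self.layers[: self.config.num_hidden_layers]:  # generic path
        hidden_states = layer(hidden_states, **kwargs)
    return hidden_states

def forward_longcat(self, hidden_states, **kwargs):
    # Overridden path (illustrative): iterate over all built layers instead of
    # a num_hidden_layers-based slice.
    for layer in self.layers:
        hidden_states = layer(hidden_states, **kwargs)
    return hidden_states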

Contributor

[For maintainers] Suggested jobs to run (before merge)

run-slow: auto, longcat_flash

@molbap
Contributor Author

molbap commented Sep 12, 2025

[run-slow] longcat_flash
