Add ep #39501
Conversation
Co-authored-by: Nouamane Tazi <[email protected]>
Co-authored-by: drbh <[email protected]>
Does not work for Mixtral; it's tricky because of the sequentiality of the weights. I'll add something that allows for this, but also a plan that allows for on-the-fly merging of the ModuleList into the format megablocks expects.
Will getting DeepSeek working too be pretty straightforward?
Yes and no! For both I used nn.ModuleList() (for DeepSeek I had no choice, and for Mixtral I was junior), so it's a bit more annoying; but yes, because the next PR will make sure we have a bit of a better interface!
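To make the "merging the ModuleList into the format for megablocks" idea above concrete, here is a minimal sketch. The shapes, the bias-free experts, and the `bmm` fusion are assumptions for illustration; the actual on-the-fly conversion in transformers may differ.

```python
import torch
import torch.nn as nn

# Hypothetical sizes for illustration only.
hidden, ffn, n_experts = 16, 32, 4

# Mixtral/DeepSeek-style storage: one nn.Linear per expert in an nn.ModuleList.
experts = nn.ModuleList(nn.Linear(hidden, ffn, bias=False) for _ in range(n_experts))

# Megablocks-style storage: a single stacked weight so all experts can be
# applied with one batched matmul instead of a Python loop.
fused_weight = torch.stack([e.weight for e in experts])        # (E, ffn, hidden)

x = torch.randn(n_experts, 5, hidden)                          # 5 tokens routed to each expert
fused_out = torch.bmm(x, fused_weight.transpose(1, 2))         # (E, 5, ffn)

# Sanity check against the per-expert loop that the ModuleList layout implies.
loop_out = torch.stack([experts[e](x[e]) for e in range(n_experts)])
assert torch.allclose(fused_out, loop_out, atol=1e-6)
```

With a stacked layout like this, sharding the experts across ranks becomes a slice along the first dimension, which is what makes it friendlier to expert parallelism than a sequential ModuleList.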
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
[For maintainers] Suggested jobs to run (before merge): run-slow: llama4
For testing target_parameters, we use a tiny Llama4 model. This model was refactored in huggingface/transformers#39501, resulting in one parameter being accessed an additional time: https://github.com/huggingface/transformers/pull/39501/files#diff-e668ec07f78afdb2cb805d939e47453757f0b9437436cb860fcb7cb2431c9cf5R69. Therefore, a unit test that relied on how often this parameter was accessed started failing. This PR updates the count to the correct number. Additionally, debug print statements that were accidentally left over are now removed.
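As a minimal sketch (not the PEFT test itself; the counting mechanism below is an assumption for illustration), parameter "accesses" can be observed because registered parameters are resolved through nn.Module.__getattr__, so an extra self.weight lookup introduced by a refactor shows up as one more count:

```python
import torch
import torch.nn as nn
from collections import Counter

access_counts = Counter()

class CountingLinear(nn.Linear):
    """Linear layer that records every lookup of its 'weight' parameter."""
    def __getattr__(self, name):
        # Registered parameters are not in the instance __dict__, so every
        # `self.weight` access falls through to __getattr__ and is counted.
        if name == "weight":
            access_counts["weight"] += 1
        return super().__getattr__(name)

layer = CountingLinear(4, 4)
access_counts.clear()           # weight initialisation also reads the parameter

x = torch.randn(2, 4)
layer(x)                        # forward() reads self.weight once
print(access_counts["weight"])  # 1

_ = layer.weight.shape          # an extra read, e.g. introduced by a refactor
print(access_counts["weight"])  # 2
```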
Llama4 accuracy is broken by a bug in huggingface#39501: it forgot to transpose router_scores before applying them to routed_in, causing Llama4 to generate garbage output. This PR fixes that issue by adding back the transpose() and adding some comments explaining why the transpose() is needed. Signed-off-by: Po-Han Huang <[email protected]>
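The gist of the fix can be illustrated with a toy example; the tensor names, shapes, and expert-major layout below are hypothetical and only meant to show why the flattening order of the scores has to match routed_in before broadcasting:

```python
import torch

tokens, n_experts, hidden = 4, 8, 16

# Hypothetical layout: hidden states replicated per expert, expert-major order.
routed_in = torch.randn(n_experts * tokens, hidden)
# One gate value per (token, expert) pair coming out of the router.
router_scores = torch.rand(tokens, n_experts)

# Buggy: flattening token-major multiplies each row of routed_in by the wrong gate.
wrong = routed_in * router_scores.reshape(-1, 1)

# Fixed: transpose first so the flattened scores are expert-major, matching routed_in.
right = routed_in * router_scores.transpose(0, 1).reshape(-1, 1)

assert not torch.allclose(wrong, right)
```

Both versions run without error because the flattened shapes are identical, which is why the bug showed up as garbage output rather than a crash.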
* EP + updates
  Co-authored-by: Nouamane Tazi <[email protected]>
  Co-authored-by: drbh <[email protected]>
* remove unrelated change
* not working yet but let's see where it goes!
* update the api a bit
* udpate
* where I am at for now
* fix ep
* refactor the API
* yups
* fix
* fixup
* clean modeling
* just support llama4 for now!
* properly avoid
* fix
* nits
* Update src/transformers/models/llama4/modeling_llama4.py
* Update src/transformers/integrations/tensor_parallel.py
* style
* ,,,,
* update

---------

Co-authored-by: Nouamane Tazi <[email protected]>
Co-authored-by: drbh <[email protected]>
What does this PR do?
Add support for expert parallelism (EP)!
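For readers unfamiliar with expert parallelism: the idea is that each rank keeps only a shard of the experts and applies them to the tokens routed to it. The following is a toy, single-device sketch of that sharding (not the API added by this PR, which uses real process groups and collective communication):

```python
import torch
import torch.nn as nn

hidden, n_experts, world_size = 16, 8, 2
experts = nn.ModuleList(nn.Linear(hidden, hidden) for _ in range(n_experts))

tokens = torch.randn(5, hidden)
# Pretend the router sent every token to expert 3, for simplicity.
assigned_expert = torch.full((tokens.size(0),), 3)

out = torch.zeros_like(tokens)
for rank in range(world_size):                   # simulate the ranks in a loop
    lo = rank * n_experts // world_size          # each rank owns a contiguous
    hi = (rank + 1) * n_experts // world_size    # slice of the experts
    for e in range(lo, hi):
        mask = assigned_expert == e
        if mask.any():
            # In real EP, tokens destined for remote experts are exchanged with
            # all-to-all communication before and after this step.
            out[mask] = experts[e](tokens[mask])
```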