Conversation

@Sanster (Contributor) commented Jul 27, 2023

The main modifications are in the "load_weights" function.

Before: (screenshot of the load_weights implementation before the change)

After: (screenshot of the load_weights implementation after the change)
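
For reference, the fix concerns how load_weights slices Baichuan's packed QKV projection (W_pack) across tensor-parallel ranks. The sketch below only illustrates that idea under assumed shapes and names (shard_packed_qkv is a hypothetical helper, not the actual PR code):

```python
import torch

def shard_packed_qkv(w_pack: torch.Tensor, tp_rank: int, tp_size: int) -> torch.Tensor:
    """Slice a packed QKV weight for one tensor-parallel rank.

    Assumes w_pack has shape [3 * hidden_size, hidden_size], with the Q, K
    and V projections stacked along dim 0. Each rank must take its slice
    from each of the three blocks separately; a single contiguous slice of
    the packed matrix would mix rows from different projections.
    """
    total_rows, hidden_size = w_pack.shape
    assert total_rows == 3 * hidden_size
    shard_rows = hidden_size // tp_size
    start = tp_rank * shard_rows

    q, k, v = torch.chunk(w_pack, 3, dim=0)                      # split into Q, K, V
    shards = [block[start:start + shard_rows] for block in (q, k, v)]
    return torch.cat(shards, dim=0)                              # re-pack for this rank
```

For example, with hidden_size = 4096 and tp_size = 2, each rank ends up with a [3 * 2048, 4096] shard containing only its own Q, K and V rows.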

@LiVincent-Zhang commented:

Is it the same reason for baichuan-13b? #530

@Sanster (Contributor, Author) commented Jul 27, 2023

> Is it the same reason for baichuan-13b? #530

Yes. I have tested it on both Baichuan-13B and Baichuan-7B, and both produce normal output under tensor parallelism (TP).

@LiVincent-Zhang commented:

> Is it the same reason for baichuan-13b? #530
>
> Yes. I have tested it on both Baichuan-13B and Baichuan-7B, and both produce normal output under tensor parallelism (TP).

Can I use this PR directly on 13B?

@zhuohan123 (Member) left a comment:

Thank you for your contribution! Can you use our official formatting script and remove the other format-only changes?

@zhuohan123 (Member) commented on lines 282 to 294:
Is this part the only part that actually changes the code logic? Can you remove the other format-only modifications and use the format.sh script we provide to re-format the code? Thanks!

@Sanster (Contributor, Author) replied:
Hi, I have updated the PR and removed the format-only changes.

@Sanster force-pushed the fix_baichuan_7b_tp branch from 356793c to aeb2d9e on August 1, 2023 at 06:25.
@zhuohan123 (Member) left a comment:

LGTM! Thank you for your contribution!

@zhuohan123 merged commit d4c7755 into vllm-project:main on Aug 1, 2023.
hongxiayang pushed a commit to hongxiayang/vllm that referenced this pull request Feb 13, 2024
amy-why-3459 pushed a commit to amy-why-3459/vllm that referenced this pull request Sep 15, 2025
…ct#598)

### What this PR does / why we need it?
DeepSeek V3 currently adopts vanilla chunked prefill for the MLA part, which is inefficient to compute but necessary for chunked prefill. Since PR vllm-project/vllm-ascend#543 brought the v0 scheduler into vllm-ascend, we can now adopt torch_npu._npu_flash_attention inside the MLA backend for a further performance boost. Some redundant computation inside the RoPE is also removed. This PR should bring a performance gain for DeepSeek eager-mode inference.

---------

Signed-off-by: ganyi <[email protected]>
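
As an aside on the "redundant computation inside the RoPE" mentioned in that commit message: the general idea is to compute the rotary cos/sin tables once and reuse them, rather than recomputing them on every call. A minimal sketch of that idea in plain PyTorch (illustrative only; the actual change lives in the vllm-ascend MLA backend and uses torch_npu kernels):

```python
import torch

class RotaryEmbeddingCache:
    """Precompute RoPE cos/sin tables once and index them by position later."""

    def __init__(self, head_dim: int, max_positions: int, base: float = 10000.0):
        inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        positions = torch.arange(max_positions).float()
        freqs = torch.outer(positions, inv_freq)      # [max_positions, head_dim // 2]
        self.cos = freqs.cos()
        self.sin = freqs.sin()

    def get(self, positions: torch.Tensor):
        # Look up the cached tables instead of recomputing them on every forward pass.
        return self.cos[positions], self.sin[positions]
```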