[Model] Support Tele-FLM Model #15023
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
It seems this is the only difference from Llama. Can you refactor the model implementation to reduce the duplicated code, just like glm.py and telechat2.py?
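For illustration, a minimal sketch of the kind of refactor being suggested (the class body and import are assumptions, not the code in this PR): subclass the existing Llama implementation and override only what differs, the way telechat2.py builds on llama.py.

```python
# Hypothetical sketch of the suggested refactor: reuse vLLM's Llama code path
# and keep only the Tele-FLM-specific differences in a subclass, similar in
# spirit to telechat2.py. Not the actual implementation in this PR.
from vllm.model_executor.models.llama import LlamaForCausalLM


class TeleFLMForCausalLM(LlamaForCausalLM):
    """Tele-FLM-specific behaviour (e.g. weight naming or output scaling)
    would be overridden here; attention, MLP and the rest of the forward
    pass are inherited from the Llama implementation."""
```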
Is it really necessary to port this config? I think the custom config from the HF dynamic module can work in most cases.
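For context, a small sketch of what relying on the HF dynamic module looks like (the model ID is taken from the PR description; the rest is standard transformers usage): with trust_remote_code=True, the config class shipped with the checkpoint is loaded from the Hub, so a ported copy inside vLLM may not be needed.

```python
from transformers import AutoConfig

# With trust_remote_code=True, transformers imports the config class that
# ships with the checkpoint (the "dynamic module"), instead of requiring a
# config class vendored into vLLM.
config = AutoConfig.from_pretrained(
    "CofeAI/FLM-2-52B-Instruct-2407",
    trust_remote_code=True,
)
print(type(config).__name__)
```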
FYI: the `supported_lora_modules` variable has been deprecated; please remove it.
Hi, remember to update the documentation, see: https://github.com/vllm-project/vllm/blob/main/docs/source/models/supported_models.md
Don't forget updating
All the above suggestions have been adopted, and the corresponding adjustments have been made.
The CI failure occurred in the Entrypoints Test, but it appears unrelated to our code changes.
Can you merge from main to fix the CI failure?
CI failures look unrelated, merging. Thanks for your effort!
Signed-off-by: Naitong Yu <[email protected]> Signed-off-by: jiangxin <[email protected]> Co-authored-by: Jason Fang <[email protected]> Co-authored-by: jiangxin <[email protected]>
This PR adds support for the Tele-FLM model.
Tele-FLM (aka FLM-2) is a 52B open-source multilingual large language model that features a stable, efficient pre-training paradigm and enhanced factual judgement capabilities. Built on a decoder-only transformer architecture, it has been trained on approximately 2T tokens. Tele-FLM demonstrates superior performance at its scale and sometimes surpasses larger models. In addition to sharing the model weights, we provide the core designs, engineering practices, and training details, anticipating their benefits for both academic and industrial communities.
The model collection can be found at Hugging Face (https://huggingface.co/collections/CofeAI/tele-flm-flm-2-669e4dbd2dbf53ccd2454304).
Verified that
vllm serve CofeAI/FLM-2-52B-Instruct-2407 --trust-remote-code --chat-template /examples/tool_chat_template_teleflm.jinja -tp 2
works.
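For reference, a minimal sketch of querying the server started by the command above through vLLM's OpenAI-compatible API (the port and prompt are assumptions; the model name matches the serve command):

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server listens on port 8000 by default.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="CofeAI/FLM-2-52B-Instruct-2407",
    messages=[{"role": "user", "content": "Introduce yourself briefly."}],
)
print(response.choices[0].message.content)
```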