Closed
Labels: bug (Something isn't working)
Description
Your current environment
The output of `python collect_env.py`:

```text
Your output of `python collect_env.py` here
```
Model Input Dumps
No response
🐛 Describe the bug
With the glm-4 model, prefix caching is automatically disabled because the engine treats it as a multimodal LLM. This may be related to the following code:
vllm/vllm/model_executor/models/chatglm.py, lines 758 to 782 in d427e5c:

```python
@MULTIMODAL_REGISTRY.register_image_input_mapper(mm_input_mapper_for_glmv)
@MULTIMODAL_REGISTRY.register_max_image_tokens(get_max_glmv_image_tokens)
@INPUT_REGISTRY.register_dummy_data(dummy_data_for_glmv)
@INPUT_REGISTRY.register_input_processor(input_processor_for_glmv)
class ChatGLMForCausalLM(ChatGLMBaseModel, SupportsLoRA, SupportsPP,
                         SupportsMultiModal):
    # Ensure that the LoRA support check passes when the class is not
    # initialized, but set all these attributes to empty.
    packed_modules_mapping = {}
    supported_lora_modules = []
    embedding_modules = {}
    embedding_padding_modules = []

    def __new__(
        cls,
        vllm_config: VllmConfig,
        prefix: str = "",
    ) -> None:
        config = vllm_config.model_config.hf_config
        # Initialize VL
        if hasattr(config, "visual"):
            return ChatGLMV(vllm_config=vllm_config, prefix=prefix)
        # Initialize LLM
        else:
            return ChatGLM(vllm_config=vllm_config, prefix=prefix)
```
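Note that the registry decorators above apply to `ChatGLMForCausalLM` unconditionally, so the architecture is flagged as multimodal even when `__new__` returns the text-only `ChatGLM`. The dispatch pattern itself can be illustrated with a minimal, self-contained sketch (the class names here are hypothetical stand-ins, not the actual vLLM classes):

```python
from types import SimpleNamespace

class TextModel:
    """Stand-in for the text-only ChatGLM path."""
    def __init__(self, config):
        self.config = config

class VisionModel:
    """Stand-in for the ChatGLMV vision-language path."""
    def __init__(self, config):
        self.config = config

class DispatchingModel:
    """Mimics ChatGLMForCausalLM.__new__: returns a concrete class
    based on whether the HF config carries a `visual` section."""
    def __new__(cls, config):
        if hasattr(config, "visual"):
            return VisionModel(config)
        return TextModel(config)

# glm-4 and glm-4v share model_type; only the config contents differ.
text_cfg = SimpleNamespace(model_type="chatglm")
vl_cfg = SimpleNamespace(model_type="chatglm", visual={})

assert isinstance(DispatchingModel(text_cfg), TextModel)
assert isinstance(DispatchingModel(vl_cfg), VisionModel)
```

So the runtime dispatch is per-config, but the multimodal registration is per-class, which is where the mismatch comes from.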
Lines 1046 to 1051 in d427e5c:

```python
if (model_config.is_multimodal_model and not envs.VLLM_USE_V1
        and self.enable_prefix_caching):
    logger.warning("--enable-prefix-caching is currently not "
                   "supported for multimodal models in v0 and "
                   "has been disabled.")
    self.enable_prefix_caching = False
```
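Per the condition above, the force-disable only fires on the v0 engine (`not envs.VLLM_USE_V1`); whether the V1 path fully supports the feature is a separate question. A sketch that just reproduces the guard's truth table (the function name is hypothetical):

```python
def prefix_caching_enabled(is_multimodal: bool, use_v1: bool,
                           requested: bool) -> bool:
    """Reproduces the guard above: prefix caching is force-disabled
    only for multimodal models running on the v0 engine."""
    if is_multimodal and not use_v1 and requested:
        return False
    return requested

# glm-4 is (mis)classified as multimodal, so on v0 the request is dropped:
assert prefix_caching_enabled(True, False, True) is False
# Per the condition alone, the V1 engine path would keep it:
assert prefix_caching_enabled(True, True, True) is True
# Text-only models are unaffected either way:
assert prefix_caching_enabled(False, False, True) is True
```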
Unfortunately, the glm-4 and glm-4v models share the same `model_type` value. How can I override this behavior without changing the code?
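Since `model_type` cannot distinguish the two checkpoints, one possible refinement (hypothetical, not current vLLM behavior) would be to use the same signal `__new__` already relies on: the presence of a `visual` section in the HF config, which only the VL checkpoint carries.

```python
from types import SimpleNamespace

def is_actually_multimodal(hf_config) -> bool:
    """Hypothetical check: glm-4 and glm-4v share model_type, so
    inspect the config for the `visual` section instead (the same
    signal ChatGLMForCausalLM.__new__ uses to pick a class)."""
    return hasattr(hf_config, "visual")

glm4 = SimpleNamespace(model_type="chatglm")
glm4v = SimpleNamespace(model_type="chatglm", visual={"hidden_size": 1024})

assert is_actually_multimodal(glm4) is False
assert is_actually_multimodal(glm4v) is True
```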
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.