
[Usage]: vLLM inference with QwQ-32B can't enable sliding window #17306

@SmallBlueE

Description

Your current environment

Tool: vLLM 0.8.4
Model: QwQ-32B
Config:
{
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 27648,
"max_position_embeddings": 40960,
"max_window_layers": 64,
"model_type": "qwen2",
"num_attention_heads": 40,
"num_hidden_layers": 64,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 1000000.0,
"sliding_window": 40960,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.43.1",
"use_cache": true,
"use_sliding_window": true,
"vocab_size": 152064
}
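
Note the sliding-window fields above: "use_sliding_window" is true, "sliding_window" (40960) equals "max_position_embeddings" (40960), and "max_window_layers" (64) equals "num_hidden_layers" (64) — so the window nominally applies to every layer and already spans the full context.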

Launch command:
python -m vllm.entrypoints.openai.api_server --model /home/user/Models/QwQ-32B --host "::" --port 8600 --tensor-parallel-size 8 --gpu-memory-utilization 0.95 --max-model-len 40960 --dtype bfloat16 --max-num-seqs 16 --served-model-name qwq32b --swap-space 10 --enable_prefix_caching --enable-chunked-prefill --use-v2-block-manager --enforce-eager --disable-custom-all-reduce --trust-remote-code
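
(One aside, hedged: --use-v2-block-manager has been a deprecated no-op since the v0.6.x releases and is kept only for backward compatibility, so it can simply be dropped on 0.8.4; it is unrelated to the crash below.)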

Error message:

(VllmWorker rank=1 pid=3457224) raise ValueError("Sliding window for some but all layers is not "
(VllmWorker rank=1 pid=3457224) ValueError: Sliding window for some but all layers is not supported. This model uses sliding window but max_window_layers = 64 is less than num_hidden_layers = 64. Please open an issue to discuss this feature.
CRITICAL 04-28 19:54:55 [core_client.py:359] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
(VllmWorker rank=1 pid=3457224) Exception ignored in atexit callback: <function shutdown at 0x7227b809c9d0>
(VllmWorker rank=1 pid=3457224) Traceback (most recent call last):
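
Note that the message is internally inconsistent: max_window_layers = 64 is equal to, not less than, num_hidden_layers = 64. From the vLLM source, this guard appears to fire for any Qwen2-family config that enables sliding window while max_window_layers is present at all, because interleaved attention (sliding window on only some layers) is not implemented for this architecture; the comparison quoted in the message is not what actually gates it.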

How would you like to use vllm

I want to run inference of QwQ-32B, and I don't know how to enable the sliding window feature with vLLM.
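
Not part of the original report, but a hedged sketch of the usual fix: as of 0.8.4, vLLM does not support sliding window on only some layers for Qwen2-family models, and the stock QwQ-32B config.json on Hugging Face appears to ship with "use_sliding_window": false — so the practical answer is to turn the flag back off rather than enable it. Because "sliding_window" (40960) equals "max_position_embeddings" (40960), the window already covers the full context and disabling it changes nothing about attention. A minimal Python sketch, assuming the model directory from the launch command above:

import json
from pathlib import Path

# Model directory taken from the launch command above; adjust as needed.
config_path = Path("/home/user/Models/QwQ-32B/config.json")

config = json.loads(config_path.read_text())

# sliding_window (40960) already equals max_position_embeddings (40960),
# so the window spans the full context and flipping the flag off should
# not change attention behavior.
config["use_sliding_window"] = False

config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False) + "\n")

Alternatively, vLLM exposes a --disable-sliding-window engine argument (disable_sliding_window=True when constructing LLM(...) in Python), which caps the max model length to the window size — a no-op here since both are 40960. Whether it also clears this particular Qwen2 guard on 0.8.4 is untested, so the config edit is the safer route.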

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
