
[Usage]: vLLM inference with QwQ-32B can't enable sliding window #17306

@SmallBlueE

Description

Your current environment

Tool: vLLM 0.8.4
Model: QwQ-32B
Config:
{
"architectures": [
"Qwen2ForCausalLM"
],
"attention_dropout": 0.0,
"bos_token_id": 151643,
"eos_token_id": 151645,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 27648,
"max_position_embeddings": 40960,
"max_window_layers": 64,
"model_type": "qwen2",
"num_attention_heads": 40,
"num_hidden_layers": 64,
"num_key_value_heads": 8,
"rms_norm_eps": 1e-05,
"rope_theta": 1000000.0,
"sliding_window": 40960,
"tie_word_embeddings": false,
"torch_dtype": "bfloat16",
"transformers_version": "4.43.1",
"use_cache": true,
"use_sliding_window": true,
"vocab_size": 152064
}
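
Note the sliding-window fields above: "use_sliding_window" is true, "sliding_window" (40960) equals "max_position_embeddings" (40960), and "max_window_layers" (64) equals "num_hidden_layers" (64) — so the window nominally applies to every layer and already spans the full context.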

Launch command:
python -m vllm.entrypoints.openai.api_server --model /home/user/Models/QwQ-32B --host "::" --port 8600 --tensor-parallel-size 8 --gpu-memory-utilization 0.95 --max-model-len 40960 --dtype bfloat16 --max-num-seqs 16 --served-model-name qwq32b --swap-space 10 --enable_prefix_caching --enable-chunked-prefill --use-v2-block-manager --enforce-eager --disable-custom-all-reduce --trust-remote-code
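
(One aside, hedged: --use-v2-block-manager has been a deprecated no-op since the v0.6.x releases and is kept only for backward compatibility, so it can simply be dropped on 0.8.4; it is unrelated to the crash below.)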

Error message:

(VllmWorker rank=1 pid=3457224) raise ValueError("Sliding window for some but all layers is not "
(VllmWorker rank=1 pid=3457224) ValueError: Sliding window for some but all layers is not supported. This model uses sliding window but max_window_layers = 64 is less than num_hidden_layers = 64. Please open an issue to discuss this feature.
CRITICAL 04-28 19:54:55 [core_client.py:359] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
(VllmWorker rank=1 pid=3457224) Exception ignored in atexit callback: <function shutdown at 0x7227b809c9d0>
(VllmWorker rank=1 pid=3457224) Traceback (most recent call last):
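
Note that the message is internally inconsistent: max_window_layers = 64 is equal to, not less than, num_hidden_layers = 64. From the vLLM source, this guard appears to fire for any Qwen2-family config that enables sliding window while max_window_layers is present at all, because interleaved attention (sliding window on only some layers) is not implemented for this architecture; the comparison quoted in the message is not what actually gates it.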

How would you like to use vllm

I want to run inference of QwQ-32B, and I don't know how to enable the sliding window feature with vLLM.
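
Not part of the original report, but a hedged sketch of the usual fix: as of 0.8.4, vLLM does not support sliding window on only some layers for Qwen2-family models, and the stock QwQ-32B config.json on Hugging Face appears to ship with "use_sliding_window": false — so the practical answer is to turn the flag back off rather than enable it. Because "sliding_window" (40960) equals "max_position_embeddings" (40960), the window already covers the full context and disabling it changes nothing about attention. A minimal Python sketch, assuming the model directory from the launch command above:

import json
from pathlib import Path

# Model directory taken from the launch command above; adjust as needed.
config_path = Path("/home/user/Models/QwQ-32B/config.json")

config = json.loads(config_path.read_text())

# sliding_window (40960) already equals max_position_embeddings (40960),
# so the window spans the full context and flipping the flag off should
# not change attention behavior.
config["use_sliding_window"] = False

config_path.write_text(json.dumps(config, indent=2, ensure_ascii=False) + "\n")

Alternatively, vLLM exposes a --disable-sliding-window engine argument (disable_sliding_window=True when constructing LLM(...) in Python), which caps the max model length to the window size — a no-op here since both are 40960. Whether it also clears this particular Qwen2 guard on 0.8.4 is untested, so the config edit is the safer route.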

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
