Description
Your current environment
Tool: vLLM 0.8.4
Model: QwQ-32B
Config:
{
  "architectures": [
    "Qwen2ForCausalLM"
  ],
  "attention_dropout": 0.0,
  "bos_token_id": 151643,
  "eos_token_id": 151645,
  "hidden_act": "silu",
  "hidden_size": 5120,
  "initializer_range": 0.02,
  "intermediate_size": 27648,
  "max_position_embeddings": 40960,
  "max_window_layers": 64,
  "model_type": "qwen2",
  "num_attention_heads": 40,
  "num_hidden_layers": 64,
  "num_key_value_heads": 8,
  "rms_norm_eps": 1e-05,
  "rope_theta": 1000000.0,
  "sliding_window": 40960,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.43.1",
  "use_cache": true,
  "use_sliding_window": true,
  "vocab_size": 152064
}
Launch command:
python -m vllm.entrypoints.openai.api_server --model /home/user/Models/QwQ-32B --host "::" --port 8600 --tensor-parallel-size 8 --gpu-memory-utilization 0.95 --max-model-len 40960 --dtype bfloat16 --max-num-seqs 16 --served-model-name qwq32b --swap-space 10 --enable_prefix_caching --enable-chunked-prefill --use-v2-block-manager --enforce-eager --disable-custom-all-reduce --trust-remote-code
Error message:
(VllmWorker rank=1 pid=3457224) raise ValueError("Sliding window for some but all layers is not "
(VllmWorker rank=1 pid=3457224) ValueError: Sliding window for some but all layers is not supported. This model uses sliding window but max_window_layers = 64 is less than num_hidden_layers = 64. Please open an issue to discuss this feature.
CRITICAL 04-28 19:54:55 [core_client.py:359] Got fatal signal from worker processes, shutting down. See stack trace above for root cause issue.
(VllmWorker rank=1 pid=3457224) Exception ignored in atexit callback: <function shutdown at 0x7227b809c9d0>
(VllmWorker rank=1 pid=3457224) Traceback (most recent call last):
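For anyone triaging this, the error names three config fields: `use_sliding_window`, `max_window_layers`, and `num_hidden_layers`. Note that the message reports `max_window_layers = 64` and `num_hidden_layers = 64`, which are equal, so the "is less than" wording does not match the posted values. Below is a minimal sketch for pulling those fields out of a model's `config.json`. The helper names `summarize_sliding_window` and `load_config` are hypothetical, and this does not reproduce vLLM's internal check; it only reports what the config contains.

```python
import json
from pathlib import Path


def summarize_sliding_window(cfg: dict) -> dict:
    """Collect the sliding-window fields mentioned in the error message.

    Hypothetical helper: it only reads the config values and does not
    replicate any validation that vLLM performs internally.
    """
    layers = cfg.get("num_hidden_layers")
    window_layers = cfg.get("max_window_layers")
    return {
        "use_sliding_window": cfg.get("use_sliding_window", False),
        "sliding_window": cfg.get("sliding_window"),
        "max_window_layers": window_layers,
        "num_hidden_layers": layers,
        "window_layers_equal_hidden_layers": window_layers == layers,
    }


def load_config(model_dir: str) -> dict:
    """Read config.json from a local model directory (hypothetical helper)."""
    path = Path(model_dir) / "config.json"
    return json.loads(path.read_text(encoding="utf-8"))


# Values copied from the config posted in this issue.
posted = {
    "num_hidden_layers": 64,
    "max_window_layers": 64,
    "sliding_window": 40960,
    "use_sliding_window": True,
}

summary = summarize_sliding_window(posted)
for key, value in summary.items():
    print(f"{key}: {value}")
```

To run it against the local checkpoint instead of the pasted values, call `summarize_sliding_window(load_config("/home/user/Models/QwQ-32B"))`.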
How would you like to use vllm
I want to run inference with QwQ-32B. I don't know how to enable the sliding window feature in vLLM.
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.