v1: Set num_cpu_blocks on VllmConfig #24498

orozery · 2025-09-09T10:04:32Z

Currently in v1 this field (num_cpu_blocks) is always zero.
This PR sets it by dividing the swap_space parameter (in bytes) by the total number of bytes per KV block.

This will be useful for the v1 CPU offloading feature (#19854), as it will allow the user to set the CPU memory budget in GB instead of as a number of blocks.
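
For illustration, the arithmetic described above can be sketched as follows. This is a minimal sketch, assuming the swap size is given in GiB and that the per-block byte count is already known; the function name `compute_num_cpu_blocks` and the example sizes are hypothetical and are not taken from the vLLM code.

```python
GiB = 1 << 30  # bytes in one GiB


def compute_num_cpu_blocks(swap_space_gib: float, kv_bytes_per_block: int) -> int:
    """Hypothetical helper: how many whole KV blocks fit in the swap budget."""
    if kv_bytes_per_block <= 0:
        raise ValueError("kv_bytes_per_block must be positive")
    swap_space_bytes = int(swap_space_gib * GiB)
    # Floor division: a partially filled block cannot be used.
    return swap_space_bytes // kv_bytes_per_block


if __name__ == "__main__":
    # Example: a 4 GiB budget with 2 MiB per KV block (illustrative numbers).
    print(compute_num_cpu_blocks(4, 2 * 1024 * 1024))
```

Because the division rounds down, any remainder smaller than one block is simply left unused.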

gemini-code-assist · 2025-09-09T10:12:39Z

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

This commit sets the vllm_config.cache_config.num_cpu_blocks according to vllm_config.cache_config.swap_space.

Signed-off-by: Or Ozeri <[email protected]>

ApostaC

Otherwise LGTM.

ApostaC · 2025-09-12T00:31:43Z

vllm/v1/engine/core.py

+            num_cpu_blocks = (int(vllm_config.cache_config.swap_space_bytes) //
+                              kv_cache_configs[0].kv_bytes_per_block)


There is also a knob called offloaded_block_size in #22595. IIUC, it also impacts the calculation of num_cpu_blocks, right? (i.e., if we have larger CPU blocks, we should have fewer CPU blocks)

orozery · 2025-09-12

In v0, the offloading was part of the core.
My suggestion for v1 is to have the offloading as a connector.
I wanted to follow the convention for connectors, where all of their arguments are actually defined in their kv_connector_extra_config.

However, deriving num_cpu_blocks from some kind of a swap_space parameter requires knowledge of kv_bytes_per_block.
So basically, I need my connector (both scheduler-side and worker-side) to be aware of kv_bytes_per_block.
This requires changing things in core, so I tried to make minimal changes and came up with the approach here:

For the scheduler-side connector, report kv_bytes_per_block by setting the existing V0 field num_cpu_blocks.
For the worker-side connector, pass on kv_cache_configs via the register_kv_caches function (in a follow-up PR).

When the offloading connector gets this num_cpu_blocks (given in GPU block size), it can derive the actual num_cpu_blocks by dividing by block_size_factor.
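
A minimal sketch of that derivation, assuming block_size_factor is the number of GPU-sized blocks that make up one offloaded CPU block (that is, offloaded_block_size divided by the GPU block size, both in tokens). The helper name and its parameters are hypothetical and only illustrate the division described above.

```python
def derive_offloaded_num_blocks(num_cpu_blocks_gpu_sized: int,
                                gpu_block_size: int,
                                offloaded_block_size: int) -> int:
    """Hypothetical helper: convert a count of GPU-sized blocks into a
    count of (larger) offloaded CPU blocks."""
    if gpu_block_size <= 0 or offloaded_block_size <= 0:
        raise ValueError("block sizes must be positive")
    if offloaded_block_size % gpu_block_size != 0:
        # Assumption: a CPU block holds a whole number of GPU blocks.
        raise ValueError("offloaded_block_size must be a multiple of gpu_block_size")
    block_size_factor = offloaded_block_size // gpu_block_size
    # Larger CPU blocks mean fewer of them fit in the same budget.
    return num_cpu_blocks_gpu_sized // block_size_factor
```

For example, with 2048 GPU-sized blocks, a GPU block size of 16 tokens, and an offloaded block size of 64 tokens, the factor is 4 and the connector would end up with 512 CPU blocks.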

To sum up, I'm trying to make minimal changes to the core.
This results in the actual offloading configuration parameters split between vllm_config.cache_config and kv_connector_extra_config.

I'm good with taking a different approach.
Your thoughts?
Perhaps we should ask other relevant folks for their opinion here?

ApostaC · 2025-09-16

Yeah. This is a good point. I think at a high level, there should be two parameters that can be configured by users: (1) total_cpu_buffer_size and (2) cpu_buffer_block_size (how many tokens in each CPU block).

For (1), it's also worth thinking about whether it's per rank or per vLLM instance (i.e., summed across all ranks). I feel like if it's per rank, it will probably be better to pass it in the KV connector configs, while a "global" cache size makes more sense when it's set through global configuration like --swap-space.

For (2), I think it should definitely be put into the KV connector config as it's the current CPU-offloading-connector-specific configuration.

To sum up, I feel like putting all the configs into the KV connector config will probably be better and less confusing. WDYT?
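
To make the two proposed parameters concrete, here is a rough sketch of how a connector-level config could turn them into a per-rank block count. Everything beyond the two parameter names from the comment above is an assumption made for illustration: the `CpuOffloadConfig` class, the `per_rank` flag, the `kv_bytes_per_token` input, and the even split across ranks are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpuOffloadConfig:
    """Hypothetical connector-side settings (not an existing vLLM class)."""
    total_cpu_buffer_size: int   # bytes
    cpu_buffer_block_size: int   # tokens per CPU block
    per_rank: bool = True        # False: budget is shared by all ranks


def cpu_blocks_per_rank(cfg: CpuOffloadConfig,
                        kv_bytes_per_token: int,
                        world_size: int) -> int:
    """Number of CPU blocks each rank may allocate under this config."""
    if kv_bytes_per_token <= 0 or world_size <= 0 or cfg.cpu_buffer_block_size <= 0:
        raise ValueError("sizes and world_size must be positive")
    # A global budget is assumed to be split evenly across ranks.
    budget = (cfg.total_cpu_buffer_size if cfg.per_rank
              else cfg.total_cpu_buffer_size // world_size)
    bytes_per_cpu_block = kv_bytes_per_token * cfg.cpu_buffer_block_size
    return budget // bytes_per_cpu_block
```

The `per_rank` flag is only there to show how the per-rank versus per-instance question changes the result: the same 8 GiB setting yields four times fewer blocks per rank when it is treated as a global budget across four ranks.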

mergify · 2025-09-16T04:14:03Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @orozery.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

orozery requested review from WoosukKwon, robertgshaw2-redhat, njhill, ywang96, comaniac and alexm-redhat as code owners September 9, 2025 10:04

mergify bot added the v1 label Sep 9, 2025

v1: Set num_cpu_blocks on VllmConfig

0b6d358

orozery force-pushed the num-cpu-blocks branch from ddf3449 to 0b6d358 Compare September 9, 2025 11:09

ApostaC suggested changes Sep 12, 2025

mergify bot added the needs-rebase label Sep 16, 2025
