Fully overlap model execution #134

tianmu-li · 2025-09-03T21:22:23Z

tianmu-li · 2025-09-11T03:23:45Z

Half overlapping. There is still one sync point at https://github.com/vllm-project/vllm-gaudi/pull/134/files#diff-5ffdc7547fbc10ff45e9791caaef30c306a59a0e3f7c9515569f342baed8c0e2R116, but I can't find a safe way to remove it.
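To illustrate what "one remaining sync point" means here, the following is a minimal host-side sketch, not the code from this PR. It uses a thread pool as a stand-in for an asynchronous device-to-host copy; the names `copy_to_host`, `prepare_next_step`, and `run_step_overlapped` are hypothetical, and the sleeps only model latency. The idea is to launch the copy, do independent work, and block as late as possible.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def copy_to_host(device_tokens):
    """Hypothetical stand-in for a device-to-host copy; the sleep models latency."""
    time.sleep(0.05)
    return list(device_tokens)


def prepare_next_step(step):
    """Hypothetical host-side work that does not depend on the copied tokens."""
    time.sleep(0.05)
    return f"inputs-for-step-{step}"


def run_step_overlapped(executor, device_tokens, step):
    # Launch the copy without waiting for it to finish.
    pending = executor.submit(copy_to_host, device_tokens)
    # Independent host work runs while the copy is in flight.
    next_inputs = prepare_next_step(step + 1)
    # The single remaining sync point: block only when the values are needed.
    host_tokens = pending.result()
    return host_tokens, next_inputs


with ThreadPoolExecutor(max_workers=1) as executor:
    tokens, next_inputs = run_step_overlapped(executor, (11, 22, 33), step=0)

print(tokens, next_inputs)
```

In this sketch the blocking call cannot be removed entirely, because the caller eventually needs the copied values; it can only be moved later.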

Commit messages (each signed off by Tianmu Li <[email protected]>):
Incorporate commit by Marcin
Pre-commit fix
Remove unneeded change
WIP
pre-commit fix


xuechendi · 2025-09-12T21:21:05Z

/run-gaudi-tests

xuechendi

Had an offline discussion with Tianmu; the code looks good.
Perf and profiling results will be added to the ticket.

xuechendi · 2025-09-12T22:11:40Z

Failed on spec decode when async_scheduler is not on; cancelling the run.


tianmu-li · 2025-09-13T04:35:56Z

Fixed the issue by preventing concatenation of decode_sampled_token_ids and prefill_sampled_token_ids when not using async_scheduling.
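The fix described above can be sketched roughly as follows. This is an assumption-laden illustration, not the code in `hpu_model_runner.py`: the function name `gather_sampled_token_ids`, the use of plain Python lists, the decode-then-prefill order, and the return shape are all hypothetical. Only the two variable names and the `async_scheduling` condition come from the comment.

```python
from typing import List


def gather_sampled_token_ids(
    decode_sampled_token_ids: List[int],
    prefill_sampled_token_ids: List[int],
    async_scheduling: bool,
) -> List[List[int]]:
    """Hypothetical helper: merge the two groups only under async scheduling."""
    if async_scheduling:
        # Async path: hand back one concatenated batch (order is an assumption).
        return [decode_sampled_token_ids + prefill_sampled_token_ids]
    # Non-async path: keep the groups separate, so code that expects
    # per-group results (for example spec decode) sees them unmerged.
    return [decode_sampled_token_ids, prefill_sampled_token_ids]


merged = gather_sampled_token_ids([1, 2], [3], async_scheduling=True)
separate = gather_sampled_token_ids([1, 2], [3], async_scheduling=False)
print(merged, separate)
```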

mgawarkiewicz-intel · 2025-09-15T11:50:58Z

/run-gaudi-tests

xuechendi · 2025-09-15T15:48:45Z

/run-gaudi-tests

xuechendi · 2025-09-15T19:01:24Z

/run-gaudi-tests

Dependent on vllm-project/vllm#23569 --------- Signed-off-by: Tianmu Li <[email protected]> Co-authored-by: Chendi.Xue <[email protected]>

Porting changes from main branch: #134 and #184 Author: Tianmu Li <[email protected]> --------- Signed-off-by: Tianmu Li <[email protected]> Co-authored-by: Tianmu Li <[email protected]> Co-authored-by: Chendi.Xue <[email protected]>


tianmu-li force-pushed the test_async branch 3 times, most recently from 778161a to 68609db, on September 11, 2025 00:35

tianmu-li changed the title from "[WIP] Fully overlap model execution" to "Fully overlap model execution" on Sep 11, 2025

tianmu-li marked this pull request as ready for review September 11, 2025 03:20

tianmu-li requested review from kzawora-intel, xuechendi, mswiniarsk and adobrzyn as code owners September 11, 2025 03:20

tianmu-li force-pushed the test_async branch from c9f8e34 to b2a376d on September 11, 2025 03:20

tianmu-li added 7 commits September 11, 2025 21:11

Align with upstream pr

7ec9bd6

Signed-off-by: Tianmu Li <[email protected]>

Resolve issue around dummy sampled tokens; incomplete prompt and accu…

1e7f81e

…racy fix WIP Signed-off-by: Tianmu Li <[email protected]>

Fix accuracy issue

119a83a

Signed-off-by: Tianmu Li <[email protected]>

pre-commit fix

14c0abf

Signed-off-by: Tianmu Li <[email protected]>

more pre-commit fix

ab4d28a

Signed-off-by: Tianmu Li <[email protected]>

Add ci

fa3b2c0

Signed-off-by: Tianmu Li <[email protected]>

tianmu-li force-pushed the test_async branch from b2a376d to fa3b2c0 on September 11, 2025 18:11

Remove duplicate line after rebase

378c838

Signed-off-by: Tianmu Li <[email protected]>

xuechendi reviewed Sep 11, 2025

vllm_gaudi/v1/worker/hpu_model_runner.py (outdated, resolved)

xuechendi reviewed Sep 11, 2025

vllm_gaudi/v1/worker/hpu_model_runner.py (resolved)

tianmu-li and others added 8 commits September 11, 2025 23:12

Move spec decode assertion earlier; separate output handling

74afcb8

Signed-off-by: Tianmu Li <[email protected]>

Merge branch 'main' into test_async

f140ec1

Merge remote-tracking branch 'origin/main' into test_async

05e7c6b

pre-commit fix

28bed58

Signed-off-by: Tianmu Li <[email protected]>

WIP

7e6e11f

Signed-off-by: Tianmu Li <[email protected]>

Use streams for the two d2h copies

60adcf9

Signed-off-by: Tianmu Li <[email protected]>

Move sync point later

e94aa5b

Signed-off-by: Tianmu Li <[email protected]>

Merge remote-tracking branch 'origin/main' into test_async

36069e2

tianmu-li and others added 4 commits September 12, 2025 22:25

Fix merge errors

1346166

Signed-off-by: Tianmu Li <[email protected]>

Move sync as late as possible

db877c8

Signed-off-by: Tianmu Li <[email protected]>

Merge branch 'main' into test_async

41749d3

Merge branch 'main' into test_async

b74a9b7

xuechendi approved these changes Sep 12, 2025


tianmu-li added 2 commits September 13, 2025 07:22

Fix typo when preparing logits_indices_device; align decode_sampled_t…

74254fb

…oken_ids behavior with non-spec Signed-off-by: Tianmu Li <[email protected]>

Revert spec-decode change. Add guard and note around incompatibility b…

7fa14f6

…etween spec decode and async Signed-off-by: Tianmu Li <[email protected]>

Merge branch 'main' into test_async

1d97026

Merge branch 'main' into test_async

e2f0134

Merge branch 'main' into test_async

a7612d6

xuechendi merged commit dca6719 into vllm-project:main Sep 15, 2025
8 checks passed

kdamaszk pushed a commit to kdamaszk/vllm-gaudi that referenced this pull request Sep 18, 2025

Fully overlap model execution (vllm-project#134)

a84a19c

Dependent on vllm-project/vllm#23569 --------- Signed-off-by: Tianmu Li <[email protected]> Co-authored-by: Chendi.Xue <[email protected]>

kdamaszk mentioned this pull request Sep 18, 2025

Port "Fully overlap model execution" #195

Merged

slokesha pushed a commit to slokesha/vllm-gaudi that referenced this pull request Sep 24, 2025

Fully overlap model execution (vllm-project#134)

e04d6aa

Dependent on vllm-project/vllm#23569 --------- Signed-off-by: Tianmu Li <[email protected]> Co-authored-by: Chendi.Xue <[email protected]>
