
Conversation

Contributor

@pwschuurman pwschuurman commented Aug 28, 2025

Purpose

This PR updates the Run:AI Model Streamer PIP package (runai-model-streamer) to the latest version (0.14.0).

  • Fix [Bug]: Failed to load model from local s3 instance #23236 by changing the ordering of model loading
  • Move S3 dependencies (e.g. boto3) into the Run:AI Model Streamer python library to make Run:AI's model loading interface more modular (see the sketch after this list)
  • Add GCS support through the runai-model-streamer-gcs PIP package
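
For context, a minimal sketch of how a loader consumes the streamer's Python interface after this split (the SafetensorsStreamer names below come from the runai-model-streamer package; treat the exact signatures as an assumption):

from runai_model_streamer import SafetensorsStreamer

# Sketch only: iterate (name, tensor) pairs from a .safetensors file
# (local path, s3://, or gs://). With this PR, the S3 (boto3) and GCS
# dependencies live in separate PIP packages rather than in vLLM itself.
def iter_tensors(safetensors_file):
    with SafetensorsStreamer() as streamer:
        streamer.stream_file(safetensors_file)
        yield from streamer.get_tensors()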

Test Plan

Existing unit tests have been validated, and new unit tests have been added in tests/runai_model_streamer_test/test_runai_utils.py:

pytest tests/runai_model_streamer_test

In addition, model loading has been tested with --load-format=runai_streamer, using models from local storage, S3 and GCS.

Local Storage

vllm serve codegemma/codegemma-2 --load-format=runai_streamer --served-model-name codegemma

S3 Compatible Endpoint

AWS_ACCESS_KEY_ID="..." \
AWS_SECRET_ACCESS_KEY="..." \
RUNAI_STREAMER_S3_ENDPOINT="https://storage.googleapis.com" \
AWS_ENDPOINT_URL=https://storage.googleapis.com \
vllm serve gs://pwschuurman-private-bucket/codegemma/codegemma-2 --load-format=runai_streamer --served-model-name codegemma

GCS Endpoint

RUNAI_STREAMER_GCS_CREDENTIAL_FILE=~/creds.json \
vllm serve gs://pwschuurman-private-bucket/codegemma/codegemma-2 --load-format=runai_streamer --served-model-name codegemma
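
The same load format can also be exercised through the Python API; below is a minimal sketch using the GCS path from the command above (the credential file path is a placeholder):

import os

# Placeholder: point this at a GCS service-account JSON before importing vLLM
os.environ["RUNAI_STREAMER_GCS_CREDENTIAL_FILE"] = os.path.expanduser("~/creds.json")

from vllm import LLM

llm = LLM(model="gs://pwschuurman-private-bucket/codegemma/codegemma-2",
          load_format="runai_streamer")
print(llm.generate(["Hello"]))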

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added the ci/build label Aug 28, 2025
@pwschuurman pwschuurman force-pushed the update-runai-integration branch 2 times, most recently from 2d65dca to d86203a on September 4, 2025 17:20
@pwschuurman pwschuurman changed the title from "Update vLLM to use latest version of Run:AI Model Streamer" to "[Bugfix] Update Run:AI Model Streamer Loading Integration" Sep 4, 2025
@pwschuurman pwschuurman force-pushed the update-runai-integration branch from d86203a to b7099e7 on September 4, 2025 20:48
@pwschuurman pwschuurman marked this pull request as ready for review September 4, 2025 23:29
@pwschuurman pwschuurman force-pushed the update-runai-integration branch from b7099e7 to 712e99e on September 5, 2025 17:29
@omer-dayan
Contributor

Hey @DarkLight1337.
I am the maintainer of Run:AI Model Streamer and worked closely with @pwschuurman. I can confirm this actually fixes the bug; I tested it with the use cases from the open issues.

Member

@DarkLight1337 DarkLight1337 left a comment

Thanks for fixing!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) September 9, 2025 11:14
@DarkLight1337 DarkLight1337 added the ready label Sep 9, 2025
@scandukuri

Thanks for your work everyone - any timeline on the above getting merged/released?

@DarkLight1337
Member

Retrying the failing tests to see if they are related to this PR

@pwschuurman
Contributor Author

Failing Checks:

@vllm-bot vllm-bot merged commit 4377b1a into vllm-project:main Sep 10, 2025
67 of 71 checks passed
@DarkLight1337 DarkLight1337 added this to the v0.10.2 milestone Sep 10, 2025
@lengrongfu
Contributor

lengrongfu commented Sep 10, 2025

@pwschuurman I use MinIO to store the model, but it cannot run; the error is Could not receive runai_response from libstreamer due to: b'File access error'.

How should I use it?

import os

# MinIO credentials and endpoint, set before importing vLLM
os.environ['AWS_ACCESS_KEY_ID'] = "yslOIiswW3I4QX9QEiIY"
os.environ['AWS_SECRET_ACCESS_KEY'] = "oVNoExWNrWJd4TUstoYjyCybtbFchPxKGUKGM54H"
os.environ['AWS_ENDPOINT_URL'] = "http://xxxx"
os.environ['RUNAI_STREAMER_S3_ENDPOINT'] = "http://xxxx"

from vllm import LLM

# Stream the model directly from the MinIO bucket
llm = LLM(model="s3://model/Qwen/Qwen3-0.6B", load_format="runai_streamer",
          tensor_parallel_size=1, max_model_len=20000)

outputs = llm.generate(["hello what is your name?"])
print(outputs)

I can confirm the access key and secret key are right; the following code downloads config.json to /tmp/config.json:

import os

import boto3

# Same credentials as above; verify they work with plain boto3
os.environ['AWS_ACCESS_KEY_ID'] = "yslOIiswW3I4QX9QEiIY"
os.environ['AWS_SECRET_ACCESS_KEY'] = "oVNoExWNrWJd4TUstoYjyCybtbFchPxKGUKGM54H"
os.environ['AWS_ENDPOINT_URL'] = "http://xxxxx"

s3 = boto3.client('s3')
s3.download_file("model", "Qwen/Qwen3-0.6B/config.json", "/tmp/config.json")

@lengrongfu
Contributor

I found a bug in that project: run-ai/runai-model-streamer#81. The current version can't be used.

skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
[Bugfix] Update Run:AI Model Streamer Loading Integration (vllm-project#23845)

Signed-off-by: Omer Dayan (SW-GPU) <[email protected]>
Signed-off-by: Peter Schuurman <[email protected]>
Co-authored-by: Omer Dayan (SW-GPU) <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
[Bugfix] Update Run:AI Model Streamer Loading Integration (vllm-project#23845)

Signed-off-by: Omer Dayan (SW-GPU) <[email protected]>
Signed-off-by: Peter Schuurman <[email protected]>
Co-authored-by: Omer Dayan (SW-GPU) <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Labels
ci/build, ready
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Failed to load model from local s3 instance
6 participants