
Conversation

Contributor

@pwschuurman pwschuurman commented Aug 28, 2025

Purpose

This PR updates the Run:AI Model Streamer PIP package (runai-model-streamer) to the latest version (0.14.0).

  • Fix [Bug]: Failed to load model from local s3 instance #23236 by changing the ordering of model loading
  • Move S3 dependencies (e.g. boto3) into the Run:AI Model Streamer python library to make Run:AI's model loading interface more modular (see the sketch after this list)
  • Add GCS support through the runai-model-streamer-gcs PIP package
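
For context, a minimal sketch of how a loader consumes the streamer's Python interface after this split (the SafetensorsStreamer names below come from the runai-model-streamer package; treat the exact signatures as an assumption):

from runai_model_streamer import SafetensorsStreamer

# Sketch only: iterate (name, tensor) pairs from a .safetensors file
# (local path, s3://, or gs://). With this PR, the S3 (boto3) and GCS
# dependencies live in separate PIP packages rather than in vLLM itself.
def iter_tensors(safetensors_file):
    with SafetensorsStreamer() as streamer:
        streamer.stream_file(safetensors_file)
        yield from streamer.get_tensors()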

Test Plan

Existing unit tests have been validated, and new unit tests have been added in tests/runai_model_streamer_test/test_runai_utils.py:

pytest tests/runai_model_streamer_test

In addition, model loading has been tested with --load-format=runai_streamer, using models from local storage, S3 and GCS.

Local Storage

vllm serve codegemma/codegemma-2 --load-format=runai_streamer --served-model-name codegemma

S3 Compatible Endpoint

AWS_ACCESS_KEY_ID="..." \
AWS_SECRET_ACCESS_KEY="..." \
RUNAI_STREAMER_S3_ENDPOINT="https://storage.googleapis.com" \
AWS_ENDPOINT_URL=https://storage.googleapis.com \
vllm serve gs://pwschuurman-private-bucket/codegemma/codegemma-2 --load-format=runai_streamer --served-model-name codegemma

GCS Endpoint

RUNAI_STREAMER_GCS_CREDENTIAL_FILE=~/creds.json \
vllm serve gs://pwschuurman-private-bucket/codegemma/codegemma-2 --load-format=runai_streamer --served-model-name codegemma
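
The same load format can also be exercised through the Python API; below is a minimal sketch using the GCS path from the command above (the credential file path is a placeholder):

import os

# Placeholder: point this at a GCS service-account JSON before importing vLLM
os.environ["RUNAI_STREAMER_GCS_CREDENTIAL_FILE"] = os.path.expanduser("~/creds.json")

from vllm import LLM

llm = LLM(model="gs://pwschuurman-private-bucket/codegemma/codegemma-2",
          load_format="runai_streamer")
print(llm.generate(["Hello"]))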

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

@mergify mergify bot added the ci/build label Aug 28, 2025
@pwschuurman pwschuurman force-pushed the update-runai-integration branch 2 times, most recently from 2d65dca to d86203a on September 4, 2025 17:20
@pwschuurman pwschuurman changed the title from "Update vLLM to use latest version of Run:AI Model Streamer" to "[Bugfix] Update Run:AI Model Streamer Loading Integration" Sep 4, 2025
@pwschuurman pwschuurman force-pushed the update-runai-integration branch from d86203a to b7099e7 on September 4, 2025 20:48
@pwschuurman pwschuurman marked this pull request as ready for review September 4, 2025 23:29
@pwschuurman pwschuurman force-pushed the update-runai-integration branch from b7099e7 to 712e99e on September 5, 2025 17:29
@omer-dayan
Contributor

Hey @DarkLight1337.
I am the maintainer of Run:AI Model Streamer and worked closely with @pwschuurman. I can confirm this actually fixes the bug; I tested it with the use cases from the open issues.

Member

@DarkLight1337 DarkLight1337 left a comment

Thanks for fixing!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) September 9, 2025 11:14
@DarkLight1337 DarkLight1337 added the ready label Sep 9, 2025
@scandukuri

Thanks for your work everyone - any timeline on the above getting merged/released?

@DarkLight1337
Member

Retrying the failing tests to see if they are related to this PR

@pwschuurman
Contributor Author

Failing Checks:

@vllm-bot vllm-bot merged commit 4377b1a into vllm-project:main Sep 10, 2025
67 of 71 checks passed
@DarkLight1337 DarkLight1337 added this to the v0.10.2 milestone Sep 10, 2025
@lengrongfu
Contributor

lengrongfu commented Sep 10, 2025

@pwschuurman I use MinIO to store the model, but it cannot run; the error is Could not receive runai_response from libstreamer due to: b'File access error'.

How should I use it?

import os

# MinIO credentials and endpoint, set before importing vLLM
os.environ['AWS_ACCESS_KEY_ID'] = "yslOIiswW3I4QX9QEiIY"
os.environ['AWS_SECRET_ACCESS_KEY'] = "oVNoExWNrWJd4TUstoYjyCybtbFchPxKGUKGM54H"
os.environ['AWS_ENDPOINT_URL'] = "http://xxxx"
os.environ['RUNAI_STREAMER_S3_ENDPOINT'] = "http://xxxx"

from vllm import LLM

# Stream the model directly from the MinIO bucket
llm = LLM(model="s3://model/Qwen/Qwen3-0.6B", load_format="runai_streamer",
          tensor_parallel_size=1, max_model_len=20000)

outputs = llm.generate(["hello what is your name?"])
print(outputs)

I can confirm the access key and secret key are right; the following code downloads config.json to /tmp/config.json:

import os

import boto3

# Same credentials as above; verify they work with plain boto3
os.environ['AWS_ACCESS_KEY_ID'] = "yslOIiswW3I4QX9QEiIY"
os.environ['AWS_SECRET_ACCESS_KEY'] = "oVNoExWNrWJd4TUstoYjyCybtbFchPxKGUKGM54H"
os.environ['AWS_ENDPOINT_URL'] = "http://xxxxx"

s3 = boto3.client('s3')
s3.download_file("model", "Qwen/Qwen3-0.6B/config.json", "/tmp/config.json")

@lengrongfu
Contributor

I found a bug in that project: run-ai/runai-model-streamer#81. The current version can't be used.

skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
[Bugfix] Update Run:AI Model Streamer Loading Integration (vllm-project#23845)

Signed-off-by: Omer Dayan (SW-GPU) <[email protected]>
Signed-off-by: Peter Schuurman <[email protected]>
Co-authored-by: Omer Dayan (SW-GPU) <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
[Bugfix] Update Run:AI Model Streamer Loading Integration (vllm-project#23845)

Signed-off-by: Omer Dayan (SW-GPU) <[email protected]>
Signed-off-by: Peter Schuurman <[email protected]>
Co-authored-by: Omer Dayan (SW-GPU) <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]>
Labels
ci/build, ready
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: Failed to load model from local s3 instance
6 participants