
Releases: vllm-project/vllm

vLLM v0.1.3

02 Aug 2023 · aa84c92

What's Changed

Major changes

  • More model support: LLaMA 2, Falcon, GPT-J, Baichuan, and others (see the loading sketch after this list).
  • Efficient support for multi-query attention (MQA) and grouped-query attention (GQA).
  • Scheduling change: vLLM now uses TGI-style continuous batching (illustrated after this list).
  • Many bug fixes.
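The newly supported models load through the same entry point as earlier releases. A minimal sketch, assuming the usual `LLM`/`SamplingParams` API and using a LLaMA 2 checkpoint as an illustrative example; any of the newly supported architectures should load the same way, and MQA/GQA models such as Falcon need no extra flags:

```python
from vllm import LLM, SamplingParams

# Example checkpoint only; swap in Falcon, GPT-J, Baichuan, etc.
# MQA/GQA support is handled internally, with no extra arguments.
llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")

sampling_params = SamplingParams(temperature=0.8, max_tokens=128)
outputs = llm.generate(["Hello, my name is"], sampling_params)
for output in outputs:
    print(output.outputs[0].text)
```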
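For readers unfamiliar with the term, continuous batching admits and retires sequences at every decoding iteration instead of waiting for a whole static batch to drain. The toy loop below is an illustrative sketch of the idea only, not vLLM's actual scheduler; `step` and `is_finished` are hypothetical callbacks standing in for a real decoding step and stop check:

```python
from collections import deque

def continuous_batching(requests, step, is_finished, max_batch_size=8):
    """Toy iteration-level scheduler, NOT vLLM's implementation."""
    waiting = deque(requests)
    running, finished = [], []
    while waiting or running:
        # Admit new requests as soon as slots free up, rather than
        # waiting for the whole batch to finish (static batching).
        while waiting and len(running) < max_batch_size:
            running.append(waiting.popleft())
        step(running)  # one decoding iteration over the current batch
        still_running = []
        for seq in running:
            (finished if is_finished(seq) else still_running).append(seq)
        running = still_running
    return finished
```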

Full Changelog: v0.1.2...v0.1.3

vLLM v0.1.2

05 Jul 2023 · 1c395b4

What's Changed

  • Initial support for GPTBigCode
  • Support for MPT and BLOOM
  • Custom tokenizer support (see the sketch after this list)
  • ChatCompletion endpoint in the OpenAI demo server (see the example after this list)
  • Code formatting
  • Various bug fixes and improvements
  • Documentation improvements
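A minimal sketch of the custom tokenizer option, assuming the `tokenizer` argument to `LLM` accepts a Hugging Face name or local path independently of the model weights; the checkpoint names here are illustrative placeholders:

```python
from vllm import LLM

# Assumption: `tokenizer` may point to a different name/path than the
# model weights; both are set explicitly here for illustration.
llm = LLM(
    model="bigscience/bloom-560m",
    tokenizer="bigscience/bloom-560m",
)
print(llm.generate(["The capital of France is"])[0].outputs[0].text)
```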
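And a minimal sketch of calling the new ChatCompletion endpoint, assuming the demo server was started with `python -m vllm.entrypoints.openai.api_server --model <model>` and listens on the default port 8000; the model name below is a placeholder:

```python
import requests

# POST to the OpenAI-compatible chat endpoint of a locally running
# demo server (default port assumed to be 8000).
response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "facebook/opt-125m",  # placeholder model name
        "messages": [
            {"role": "user", "content": "Say hello in one sentence."}
        ],
    },
)
print(response.json()["choices"][0]["message"]["content"])
```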

Contributors

Thanks to the following amazing people who contributed to this release:

@michaelfeil @WoosukKwon @metacryptom @merrymercy @BasicCoder @zhuohan123 @twaka @comaniac @neubig @JRC1995 @LiuXiaoxuanPKU @bm777 @Michaelvll @gesanqiu @ironpinguin @coolcloudcol @akxxsb

Full Changelog: v0.1.1...v0.1.2

vLLM v0.1.1 (Patch)

22 Jun 2023 · 83658c8

What's Changed

Full Changelog: v0.1.0...v0.1.1

vLLM v0.1.0

20 Jun 2023 · 67d96c2

The first official release of vLLM!

See our README for details.

Thanks

Thanks to @WoosukKwon, @zhuohan123, and @suquark for their contributions.