[Core] Split LoRA layers #24574
Conversation
Signed-off-by: Jee Jee Li <[email protected]>
Code Review
This pull request refactors the LoRA layers by splitting a large layers.py file into a more organized package structure. This is a great improvement for maintainability. I've found a critical issue that could break support for Marlin quantized models, and a high-severity typo in a new module's name that should be corrected.
def _get_lora_device(base_layer: nn.Module) -> torch.device:
    # code borrowed from https://github.com/fmmoret/vllm/blob/fm-support-lora-on-quantized-models/vllm/lora/layers.py#L34
    """Returns the device for where to place the LoRA tensors."""
    # unquantizedLinear
    if hasattr(base_layer, "weight"):
        return base_layer.weight.device
    # Compressed Tensor
    elif hasattr(base_layer, "weight_packed"):
        return base_layer.weight_packed.device
    # GPTQ/AWQ
    elif hasattr(base_layer, "qweight"):
        return base_layer.qweight.device
    # HQQ marlin
    elif hasattr(base_layer, "W_q"):
        return base_layer.W_q.device
    else:
        raise ValueError(f"Unsupported base layer: {base_layer}")
The _get_lora_device function is missing a check for the B attribute, which is used by Marlin quantized models. This can cause a ValueError when using LoRA with Marlin quantization, as the function would fail to determine the correct device. The weight property in BaseLinearLayerWithLoRA correctly handles this case, and this function should be consistent with it to avoid a regression.
Suggested change:

def _get_lora_device(base_layer: nn.Module) -> torch.device:
    # code borrowed from https://github.com/fmmoret/vllm/blob/fm-support-lora-on-quantized-models/vllm/lora/layers.py#L34
    """Returns the device for where to place the LoRA tensors."""
    # unquantizedLinear
    if hasattr(base_layer, "weight"):
        return base_layer.weight.device
    # Compressed Tensor
    elif hasattr(base_layer, "weight_packed"):
        return base_layer.weight_packed.device
    # GPTQ/AWQ
    elif hasattr(base_layer, "qweight"):
        return base_layer.qweight.device
    # marlin
    elif hasattr(base_layer, "B"):
        return base_layer.B.device
    # HQQ marlin
    elif hasattr(base_layer, "W_q"):
        return base_layer.W_q.device
    else:
        raise ValueError(f"Unsupported base layer: {base_layer}")
vllm/lora/layers/__init__.py
from vllm.lora.layers.vocal_parallel_embedding import (
    VocabParallelEmbeddingWithLoRA,
)
There appears to be a typo in the module name vocal_parallel_embedding. It should likely be vocab_parallel_embedding to be consistent with the class name VocabParallelEmbeddingWithLoRA and the general terminology. This should be corrected in the filename (vllm/lora/layers/vocal_parallel_embedding.py) and here in the import statement for better code clarity and maintainability.
Suggested change:

from vllm.lora.layers.vocab_parallel_embedding import (
    VocabParallelEmbeddingWithLoRA,
)
Should we also break this file's implementations into column_parallel_linear.py, row_parallel_linear.py, etc., so that each subclass stays in the same file as its parent class?
Makes sense
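For illustration, a hedged sketch of the grouping being discussed, with each subclass kept next to its parent. Only BaseLinearLayerWithLoRA, ColumnParallelLinearWithLoRA, RowParallelLinearWithLoRA, and the two proposed module names come from this thread; the base module name and the extra merged subclass are illustrative stubs, not the actual vLLM layout:

import torch.nn as nn


# base.py (assumed module name): shared LoRA plumbing for linear layers.
class BaseLinearLayerWithLoRA(nn.Module):
    """Stub standing in for the real base class."""


# column_parallel_linear.py: the parent and its merged-style variants together.
class ColumnParallelLinearWithLoRA(BaseLinearLayerWithLoRA):
    pass


class MergedColumnParallelLinearWithLoRA(ColumnParallelLinearWithLoRA):
    pass


# row_parallel_linear.py: row-parallel variants kept alongside their parent.
class RowParallelLinearWithLoRA(BaseLinearLayerWithLoRA):
    pass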
Signed-off-by: Jee Jee Li <[email protected]>
Force-pushed from 1792464 to 52f3e3a
Otherwise LGTM!
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Signed-off-by: Jee Jee Li <[email protected]>
Purpose
Split LoRA layers into separate files to facilitate maintenance and improve readability.
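As a rough sketch of what the resulting package entry point might look like, the new vllm/lora/layers/__init__.py would re-export the split classes so that existing `from vllm.lora.layers import ...` call sites keep working. Only the vocab_parallel_embedding module and the class names already mentioned in this conversation are taken from the PR; the other module names are assumptions:

# vllm/lora/layers/__init__.py (illustrative sketch, not the merged file)
from vllm.lora.layers.column_parallel_linear import (  # assumed module name
    ColumnParallelLinearWithLoRA,
)
from vllm.lora.layers.row_parallel_linear import (  # assumed module name
    RowParallelLinearWithLoRA,
)
from vllm.lora.layers.vocab_parallel_embedding import (
    VocabParallelEmbeddingWithLoRA,
)

__all__ = [
    "ColumnParallelLinearWithLoRA",
    "RowParallelLinearWithLoRA",
    "VocabParallelEmbeddingWithLoRA",
]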
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
(Optional) Documentation update, including supported_models.md and examples for a new model.