
Commit 9988027

weeknan, tjruwase, and loadams authored and committed
Fix: UnboundLocalError for variable 'dim' (deepspeedai#7449)
## Fix `UnboundLocalError` in `ZeroLinear.backward()` when training only bias parameters, as mentioned in deepspeedai#7435

This PR addresses an issue in the `ZeroLinear.backward()` method, where the local variable `dim` could be referenced before assignment. This happens specifically when:

- Only the bias parameters are set to `requires_grad=True`, and
- The training setup uses **ZeRO Stage 3**, **AMP**, and **gradient checkpointing**.

### Problem

When only the bias requires gradients, the branch that sets `dim = grad_output.dim()` is skipped, but `dim` is still used later in the computation, leading to `UnboundLocalError: local variable 'dim' referenced before assignment`.

### Fix

Move the assignment `dim = grad_output.dim()` so that it runs unconditionally, ensuring `dim` is always defined before it is used in any branch of the gradient computation logic.

### Impact

This makes the backward pass more robust across different training setups.

Signed-off-by: weeknan <[email protected]>
Co-authored-by: Olatunji Ruwase <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Signed-off-by: qimcis <[email protected]>
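To make the failure mode concrete, here is a minimal, self-contained sketch of the pre-fix control flow. This is a simplified stand-in, not the actual DeepSpeed implementation; the function and placeholder computations are illustrative only:

```python
import torch


def backward_sketch(grad_output, needs_input_grad):
    """Simplified stand-in for the pre-fix ZeroLinear.backward() control flow."""
    grad_input = grad_weight = grad_bias = None
    if needs_input_grad[1]:
        # Before the fix, `dim` was only assigned inside the weight branch.
        dim = grad_output.dim()
        grad_weight = grad_output.t()  # placeholder for the real computation
    if needs_input_grad[2]:
        # With only the bias trainable, needs_input_grad[1] is False, so the
        # condition below raises UnboundLocalError: `dim` was never assigned.
        grad_bias = grad_output.sum([i for i in range(dim - 1)]) if dim > 2 else grad_output.sum(0)
    return grad_input, grad_weight, grad_bias


# Only the bias requires a gradient, which reproduces the error.
try:
    backward_sketch(torch.ones(4, 8), needs_input_grad=(False, False, True))
except UnboundLocalError as exc:
    print(exc)  # e.g. "local variable 'dim' referenced before assignment"
```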
1 parent d79704e · commit 9988027

File tree

1 file changed: +1 −1 lines changed


deepspeed/runtime/zero/linear.py

Lines changed: 1 addition & 1 deletion
@@ -86,13 +86,13 @@ def backward(ctx, grad_output):
         # improve efficiency. If you want to make your code simpler, you can
         # skip them. Returning gradients for inputs that don't require it is
         # not an error.
+        dim = grad_output.dim()
         if ctx.needs_input_grad[0]:
             #print(f"Computing grad input weight {weight.shape} grad_output {grad_output.shape}")
             grad_input = grad_output.matmul(weight)
             #print(f"Computed grad input {grad_input.shape}")
         if ctx.needs_input_grad[1]:
             #print("Computing grad weight")
-            dim = grad_output.dim()
             if dim > 2:
                 grad_weight = grad_output.reshape(-1,
                                                   grad_output.shape[-1]).t().matmul(input.reshape(-1, input.shape[-1]))
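For reference, the training configuration described in the issue looks roughly like the snippet below. It uses stock `torch.nn.Linear` purely to illustrate the only-bias-trainable setup; the original error additionally requires ZeRO Stage 3, AMP, and gradient checkpointing, which are omitted here:

```python
import torch

# Illustrative setup: freeze the weight so only the bias trains. In the
# DeepSpeed backward above, this makes ctx.needs_input_grad[1] False,
# which is exactly the case that used to skip the `dim` assignment.
linear = torch.nn.Linear(8, 4)
linear.weight.requires_grad_(False)
linear.bias.requires_grad_(True)

out = linear(torch.randn(2, 3, 8))  # 3-D input, so grad_output.dim() > 2
out.sum().backward()
print(linear.bias.grad.shape)  # torch.Size([4])
```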
