
Commit 762a2a1

deepcharm, loadams, and hwchen2017 committed
Avoid graph break by removing another redundant requires grad false (deepspeedai#7263)
This PR is a follow-up to [PR deepspeedai#7158](deepspeedai#7158), handling the same issue in another place. See [PR deepspeedai#7158](deepspeedai#7158) for details.

Signed-off-by: Max Kovalenko <[email protected]>
Co-authored-by: Logan Adams <[email protected]>
Co-authored-by: Hongwei Chen <[email protected]>
1 parent: a58480b

File tree

1 file changed: 0 additions, 1 deletion

deepspeed/runtime/zero/partition_parameters.py

Lines changed: 0 additions & 1 deletion
@@ -1921,7 +1921,6 @@ def _allgather_params(self, param_list, hierarchy=0):
         flat_scale_tensor = torch.empty(scale_tensor_size,
                                         dtype=param_list[0].ds_tensor.ds_quant_scale.dtype,
                                         device=self.local_device)
-        flat_scale_tensor.requires_grad = False
         scale_partitions = []
         for i in range(self.world_size):
             start = scale_tensor_size * i
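For background, here is a minimal, hypothetical sketch (illustrative names, not DeepSpeed code) of why the removed line is safe to drop: `torch.empty()` already returns a tensor with `requires_grad=False`, so the explicit assignment was redundant, and such attribute mutations inside a `torch.compile`'d region can force a graph break.

```python
import torch

def gather_scales(size: int) -> torch.Tensor:
    # torch.empty() creates tensors with requires_grad=False by default,
    # so an explicit assignment adds nothing.
    flat = torch.empty(size)
    # flat.requires_grad = False  # redundant, and a potential graph break
    #                             # when this function is compiled
    return flat.zero_() + 1

compiled = torch.compile(gather_scales)
out = compiled(8)
assert out.requires_grad is False
```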
