Skip to content

Conversation

Silv3S
Copy link
Contributor

@Silv3S Silv3S commented Aug 29, 2025

  • Early return if input to nonzero has no elements,
  • Change pytorch MAX_DIMS to XPU_MAX_TENSORINFO_DIMS xpu equivalent,
  • Rename tensor to out to match op schema.

@Copilot Copilot AI review requested due to automatic review settings August 29, 2025 10:30
Copilot

This comment was marked as outdated.

@Silv3S Silv3S requested review from CuiYifeng and Copilot September 9, 2025 09:40
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes the nonzero operation by moving early validation checks from the kernel implementation to the operator level, improving performance for edge cases. The changes also standardize naming conventions and XPU-specific constants.

  • Move early return for empty tensors from kernel to operator level
  • Replace PyTorch MAX_DIMS with XPU-specific XPU_MAX_TENSORINFO_DIMS constant
  • Rename tensor parameter to out for consistency with operation schema

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File Description
src/ATen/native/xpu/Nonzero.cpp Adds early return for empty tensors and updates dimension limit constant
src/ATen/native/xpu/sycl/NonzeroKernel.cpp Removes redundant empty tensor check and renames variables for schema consistency
Comments suppressed due to low confidence (1)

src/ATen/native/xpu/sycl/NonzeroKernel.cpp:1

  • The range tensor is declared but never used. The range_begin pointer is set to nullptr on line 99, but the range tensor allocation should be removed since it's not needed.
#include <ATen/Dispatch.h>

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@Silv3S
Copy link
Contributor Author

Silv3S commented Sep 16, 2025

This PR already passed CI https://github.com/intel/torch-xpu-ops/actions/runs/17636555116, but then after updating branch, it started to fail due to regression from main branch. Newly added xpu_distributed test test_c10d_xccl.py::CommTest::test_unwaited is const failing. I submitted trivial PR #2044 and confirmed that it fails there too.

Could this test be marked as xfail to unblock CI?

@CuiYifeng
Copy link
Contributor

Please rerun failed unit tests after #2051 is merged.

@Chao1Han
Copy link
Contributor

test_unwaited skiped here #1945

@CuiYifeng CuiYifeng added this pull request to the merge queue Sep 17, 2025
Merged via the queue into intel:main with commit a5f73c1 Sep 17, 2025
25 checks passed
@Silv3S Silv3S deleted the nonzero_xpu branch September 17, 2025 05:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants