
Conversation

@Snektron (Collaborator) commented Aug 25, 2025

Description

This is an initial implementation of profiling using rocPROF for MI300 jobs. It works as follows: when a profiling run is selected, the application is simply executed under rocPROF with options set to gather a number of traces. These are then packaged with the RunResults and finally sent to the user (a rough sketch of this flow is included after the list below). This current version is the most straightforward way to implement this, but there are some missing features and limitations:

  • The tracing results ZIP needs to be less than 8 MB. While Discord should support 25 MB for non-boosted servers and 100 MB for boosted servers, this is apparently an API limitation; see "Request entity too large" (Rapptz/discord.py#1733). Test traces appear to be about 5 MB, but I don't know yet how large they get for longer programs and multi-GPU programs.
    • A GitHub link is now shared.
  • The HSA trace is not collected. It is too large for Perfetto to even open in a reasonable time, so for now I've left it out. Perhaps later.
    • I think it's best to leave this out.
  • I've not yet tested how well it works with multi-GPU programs. I'll check by rebasing on top of Multi-GPU #335.
  • I'd like to add code object dumps to the profile results, but those increase the size quite a bit too.
  • eval.py does not yet contain rocTX markers.
  • rocPROF prints a bunch of Perfetto messages that always appear in stderr. This looks like an issue on rocPROF's side; logging isn't configured properly. Not sure if we should solve this now...
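
For context, a minimal sketch of the overall flow described above. The flag names assume a rocprofv3-style CLI and the output directory name is an assumption; the actual invocation in this PR may differ.

    # Hedged sketch: run the submission under rocPROF with tracing enabled,
    # writing traces into a directory that is later zipped into the RunResults.
    import subprocess

    def run_with_rocprof(app_cmd: list[str], out_dir: str = "profile_data") -> None:
        cmd = [
            "rocprofv3",
            "--hip-trace",                   # trace HIP API calls
            "--kernel-trace",                # trace kernel dispatches
            "--output-format", "pftrace",    # emit a Perfetto trace
            "-d", out_dir,                   # output directory (assumed flag)
            "--",
            *app_cmd,                        # the user's application and its arguments
        ]
        subprocess.run(cmd, check=True)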

@Snektron Snektron force-pushed the amd-profiling branch 7 times, most recently from 0dfd175 to 58ce131 on August 28, 2025 22:05

github-actions bot commented Aug 28, 2025

Coverage report

This PR does not seem to contain any modification to coverable code.

@Snektron Snektron force-pushed the amd-profiling branch 9 times, most recently from ae1ff0a to 710e5d6 on August 31, 2025 16:31
@msaroufim msaroufim self-requested a review September 2, 2025 05:18
@msaroufim (Member) commented Sep 2, 2025

Sorry for the delay, I'll have time to review tomorrow afternoon PST

EDIT: PR looks reasonable to me. Probably would want a final stamp by @ngc92 before merge

But my feedback would be:

  1. Can we have a standalone PR where we update examples/?
  2. CI still failing
  3. Rebase on the multi-GPU branch; the PR will become shorter

On size limits: GPU MODE is max boosted and that'll likely continue in the future.

@msaroufim msaroufim requested review from ngc92 and S1ro1 September 2, 2025 05:18
@ngc92 (Collaborator) commented Sep 2, 2025

Quickly scrolled over the changes, didn't spot anything that looked problematic. I'd echo Mark's comment, though, that it'd be nice if the PR was a bit smaller :)
There's a bunch of changes here that aren't strictly about profiling (the example updates, but also, e.g., reporting CUDA vs ROCm); if we could get them in separate PRs, those would be trivial to approve and this one would be a bit more focused.

@Snektron (Collaborator, Author) commented Sep 2, 2025

Can we have a standalone PR where we update examples/

Yeah, sure, that seems like a good idea. Should the reference kernels also be updated after this?

CI still failing

Hmm, I'm not exactly sure what is causing this. I thought it was unrelated, but it doesn't seem to be a problem for other pull requests...

Rebase on the multi-GPU branch; the PR will become shorter

It's my impression that the multi-GPU support already in main is all that is needed. Am I missing something?

@msaroufim (Member) commented Sep 3, 2025

  1. So, re the CI failure: my guess is that the nvcc command construction is seeing utils.h.

It's more of a guess, but what I'd try in run_eval.py is making this change, since CUDA_FLAGS does get modified in place and a repeated run may be causing clunky behavior (see the small sketch at the end of this comment):

  if flags is None:
      # copy, so later in-place edits don't mutate the shared module-level CUDA_FLAGS
      flags = CUDA_FLAGS.copy()
  2. On the multi-GPU stuff: my bad, I thought you weren't rebased.

  3. I don't think we need to update the reference kernels with a dev suffix, but yeah, if any name is outright wrong I'm happy to accept a PR fixing it.
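
For illustration, a small sketch of the aliasing pitfall hinted at above. Names are hypothetical and simplified; the actual run_eval.py code differs.

    CUDA_FLAGS = ["-O3", "-std=c++17"]   # module-level default flags

    def build_nvcc_command(flags=None):
        if flags is None:
            flags = CUDA_FLAGS           # aliases the module-level list
        flags.append("-Iutils")          # in-place edit also mutates CUDA_FLAGS
        return ["nvcc", *flags]

    build_nvcc_command()
    build_nvcc_command()
    print(CUDA_FLAGS)                    # flags accumulate across repeated runs

With flags = CUDA_FLAGS.copy(), the mutation stays local to the call instead of leaking into subsequent runs.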

@Snektron (Collaborator, Author) commented Sep 9, 2025

I've split up the pull request as requested, see #354 and #355. This pull request is now based on #354; that one should be merged first because GitHub can't handle stacked pull requests.

This uses rocPROF to fetch some interesting data and put it in the profile_data directory, the download link of which is then returned to the user.

rocPROF generates one trace for every process. Simply combine them into a single trace for ease of use. Also remove the individual traces, as they are no longer useful afterwards.
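
One possible way to implement that merge, assuming the per-process outputs are Perfetto protobuf traces (which can be combined by simple concatenation). The file names and paths are assumptions; the actual merge logic in this PR may differ.

    # Hedged sketch: concatenate per-process Perfetto traces into one file,
    # then remove the individual traces since they are no longer needed.
    from pathlib import Path

    profile_dir = Path("profile_data")
    traces = sorted(profile_dir.glob("*.pftrace"))

    with open(profile_dir / "combined.pftrace", "wb") as out:
        for trace in traces:
            out.write(trace.read_bytes())
            trace.unlink()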
@Snektron Snektron requested review from msaroufim and removed request for msaroufim September 11, 2025 07:32
@Snektron Snektron merged commit 51af552 into main Sep 13, 2025
5 checks passed