Initial AMD Profiling #339
Sorry for the delay, I'll have time to review tomorrow afternoon PST. EDIT: PR looks reasonable to me. Probably would want a final stamp by @ngc92 before merge. But my feedback would be:
On size limits, gpu mode is max boosted and that'll likely continue in the future.
Quickly scrolled over the changes, didn't spot anything that looked problematic. I'd echo Mark's comment, though, that it'd be nice if the PR was a bit smaller :)
Yeah, sure, that seems like a good idea. Should the reference kernels also be updated after this?
Hmm, I'm not exactly sure what is causing this. I thought that it was unrelated, but it doesn't seem to be a problem with other pull requests...
It's my impression that the multi-GPU stuff already in main is all that is needed. Am I missing something?
It's more of a guess, but what I'd try in run_eval.py is making this change, since CUDA_FLAGS does get modified in place and maybe some repeated run is causing clunky behavior:

    if flags is None:
        flags = CUDA_FLAGS.copy()
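To illustrate why the copy might matter, here is a minimal sketch of the aliasing problem being guessed at. The flag values and both helper functions are hypothetical stand-ins, not the actual contents of run_eval.py; only the `CUDA_FLAGS.copy()` pattern comes from the comment above.

```python
# Hypothetical stand-in for a module-level default flag list.
CUDA_FLAGS = ["-O3"]


def build_flags_shared(extra, flags=None):
    """Aliasing variant: falls back to the module-level list itself."""
    if flags is None:
        flags = CUDA_FLAGS
    flags += extra  # in-place extend, so CUDA_FLAGS grows on every call
    return flags


def build_flags_copied(extra, flags=None):
    """Defensive variant: works on a copy, leaving CUDA_FLAGS untouched."""
    if flags is None:
        flags = CUDA_FLAGS.copy()
    flags += extra
    return flags
```

With the aliasing variant, each repeated run appends to the shared default, so later runs see flags left over from earlier ones. With the copy, every run starts from the same defaults.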
This uses rocPROF to fetch some interesting data and put it in the profile_data directory, the download link of which is then returned to the user.
rocPROF generates one trace for every process. Simply combine them into a single trace for ease of use. Also remove the individual traces, as they are no longer useful afterwards.
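A rough sketch of what such a combining step could look like follows. It assumes the per-process traces are JSON files in the Chrome trace event format, each with a top-level `traceEvents` list. The trace format, the file pattern, the output name, and the function `combine_traces` are all assumptions for illustration and may not match what this PR does.

```python
import json
from pathlib import Path


def combine_traces(trace_dir, pattern="*.json", output_name="combined_trace.json"):
    """Merge per-process trace files into one file, then delete the inputs.

    Assumes each input is a JSON object with a "traceEvents" list
    (Chrome trace event format); this is an illustrative assumption.
    """
    trace_dir = Path(trace_dir)
    output_path = trace_dir / output_name
    # Skip a combined file left over from an earlier call.
    inputs = sorted(p for p in trace_dir.glob(pattern) if p != output_path)

    events = []
    for path in inputs:
        with path.open() as f:
            data = json.load(f)
        events.extend(data.get("traceEvents", []))

    with output_path.open("w") as f:
        json.dump({"traceEvents": events}, f)

    # The individual traces are no longer needed once merged.
    for path in inputs:
        path.unlink()
    return output_path
```

Writing the combined file before deleting the inputs means a failure while parsing or writing leaves the original traces on disk.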
Description
This is an initial implementation of profiling using rocPROF for MI300 jobs. It works as follows: when the profiling run time is selected, the application is simply executed using rocPROF with options set to gather a set of traces. These are then packaged with RunResults and finally sent to the user. This current version is the most straightforward way to implement this, but there are some missing features and limitations:
eval.py does not yet contain rocTX markers.
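The "execute the application under the profiler" step described above can be sketched as a small command-wrapping helper. The function `wrap_with_profiler` is hypothetical, and the example prefix (`rocprofv3`, `--kernel-trace`, `-d profile_data`, and the `--` separator) is an assumption about a typical invocation, not the options this PR uses. Check the flags against the installed rocPROF version.

```python
def wrap_with_profiler(app_cmd, profiler_prefix, profiling_enabled):
    """Return the command to run, prefixed by the profiler when enabled.

    app_cmd and profiler_prefix are argument lists, as passed to
    subprocess.run; nothing is executed here.
    """
    if not profiling_enabled:
        return list(app_cmd)
    return list(profiler_prefix) + list(app_cmd)


# Illustrative prefix only; the tool name and flags are assumptions.
EXAMPLE_PREFIX = ["rocprofv3", "--kernel-trace", "-d", "profile_data", "--"]

profiled_cmd = wrap_with_profiler(["python", "eval.py"], EXAMPLE_PREFIX, True)
plain_cmd = wrap_with_profiler(["python", "eval.py"], EXAMPLE_PREFIX, False)
```

Keeping the profiler prefix as data rather than hard-coding it makes it easier to change the trace options later without touching the launch logic.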