Benchmarking v2 GH workflows #40716
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
force-pushed from 594ce6d to 93d2a90
force-pushed from 0b88e75 to 1deb38e
force-pushed from 4857630 to a985884
force-pushed from a985884 to fdc4301
The workflow has been tested and has a complete run here:
cc @McPatate
.github/workflows/benchmark_v2.yml
Outdated
description: 'Model ID to benchmark (e.g., meta-llama/Llama-2-7b-hf)'
required: false
type: string
default: ''
Should we put a default for the model_id?
If there's no input provided here, it runs all the registered models, so an empty input is fine. It's just for filtering purposes, in case we need it during development.
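For context, a minimal sketch of what that filtering could look like on the run_benchmarks.py side; the registry layout and the filter_benchmarks name below are illustrative assumptions, not the exact helpers in this PR:

import logging

logger = logging.getLogger(__name__)

def filter_benchmarks(registered: dict[str, dict], model_id: str = "") -> dict[str, dict]:
    """Select benchmarks for a single model; with an empty model_id, run everything."""
    if not model_id:
        # Empty --model-id (the workflow default): run every registered benchmark.
        return registered
    selected = {name: cfg for name, cfg in registered.items() if cfg.get("model_id") == model_id}
    if not selected:
        logger.warning("No registered benchmark matches model_id=%s", model_id)
    return selected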
.github/workflows/benchmark_v2.yml
Outdated
description: 'HuggingFace Dataset to upload results to (e.g., "org/benchmark-results")'
required: false
type: string
default: ''
No default here either?
Currently, all the vendors will have their own repos, so we can't really provide a sensible default here. (It also wouldn't make sense to make it required, in case pushing is disabled.)
@ArthurZucker, do you have input on this?
  required: false
  type: string
  default: ''
benchmark_repo_id:
Isn't this redundant with upload_to_hub?
Actually, the other one has the wrong description: it's a boolean that toggles the upload on/off. Fixing it.
.github/workflows/benchmark_v2.yml
Outdated
if [ -n "${{ inputs.model_id }}" ]; then | ||
args="$args --model-id '${{ inputs.model_id }}'" | ||
fi |
What would this default to if not specified?
It'll run all the registered benchmarks.
push:
  branches:
    - run-benchmarking-gh-actions*
wouldn't using labels be easier?
I just mirrored what the test pipelines offer on this front.
I think you can remove the branch trigger and use the same label as v1 bench:
on:
  push:
    branches: [main]
  pull_request:
    types: [ opened, labeled, reopened, synchronize ]

jobs:
  benchmark:
    name: Benchmark
    if: |
      (github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'run-benchmark')) ||
      (github.event_name == 'push' && github.ref == 'refs/heads/main')
    steps:
      # ...
cc @ArthurZucker wdyt?
yep
python run_benchmarks.py --upload-to-hf username/benchmark-results

# Upload with a custom run ID for easy identification
python run_benchmarks.py --upload-to-hf username/benchmark-results --run-id experiment_v1
Do we generate and print the run_id if not provided by the user?
run_id in that case becomes the benchmark_id generated earlier.
github_run_number = os.getenv("GITHUB_RUN_NUMBER")
github_run_id = os.getenv("GITHUB_RUN_ID")
if github_run_number and github_run_id:
    run_id = f"{github_run_number}-{github_run_id}"
Suggested change:
github_run_number = os.getenv("GITHUB_RUN_NUMBER")
github_run_id = os.getenv("GITHUB_RUN_ID")
if github_run_number and github_run_id:
    run_id = f"{github_run_number}-{github_run_id}"
else:
    run_id = uuid.uuid4()
wdyt?
or str(uuid.uuid4())[:8] as you used below
The else case is handled here with the benchmark ID:
https://github.com/huggingface/transformers/pull/40716/files#diff-94fe5532c68b73efc8544ea11c0fa4d90a1d3e9266a8f8ccee2389752cea3d7cR475-R482
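To make the resolution order concrete, here is a minimal sketch of that fallback, assuming the benchmark ID is a timestamp plus a short uuid as mentioned above (the exact format in the linked code may differ):

import os
import uuid
from datetime import datetime

def resolve_run_id(user_run_id: str = "") -> str:
    """Prefer an explicit --run-id, then GitHub run metadata, then a generated benchmark ID."""
    if user_run_id:
        return user_run_id
    github_run_number = os.getenv("GITHUB_RUN_NUMBER")
    github_run_id = os.getenv("GITHUB_RUN_ID")
    if github_run_number and github_run_id:
        return f"{github_run_number}-{github_run_id}"
    # Outside CI: fall back to a locally generated benchmark ID.
    timestamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    return f"benchmark_{timestamp}_{str(uuid.uuid4())[:8]}"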
Happy to merge, but unsure about configuring stuff in the workflow; it looks pretty cumbersome, and I'm not really seeing the upside here!
runner: amd-mi325-ci-1gpu
warmup_iterations: ${{ inputs.warmup_iterations || 3 }}
measurement_iterations: ${{ inputs.measurement_iterations || 5 }}
num_tokens_to_generate: ${{ inputs.num_tokens_to_generate || 100 }}
commit_sha: ${{ github.sha }}
upload_to_hub: true
run_id: ${{ github.run_id }}
benchmark_repo_id: optimum-amd/transformers-daily-benchmarks
Not 100% sure why we have to configure stuff via the workflow when it seems to always use the defaults:
# Add iterations
args="$args --warmup-iterations ${{ inputs.warmup_iterations }}"
args="$args --measurement-iterations ${{ inputs.measurement_iterations }}"
args="$args --num-tokens-to-generate ${{ inputs.num_tokens_to_generate }}"
# Add commit ID if available
if [ -n "${{ inputs.commit_sha }}" ]; then
args="$args --commit-id '${{ inputs.commit_sha }}'"
elif [ -n "${{ github.sha }}" ]; then
args="$args --commit-id '${{ github.sha }}'"
fi
# Add HuggingFace upload parameters if specified
if [ -n "${{ inputs.upload_to_hub }}" ]; then
args="$args --upload-to-hub '${{ inputs.upload_to_hub }}'"
fi
if [ -n "${{ inputs.run_id }}" ]; then
args="$args --run-id '${{ inputs.run_id }}'"
fi
Seems like a waste, but I am probably missing some usage?
I removed it from the top level; we don't really need it there. The idea is that this way we can easily implement an "important models" benchmark pipeline that runs on a subset of models, and possibly more often. This might or might not happen later. I'm happy to remove the option if it looks cleaner that way.
yep let's remove for now 🤗
Alright, removed them.
lgtm otherwise, will let @McPatate handle the finish!
- name: Install benchmark dependencies
  run: |
    python3 -m pip install -r benchmark_v2/requirements.txt
We would like to have a Docker image with all of these IMO! Only "re-install" the latest updates.
push:
  branches:
    - run-benchmarking-gh-actions*
yep
What does this PR do?
This PR is a follow-up to #40486 that adds the workflows and HF Datasets upload to start collecting data.
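For reviewers unfamiliar with the upload path, here is a minimal, hypothetical sketch of pushing a result file to a Hub dataset repo with huggingface_hub; the repo ID, file paths, and run-ID layout are placeholders, not the exact layout this PR uses:

from huggingface_hub import HfApi

api = HfApi()  # picks up HF_TOKEN from the environment in CI

# Placeholder values; the workflow passes the real repo and run ID as inputs.
repo_id = "org/benchmark-results"
run_id = "123-4567890"

api.upload_file(
    path_or_fileobj="benchmark_v2/results.json",
    path_in_repo=f"runs/{run_id}/results.json",
    repo_id=repo_id,
    repo_type="dataset",
)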
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.