Changes from all commits (31 commits)
- 37be20f: WIP benchmark v2 workflow (ahadnagy, Sep 5, 2025)
- d0231bf: Container was missing (ahadnagy, Sep 5, 2025)
- 09f1dc5: Change to sandbox branch name (ahadnagy, Sep 5, 2025)
- e1a6229: Wrong place for image name (ahadnagy, Sep 5, 2025)
- 03b3609: Variable declarations (ahadnagy, Sep 5, 2025)
- 52393e3: Remove references to file logging (ahadnagy, Sep 5, 2025)
- 78ff33d: Remove unnecessary step (ahadnagy, Sep 5, 2025)
- 57e3cda: Fix deps install (ahadnagy, Sep 5, 2025)
- 1deb38e: Syntax (ahadnagy, Sep 5, 2025)
- 059d740: Add workdir (ahadnagy, Sep 5, 2025)
- e6c45b6: Add upload feature (ahadnagy, Sep 6, 2025)
- a6a2924: typo (ahadnagy, Sep 6, 2025)
- e72be0e: No need for hf_transfer (ahadnagy, Sep 6, 2025)
- fdc4301: Pass in runner (ahadnagy, Sep 6, 2025)
- 5bcba35: Runner config (ahadnagy, Sep 6, 2025)
- 6bb52ed: Runner config (ahadnagy, Sep 6, 2025)
- 4bf4a81: Runner config (ahadnagy, Sep 6, 2025)
- 02bf83a: Runner config (ahadnagy, Sep 6, 2025)
- 8fb8463: Runner config (ahadnagy, Sep 6, 2025)
- 904cab0: mi325 caller (ahadnagy, Sep 7, 2025)
- 0834e28: Name workflow runs properly (ahadnagy, Sep 7, 2025)
- 3416289: Copy-paste error (ahadnagy, Sep 7, 2025)
- 6ef3209: Add final repo IDs and schedule (ahadnagy, Sep 8, 2025)
- 16bee68: Review comments (ahadnagy, Sep 9, 2025)
- f5151a4: Remove wf params (ahadnagy, Sep 10, 2025)
- 7df4f45: Ruff (ahadnagy, Sep 10, 2025)
- f0701d7: Remove parametrization from worfkflow files (ahadnagy, Sep 11, 2025)
- 738e07e: Fix callers (ahadnagy, Sep 11, 2025)
- 00a4e1f: Change push trigger to pull_request + label (ahadnagy, Sep 15, 2025)
- b8b7f5f: Add back schedule event (ahadnagy, Sep 15, 2025)
- e2b5c4e: Push to the same dataset (ahadnagy, Sep 16, 2025)
82 changes: 82 additions & 0 deletions .github/workflows/benchmark_v2.yml
name: Benchmark v2 Framework

on:
  workflow_call:
    inputs:
      runner:
        description: 'GH Actions runner group to use'
        required: true
        type: string
      commit_sha:
        description: 'Commit SHA to benchmark'
        required: false
        type: string
        default: ''
      upload_to_hub:
        description: 'Enable/disable uploading results to a HuggingFace Dataset'
        required: false
        type: string
        default: 'false'
      run_id:
        description: 'Custom run ID for organizing results (auto-generated if not provided)'
        required: false
        type: string
        default: ''
      benchmark_repo_id:
> **Member:** Isn't this redundant with `upload_to_hub`?
>
> **Author:** Actually, the other one has the wrong description; it's a boolean to toggle the upload on/off. Fixing it.
        description: 'HuggingFace Dataset to upload results to (e.g., "org/benchmark-results")'
        required: false
        type: string
        default: ''

env:
  HF_HOME: /mnt/cache
  TRANSFORMERS_IS_CI: yes
  # For gated repositories, we still need to agree to share information on the Hub repo page in order to get access.
  # This token is created under the bot `hf-transformers-bot`.
  HF_HUB_READ_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}

jobs:
  benchmark-v2:
    name: Benchmark v2
    runs-on: ${{ inputs.runner }}
    if: |
      (github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'run-benchmark')) ||
      (github.event_name == 'schedule')
    container:
      image: huggingface/transformers-pytorch-gpu
      options: --gpus all --privileged --ipc host --shm-size "16gb"
    steps:
      - name: Get repo
        uses: actions/checkout@v4
        with:
          ref: ${{ inputs.commit_sha || github.sha }}

      - name: Install benchmark dependencies
        run: |
          python3 -m pip install -r benchmark_v2/requirements.txt
> **Collaborator** (on lines +54 to +56): We would like to have a Docker image with all of these preinstalled, IMO; only re-install the latest updates.

      - name: Reinstall transformers in edit mode
        run: |
          python3 -m pip uninstall -y transformers
          python3 -m pip install -e ".[torch]"

      - name: Show installed libraries and their versions
        run: |
          python3 -m pip list
          python3 -c "import torch; print(f'PyTorch version: {torch.__version__}')"
          python3 -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
          python3 -c "import torch; print(f'CUDA device count: {torch.cuda.device_count()}')" || true
          nvidia-smi || true

      - name: Run benchmark v2
        working-directory: benchmark_v2
        run: |
          echo "Running benchmarks"
          python3 run_benchmarks.py \
            --commit-id '${{ inputs.commit_sha || github.sha }}' \
            --upload-to-hub '${{ inputs.upload_to_hub || false }}' \
            --run-id '${{ inputs.run_id }}' \
            --benchmark-repo-id '${{ inputs.benchmark_repo_id }}' \
            --log-level INFO
        env:
          HF_TOKEN: ${{ secrets.HF_HUB_READ_TOKEN }}
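Note that workflow inputs arrive in the script as plain strings, so `run_benchmarks.py` presumably has to coerce `--upload-to-hub` back into a boolean. A minimal sketch of what that argument handling might look like; the parser internals here are an assumption, only the flag names are taken from the workflow above:

```python
import argparse


def parse_benchmark_args(argv=None):
    """Sketch of a CLI matching the flags the workflow passes.

    The real run_benchmarks.py may differ; only the flag names come
    from the workflow file above.
    """
    parser = argparse.ArgumentParser(description="Run benchmark v2 suite")
    parser.add_argument("--commit-id", default="")
    parser.add_argument("--run-id", default="")
    parser.add_argument("--benchmark-repo-id", default="")
    parser.add_argument("--log-level", default="INFO")
    # The workflow passes 'true'/'false' as a string, so coerce it here.
    parser.add_argument(
        "--upload-to-hub",
        type=lambda s: s.strip().lower() in ("1", "true", "yes"),
        default=False,
    )
    return parser.parse_args(argv)
```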
20 changes: 20 additions & 0 deletions .github/workflows/benchmark_v2_a10_caller.yml
name: Benchmark v2 Scheduled Runner - A10 Single-GPU

on:
  schedule:
    # Run daily at 16:30 UTC
    - cron: "30 16 * * *"
  pull_request:
    types: [ opened, labeled, reopened, synchronize ]

jobs:
  benchmark-v2-default:
    name: Benchmark v2 - Default Models
    uses: ./.github/workflows/benchmark_v2.yml
    with:
      runner: aws-g5-4xlarge-cache-use1-public-80
      commit_sha: ${{ github.sha }}
      upload_to_hub: true
      run_id: ${{ github.run_id }}
      benchmark_repo_id: hf-internal-testing/transformers-daily-benchmarks
    secrets: inherit
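The caller fires on every listed `pull_request` event; the label filter lives in the reusable workflow's `if:` condition, which skips the job unless the run is scheduled or the PR carries the `run-benchmark` label. That gating can be paraphrased in Python (an illustration of the condition, not code that exists in the repo):

```python
def should_run_benchmark(event_name, pr_labels=()):
    """Paraphrase of the `if:` condition in benchmark_v2.yml:
    run on schedule, or on pull requests carrying the
    'run-benchmark' label."""
    if event_name == "schedule":
        return True
    return event_name == "pull_request" and "run-benchmark" in pr_labels
```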
20 changes: 20 additions & 0 deletions .github/workflows/benchmark_v2_mi325_caller.yml
name: Benchmark v2 Scheduled Runner - MI325 Single-GPU

on:
  schedule:
    # Run daily at 16:30 UTC
    - cron: "30 16 * * *"
  pull_request:
    types: [ opened, labeled, reopened, synchronize ]

jobs:
  benchmark-v2-default:
    name: Benchmark v2 - Default Models
    uses: ./.github/workflows/benchmark_v2.yml
    with:
      runner: amd-mi325-ci-1gpu
      commit_sha: ${{ github.sha }}
      upload_to_hub: true
      run_id: ${{ github.run_id }}
      benchmark_repo_id: hf-internal-testing/transformers-daily-benchmarks
    secrets: inherit
30 changes: 30 additions & 0 deletions benchmark_v2/README.md
python run_benchmarks.py \
--num-tokens-to-generate 200
```

### Uploading Results to HuggingFace Dataset

You can automatically upload benchmark results to a HuggingFace Dataset for tracking and analysis:

```bash
# Upload to a public dataset with auto-generated run ID
python run_benchmarks.py --upload-to-hf username/benchmark-results

# Upload with a custom run ID for easy identification
python run_benchmarks.py --upload-to-hf username/benchmark-results --run-id experiment_v1
```

> **Member:** Do we generate and print the `run_id` if it is not provided by the user?
>
> **Author:** `run_id` in that case becomes the `benchmark_id` generated earlier.
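Per the author's reply above, an omitted `--run-id` falls back to the benchmark ID generated earlier in the run. A hedged sketch of that fallback; the ID format below is an assumption, since the generation code is not part of this diff:

```python
import uuid
from datetime import datetime, timezone


def resolve_run_id(cli_run_id=""):
    """Fall back to a generated benchmark ID when no run ID is given.

    The exact ID format used by run_benchmarks.py is not shown in the
    diff; this timestamp-plus-suffix scheme is illustrative only.
    """
    if cli_run_id:
        return cli_run_id
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    return f"benchmark_{stamp}_{uuid.uuid4().hex[:8]}"
```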

**Dataset Directory Structure:**
```
dataset_name/
├── 2025-01-15/
│   ├── runs/                              # Non-scheduled runs (manual, PR, etc.)
│   │   └── 123-1245151651/                # GitHub run number and ID
│   │       └── benchmark_results/
│   │           ├── benchmark_summary_20250115_143022.json
│   │           └── model-name/
│   │               └── model-name_benchmark_20250115_143022.json
│   └── benchmark_results_abc123de/        # Scheduled runs (daily CI)
│       ├── benchmark_summary_20250115_143022.json
│       └── model-name/
│           └── model-name_benchmark_20250115_143022.json
└── 2025-01-16/
    └── ...
```
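Following the tree above, where a run's files land depends on the date and on whether the run was scheduled. A small helper that mirrors that layout (hypothetical; the actual path-building code is not part of this diff), whose result could be passed as `path_in_repo` to `huggingface_hub`'s `HfApi.upload_folder` when pushing to the dataset:

```python
def dataset_path_for_run(date, scheduled, benchmark_id="", run_number=0, run_id=0):
    """Build the in-dataset directory for one benchmark run,
    mirroring the documented layout:
      <date>/runs/<run_number>-<run_id>/benchmark_results   (manual/PR runs)
      <date>/benchmark_results_<benchmark_id>               (scheduled runs)
    """
    if scheduled:
        return f"{date}/benchmark_results_{benchmark_id}"
    return f"{date}/runs/{run_number}-{run_id}/benchmark_results"
```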

### Running Specific Benchmarks

1 change: 0 additions & 1 deletion benchmark_v2/benches/llama.py
from benchmark_framework import ModelBenchmark


-os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"
 os.environ["TOKENIZERS_PARALLELISM"] = "1"
 torch.set_float32_matmul_precision("high")

3 changes: 2 additions & 1 deletion benchmark_v2/requirements.txt
 psutil>=5.8.0
 gpustat>=1.0.0
 torch>=2.0.0
 transformers>=4.30.0
-datasets>=2.10.0
+datasets>=2.10.0
+huggingface_hub>=0.16.0