@mkannwischer mkannwischer commented Aug 4, 2025

@mkannwischer mkannwischer force-pushed the decompose-asm branch 2 times, most recently from 76fe175 to 0f865ed Compare August 4, 2025 07:19
@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (opt)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 50481 cycles 50494 cycles 1.00
ML-DSA-44 sign 205387 cycles 222985 cycles 0.92
ML-DSA-44 verify 72847 cycles 72852 cycles 1.00
ML-DSA-65 keypair 87374 cycles 87368 cycles 1.00
ML-DSA-65 sign 330943 cycles 356080 cycles 0.93
ML-DSA-65 verify 112682 cycles 112690 cycles 1.00
ML-DSA-87 keypair 140125 cycles 140131 cycles 1.00
ML-DSA-87 sign 400915 cycles 425672 cycles 0.94
ML-DSA-87 verify 173325 cycles 173200 cycles 1.00

This comment was automatically generated by workflow using github-action-benchmark.

@github-actions github-actions bot left a comment

Mac Mini (M1, 2020) benchmarks (no-opt)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 116004 cycles 116031 cycles 1.00
ML-DSA-44 sign 455057 cycles 455193 cycles 1.00
ML-DSA-44 verify 136868 cycles 136876 cycles 1.00
ML-DSA-65 keypair 198307 cycles 198020 cycles 1.00
ML-DSA-65 sign 734932 cycles 733214 cycles 1.00
ML-DSA-65 verify 217197 cycles 216821 cycles 1.00
ML-DSA-87 keypair 335124 cycles 335079 cycles 1.00
ML-DSA-87 sign 915171 cycles 915119 cycles 1.00
ML-DSA-87 verify 353274 cycles 353198 cycles 1.00

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Mac Mini (M1, 2020) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: aa59090 Previous: 108d2d3 Ratio
ML-DSA-87 keypair 335086 cycles 325071 cycles 1.03
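The Ratio column appears to be the current cycle count divided by the previous one, with an alert raised when it exceeds the threshold. A minimal C sketch of that comparison, assuming exactly this rule (the function name is hypothetical, and integer arithmetic is used so no floating-point rounding is involved):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if current / previous > threshold_percent / 100,
 * e.g. threshold_percent = 103 for a threshold of 1.03.
 * Assumes "smaller is better" cycle counts; this rule is an assumption
 * about the benchmark workflow, not taken from its source. */
static bool model_exceeds_threshold(uint64_t current, uint64_t previous,
                                    uint64_t threshold_percent)
{
  assert(previous > 0);
  return current * 100 > previous * threshold_percent;
}
```

For the row above, 335086 * 100 = 33508600 is larger than 325071 * 103 = 33482313, which is consistent with the alert being raised.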

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 120396 cycles 120280 cycles 1.00
ML-DSA-44 sign 453178 cycles 488378 cycles 0.93
ML-DSA-44 verify 145599 cycles 145600 cycles 1.00
ML-DSA-65 keypair 207581 cycles 207304 cycles 1.00
ML-DSA-65 sign 748100 cycles 802623 cycles 0.93
ML-DSA-65 verify 232038 cycles 231619 cycles 1.00
ML-DSA-87 keypair 336288 cycles 336101 cycles 1.00
ML-DSA-87 sign 928838 cycles 985664 cycles 0.94
ML-DSA-87 verify 370321 cycles 370112 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 215213 cycles 214823 cycles 1.00
ML-DSA-44 sign 796662 cycles 797106 cycles 1.00
ML-DSA-44 verify 239545 cycles 239791 cycles 1.00
ML-DSA-65 keypair 383960 cycles 383443 cycles 1.00
ML-DSA-65 sign 1318169 cycles 1321371 cycles 1.00
ML-DSA-65 verify 385194 cycles 384658 cycles 1.00
ML-DSA-87 keypair 612209 cycles 611086 cycles 1.00
ML-DSA-87 sign 1662843 cycles 1663540 cycles 1.00
ML-DSA-87 verify 637814 cycles 637462 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 37269 cycles 37391 cycles 1.00
ML-DSA-44 sign 156663 cycles 169666 cycles 0.92
ML-DSA-44 verify 49964 cycles 50078 cycles 1.00
ML-DSA-65 keypair 65965 cycles 66540 cycles 0.99
ML-DSA-65 sign 259219 cycles 279863 cycles 0.93
ML-DSA-65 verify 78658 cycles 78832 cycles 1.00
ML-DSA-87 keypair 101091 cycles 100953 cycles 1.00
ML-DSA-87 sign 309735 cycles 326976 cycles 0.95
ML-DSA-87 verify 117706 cycles 117129 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 4th gen (c7i) (no-opt)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 96373 cycles 96101 cycles 1.00
ML-DSA-44 sign 352305 cycles 351827 cycles 1.00
ML-DSA-44 verify 105613 cycles 105202 cycles 1.00
ML-DSA-65 keypair 163402 cycles 163467 cycles 1.00
ML-DSA-65 sign 579140 cycles 579623 cycles 1.00
ML-DSA-65 verify 169098 cycles 168943 cycles 1.00
ML-DSA-87 keypair 274423 cycles 273027 cycles 1.01
ML-DSA-87 sign 733987 cycles 733850 cycles 1.00
ML-DSA-87 verify 281558 cycles 281472 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 77861 cycles 74153 cycles 1.05
ML-DSA-44 sign 257974 cycles 269188 cycles 0.96
ML-DSA-44 verify 93787 cycles 88813 cycles 1.06
ML-DSA-65 keypair 126201 cycles 126040 cycles 1.00
ML-DSA-65 sign 398767 cycles 432983 cycles 0.92
ML-DSA-65 verify 142169 cycles 141947 cycles 1.00
ML-DSA-87 keypair 210677 cycles 212859 cycles 0.99
ML-DSA-87 sign 506966 cycles 548118 cycles 0.92
ML-DSA-87 verify 229048 cycles 231598 cycles 0.99

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 60996 cycles 60934 cycles 1.00
ML-DSA-44 sign 240511 cycles 264765 cycles 0.91
ML-DSA-44 verify 80753 cycles 80601 cycles 1.00
ML-DSA-65 keypair 108711 cycles 106939 cycles 1.02
ML-DSA-65 sign 404832 cycles 437635 cycles 0.93
ML-DSA-65 verify 129067 cycles 127465 cycles 1.01
ML-DSA-87 keypair 163491 cycles 166545 cycles 0.98
ML-DSA-87 sign 470318 cycles 515721 cycles 0.91
ML-DSA-87 verify 189408 cycles 192786 cycles 0.98

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 44285 cycles 44398 cycles 1.00
ML-DSA-44 sign 177097 cycles 197053 cycles 0.90
ML-DSA-44 verify 60289 cycles 60543 cycles 1.00
ML-DSA-65 keypair 75868 cycles 75922 cycles 1.00
ML-DSA-65 sign 282829 cycles 319168 cycles 0.89
ML-DSA-65 verify 94066 cycles 93010 cycles 1.01
ML-DSA-87 keypair 116110 cycles 115966 cycles 1.00
ML-DSA-87 sign 335125 cycles 369577 cycles 0.91
ML-DSA-87 verify 142087 cycles 139480 cycles 1.02

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 0f865ed Previous: 108d2d3 Ratio
ML-DSA-87 keypair 119449 cycles 115550 cycles 1.03
ML-DSA-87 verify 141848 cycles 136834 cycles 1.04

@oqs-bot oqs-bot left a comment

AMD EPYC 3rd gen (c6a) (no-opt)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 136442 cycles 136765 cycles 1.00
ML-DSA-44 sign 552709 cycles 555020 cycles 1.00
ML-DSA-44 verify 154569 cycles 154891 cycles 1.00
ML-DSA-65 keypair 227676 cycles 227905 cycles 1.00
ML-DSA-65 sign 891454 cycles 892504 cycles 1.00
ML-DSA-65 verify 243542 cycles 244076 cycles 1.00
ML-DSA-87 keypair 376652 cycles 376770 cycles 1.00
ML-DSA-87 sign 1115251 cycles 1115378 cycles 1.00
ML-DSA-87 verify 398878 cycles 398605 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 298656 cycles 300219 cycles 0.99
ML-DSA-44 sign 1132000 cycles 1236451 cycles 0.92
ML-DSA-44 verify 361791 cycles 357732 cycles 1.01
ML-DSA-65 keypair 507102 cycles 507848 cycles 1.00
ML-DSA-65 sign 1873378 cycles 2011617 cycles 0.93
ML-DSA-65 verify 558065 cycles 558683 cycles 1.00
ML-DSA-87 keypair 860140 cycles 872050 cycles 0.99
ML-DSA-87 sign 2471030 cycles 2624877 cycles 0.94
ML-DSA-87 verify 931913 cycles 938240 cycles 0.99

@oqs-bot oqs-bot left a comment

Graviton4

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 73090 cycles 72904 cycles 1.00
ML-DSA-44 sign 262009 cycles 282731 cycles 0.93
ML-DSA-44 verify 87170 cycles 87111 cycles 1.00
ML-DSA-65 keypair 129023 cycles 128483 cycles 1.00
ML-DSA-65 sign 431495 cycles 460937 cycles 0.94
ML-DSA-65 verify 139521 cycles 139107 cycles 1.00
ML-DSA-87 keypair 208590 cycles 207855 cycles 1.00
ML-DSA-87 sign 536383 cycles 563388 cycles 0.95
ML-DSA-87 verify 223108 cycles 222542 cycles 1.00

@oqs-bot oqs-bot left a comment

Intel Xeon 3rd gen (c6i) (no-opt)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 159416 cycles 159431 cycles 1.00
ML-DSA-44 sign 576923 cycles 574205 cycles 1.00
ML-DSA-44 verify 175501 cycles 175122 cycles 1.00
ML-DSA-65 keypair 271891 cycles 272121 cycles 1.00
ML-DSA-65 sign 945450 cycles 950970 cycles 0.99
ML-DSA-65 verify 283115 cycles 283406 cycles 1.00
ML-DSA-87 keypair 454217 cycles 453040 cycles 1.00
ML-DSA-87 sign 1196506 cycles 1194595 cycles 1.00
ML-DSA-87 verify 471158 cycles 469649 cycles 1.00

@oqs-bot oqs-bot left a comment

AMD EPYC 4th gen (c7a) (no-opt)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 123330 cycles 121454 cycles 1.02
ML-DSA-44 sign 469492 cycles 467420 cycles 1.00
ML-DSA-44 verify 139153 cycles 136935 cycles 1.02
ML-DSA-65 keypair 207295 cycles 206019 cycles 1.01
ML-DSA-65 sign 751087 cycles 753924 cycles 1.00
ML-DSA-65 verify 216892 cycles 217888 cycles 1.00
ML-DSA-87 keypair 341909 cycles 341986 cycles 1.00
ML-DSA-87 sign 958782 cycles 952984 cycles 1.01
ML-DSA-87 verify 358041 cycles 357724 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton4 (no-opt)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 134759 cycles 134706 cycles 1.00
ML-DSA-44 sign 507755 cycles 508655 cycles 1.00
ML-DSA-44 verify 149495 cycles 149594 cycles 1.00
ML-DSA-65 keypair 228672 cycles 228403 cycles 1.00
ML-DSA-65 sign 825573 cycles 824225 cycles 1.00
ML-DSA-65 verify 237518 cycles 237248 cycles 1.00
ML-DSA-87 keypair 377149 cycles 377391 cycles 1.00
ML-DSA-87 sign 1031536 cycles 1030599 cycles 1.00
ML-DSA-87 verify 391370 cycles 390950 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton3

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 78014 cycles 77990 cycles 1.00
ML-DSA-44 sign 282225 cycles 304149 cycles 0.93
ML-DSA-44 verify 95386 cycles 95539 cycles 1.00
ML-DSA-65 keypair 135088 cycles 134961 cycles 1.00
ML-DSA-65 sign 464959 cycles 496716 cycles 0.94
ML-DSA-65 verify 151277 cycles 151179 cycles 1.00
ML-DSA-87 keypair 217758 cycles 217516 cycles 1.00
ML-DSA-87 sign 574031 cycles 606523 cycles 0.95
ML-DSA-87 verify 239653 cycles 239624 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton2

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 120494 cycles 120913 cycles 1.00
ML-DSA-44 sign 453588 cycles 489433 cycles 0.93
ML-DSA-44 verify 145750 cycles 146130 cycles 1.00
ML-DSA-65 keypair 207626 cycles 207543 cycles 1.00
ML-DSA-65 sign 748654 cycles 802552 cycles 0.93
ML-DSA-65 verify 232034 cycles 231747 cycles 1.00
ML-DSA-87 keypair 336402 cycles 336678 cycles 1.00
ML-DSA-87 sign 929978 cycles 986239 cycles 0.94
ML-DSA-87 verify 370007 cycles 370226 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton3 (no-opt)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 139781 cycles 139732 cycles 1.00
ML-DSA-44 sign 505653 cycles 505480 cycles 1.00
ML-DSA-44 verify 154443 cycles 154400 cycles 1.00
ML-DSA-65 keypair 245094 cycles 245192 cycles 1.00
ML-DSA-65 sign 824884 cycles 824410 cycles 1.00
ML-DSA-65 verify 248346 cycles 248701 cycles 1.00
ML-DSA-87 keypair 397665 cycles 397288 cycles 1.00
ML-DSA-87 sign 1044177 cycles 1043636 cycles 1.00
ML-DSA-87 verify 411659 cycles 411303 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 469026 cycles 468881 cycles 1.00
ML-DSA-44 sign 2238512 cycles 2248871 cycles 1.00
ML-DSA-44 verify 565236 cycles 561836 cycles 1.01
ML-DSA-65 keypair 782408 cycles 783629 cycles 1.00
ML-DSA-65 sign 3656126 cycles 3666008 cycles 1.00
ML-DSA-65 verify 871868 cycles 868783 cycles 1.00
ML-DSA-87 keypair 1263276 cycles 1260624 cycles 1.00
ML-DSA-87 sign 4518842 cycles 4518541 cycles 1.00
ML-DSA-87 verify 1396696 cycles 1389881 cycles 1.00

@github-actions github-actions bot left a comment

SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 828705 cycles 828183 cycles 1.00
ML-DSA-44 sign 3365620 cycles 3364011 cycles 1.00
ML-DSA-44 verify 932132 cycles 931608 cycles 1.00
ML-DSA-65 keypair 1394312 cycles 1394183 cycles 1.00
ML-DSA-65 sign 5485228 cycles 5487573 cycles 1.00
ML-DSA-65 verify 1482004 cycles 1481552 cycles 1.00
ML-DSA-87 keypair 2310465 cycles 2311843 cycles 1.00
ML-DSA-87 sign 6903276 cycles 6920584 cycles 1.00
ML-DSA-87 verify 2434780 cycles 2439612 cycles 1.00

@oqs-bot oqs-bot left a comment

Graviton2 (no-opt)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 215483 cycles 215740 cycles 1.00
ML-DSA-44 sign 797478 cycles 811168 cycles 0.98
ML-DSA-44 verify 239781 cycles 240054 cycles 1.00
ML-DSA-65 keypair 384013 cycles 383947 cycles 1.00
ML-DSA-65 sign 1314460 cycles 1313699 cycles 1.00
ML-DSA-65 verify 385281 cycles 385250 cycles 1.00
ML-DSA-87 keypair 612362 cycles 611735 cycles 1.00
ML-DSA-87 sign 1664503 cycles 1666380 cycles 1.00
ML-DSA-87 verify 637504 cycles 637464 cycles 1.00

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 239349 cycles 235072 cycles 1.02
ML-DSA-44 sign 804618 cycles 805610 cycles 1.00
ML-DSA-44 verify 278308 cycles 261977 cycles 1.06
ML-DSA-65 keypair 420853 cycles 413165 cycles 1.02
ML-DSA-65 sign 1327698 cycles 1302032 cycles 1.02
ML-DSA-65 verify 443842 cycles 414961 cycles 1.07
ML-DSA-87 keypair 700951 cycles 668773 cycles 1.05
ML-DSA-87 sign 1768601 cycles 1711807 cycles 1.03
ML-DSA-87 verify 735622 cycles 694958 cycles 1.06

@github-actions github-actions bot left a comment

Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 320793 cycles 306364 cycles 1.05
ML-DSA-44 sign 1238813 cycles 1229045 cycles 1.01
ML-DSA-44 verify 345021 cycles 350761 cycles 0.98
ML-DSA-65 keypair 592326 cycles 561430 cycles 1.06
ML-DSA-65 sign 2033937 cycles 2011765 cycles 1.01
ML-DSA-65 verify 567523 cycles 547519 cycles 1.04
ML-DSA-87 keypair 871908 cycles 868065 cycles 1.00
ML-DSA-87 sign 2516653 cycles 2537481 cycles 0.99
ML-DSA-87 verify 904549 cycles 901328 cycles 1.00

@mkannwischer mkannwischer force-pushed the decompose-asm branch 2 times, most recently from a78441c to ae4922c Compare August 4, 2025 10:17
@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Graviton4'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: aa59090 Previous: 108d2d3 Ratio
ML-DSA-44 keypair 77432 cycles 73120 cycles 1.06
ML-DSA-44 verify 90923 cycles 87104 cycles 1.04
ML-DSA-65 keypair 135321 cycles 128241 cycles 1.06
ML-DSA-65 verify 146013 cycles 138989 cycles 1.05
ML-DSA-87 keypair 221150 cycles 207784 cycles 1.06
ML-DSA-87 verify 236143 cycles 222452 cycles 1.06

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Graviton4 (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: aa59090 Previous: 108d2d3 Ratio
ML-DSA-65 sign 845214 cycles 819792 cycles 1.03

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 320793 cycles 306364 cycles 1.05
ML-DSA-65 keypair 592326 cycles 561430 cycles 1.06
ML-DSA-65 verify 567523 cycles 547519 cycles 1.04

mldsa/poly.c Outdated
/* TODO: proof */
mld_assert_bound(a->coeffs, MLDSA_N, 0, MLDSA_Q);
mld_poly_decompose_32_native(a1->coeffs, a0->coeffs, a->coeffs);
#else /* !None && MLD_USE_NATIVE_POLY_DECOMPOSE_32 && (MLDSA_MODE == 3 || \
Contributor

Something is going wrong with autogen here: the first conditional does not seem to be recognized.

Contributor Author

Fixed. Can you take another look please?

This adds the AVX2 intrinsics implementation of poly_decompose from
https://github.com/pq-crystals/dilithium/blob/master/avx2/rounding.c.

Resolves #399.

Signed-off-by: Matthias J. Kannwischer <[email protected]>

This adds a native implementation of poly_decompose written from scratch.

Resolves #397

Signed-off-by: Matthias J. Kannwischer <[email protected]>

// Step 2: Barrett reduction with rounding: round(temp * 1025 / 2^22)
// This computes: round(ceil(a/128) / 4092)
// Combined: a1 ≈ round(ceil(a/128) / 4092) ≈ floor(a / 523776)
Contributor

Why does the second equality hold?
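One way to probe this question is a scalar C model of the arithmetic described in the quoted comments. The sketch below is illustrative only and is not the code from this PR: the function names are hypothetical, Q = 8380417 is assumed as the ML-DSA modulus, and 523776 is taken from the quoted comment as 2*GAMMA2. It suggests that the chain behaves like round-to-nearest on a / 523776 rather than floor: a = 300000 yields 1, whereas floor(300000 / 523776) is 0.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_Q 8380417          /* ML-DSA modulus (assumed) */
#define MODEL_TWO_GAMMA2 523776  /* 2*GAMMA2, from the quoted comment */

/* Hypothetical scalar model of the steps in the quoted assembly comments. */
static int32_t model_high_bits_approx(int32_t a)
{
  assert(a >= 0 && a < MODEL_Q);
  int32_t t = (a + 127) >> 7;        /* ceil(a / 128) */
  t = (t * 1025 + (1 << 21)) >> 22;  /* round(t * 1025 / 2^22) */
  return t & 15;                     /* wrap 16 back to 0 */
}

/* High bits computed from the definition: center the remainder in
 * (-GAMMA2, GAMMA2], then handle the a - r0 == Q - 1 corner case. */
static int32_t model_high_bits_exact(int32_t a)
{
  assert(a >= 0 && a < MODEL_Q);
  int32_t r0 = a % MODEL_TWO_GAMMA2;
  if (r0 > MODEL_TWO_GAMMA2 / 2)
  {
    r0 -= MODEL_TWO_GAMMA2;
  }
  if (a - r0 == MODEL_Q - 1)
  {
    return 0;
  }
  return (a - r0) / MODEL_TWO_GAMMA2;
}

/* Exhaustively compare both models over [0, Q). */
static int32_t model_count_mismatches(void)
{
  int32_t n = 0;
  for (int32_t a = 0; a < MODEL_Q; a++)
  {
    n += (model_high_bits_approx(a) != model_high_bits_exact(a));
  }
  return n;
}
```

Calling `model_count_mismatches()` compares the approximation against the definition for every input in [0, Q), which is cheap enough to run as a one-off check.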

// which is equivalent to: (temp * 1025 + 2^21) >> 22.
sqrdmulh \a1\().4s, \a1\().4s, barrett_const.4s
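For readers less familiar with `sqrdmulh`: on 32-bit lanes it computes a saturating `(2*x*y + 2^31) >> 32`. The C sketch below models one lane and compares it with `(temp * 1025 + 2^21) >> 22`. The value `1025 << 9` is an assumption about what `barrett_const` holds (it is the value that makes the two expressions agree); it is not shown in the quoted hunk, and all helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed contents of barrett_const; not taken from the PR. */
#define MODEL_BARRETT_CONST (1025 << 9)

/* Model of one 32-bit lane of SQRDMULH: saturate((2*x*y + 2^31) >> 32).
 * Relies on arithmetic right shift for negative values, as on common compilers. */
static int32_t model_sqrdmulh_s32(int32_t x, int32_t y)
{
  if (x == INT32_MIN && y == INT32_MIN)
  {
    return INT32_MAX; /* the only saturating case */
  }
  int64_t p = 2 * (int64_t)x * (int64_t)y + ((int64_t)1 << 31);
  return (int32_t)(p >> 32);
}

/* The rounding shift named in the quoted comment. */
static int32_t model_round_shift(int32_t t)
{
  assert(t >= 0 && t <= 65472);
  return (t * 1025 + (1 << 21)) >> 22;
}

/* Compare both forms for every t = ceil(a / 128) with a in [0, 8380417). */
static int32_t model_count_lane_mismatches(void)
{
  int32_t n = 0;
  for (int32_t t = 0; t <= 65472; t++)
  {
    n += (model_sqrdmulh_s32(t, MODEL_BARRETT_CONST) != model_round_shift(t));
  }
  return n;
}
```

With that constant, `2 * t * (1025 << 9) + 2^31` equals `2^10 * (t * 1025 + 2^21)`, so shifting right by 32 gives the same result as shifting `t * 1025 + 2^21` right by 22.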

// Step 3: Mask to valid range [0, 14] since (Q-1)/(2*GAMMA2) = 15
Contributor

The comment would suggest that 15 is excluded after the masking, but it isn't, seeing that mask_15 is elementwise 0x0F?

Contributor Author

Yes, the comment is wrong. The valid range is [0, 15]. Will fix.
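To illustrate the range: in a scalar model of the quoted steps, the value before masking reaches 16 for the largest inputs, and the 0x0F mask wraps that to 0 while leaving 15 intact. The sketch below is a hypothetical C model (names are illustrative, and Q = 8380417 is assumed as the ML-DSA modulus), not the assembly from this PR.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_Q 8380417 /* ML-DSA modulus (assumed) */

/* Value after the rounding step, before the mask is applied. */
static int32_t model_pre_mask(int32_t a)
{
  assert(a >= 0 && a < MODEL_Q);
  int32_t t = (a + 127) >> 7;           /* ceil(a / 128) */
  return (t * 1025 + (1 << 21)) >> 22;  /* round(t * 1025 / 2^22) */
}

/* Elementwise mask with 0x0F: keeps 0..15 and maps 16 to 0. */
static int32_t model_masked(int32_t a)
{
  return model_pre_mask(a) & 0x0F;
}
```

For example, a = 8118528 stays at 15 after masking, while a = 8118529 and a = Q - 1 both produce 16 before the mask and 0 after it.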

@oqs-bot oqs-bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'AMD EPYC 3rd gen (c6a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 keypair 77861 cycles 74153 cycles 1.05
ML-DSA-44 verify 93787 cycles 88813 cycles 1.06

@github-actions github-actions bot left a comment

⚠️ Performance Alert ⚠️

Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.

Benchmark suite Current: 52400e8 Previous: 6a9e7f3 Ratio
ML-DSA-44 verify 278308 cycles 261977 cycles 1.06
ML-DSA-65 verify 443842 cycles 414961 cycles 1.07
ML-DSA-87 keypair 700951 cycles 668773 cycles 1.05
ML-DSA-87 sign 1768601 cycles 1711807 cycles 1.03
ML-DSA-87 verify 735622 cycles 694958 cycles 1.06

Development

Successfully merging this pull request may close these issues.

AVX2: Consider adding poly_decompose assembly
AArch64: Consider adding poly_decompose assembly
3 participants