Add native implementation for poly_caddq
#414
sshr tmp.4s, \inout\().4s, #31
mls \inout\().4s, tmp.4s, q_reg.4s
Nit: shr + mla would be a bit more intuitive?
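For readers comparing the two variants, here is a minimal scalar sketch in C of what each instruction pair computes per 32-bit lane. It assumes `q_reg` holds the ML-DSA modulus 8380417 in every lane and reads the reviewer's "shr" as an unsigned (logical) shift; the function names are hypothetical and not taken from the PR.

```c
#include <assert.h>
#include <stdint.h>

#define Q 8380417 /* ML-DSA modulus, assumed to be the value held in q_reg */

/* Models sshr #31 followed by mls: the shift yields -1 for negative
 * inputs and 0 otherwise, so subtracting mask * Q adds Q to negatives.
 * Right-shifting a negative int32_t is implementation-defined in C;
 * an arithmetic shift is assumed here (as on GCC and Clang). */
static int32_t caddq_sshr_mls(int32_t a)
{
  int32_t mask = a >> 31;
  return a - mask * Q;
}

/* Models an unsigned shift by 31 followed by mla: the shift yields 1
 * for negative inputs and 0 otherwise, so adding bit * Q adds Q to
 * negatives. */
static int32_t caddq_ushr_mla(int32_t a)
{
  int32_t bit = (int32_t)((uint32_t)a >> 31);
  return a + bit * Q;
}

/* Hypothetical helper: checks that both variants agree on one input. */
static void check_equivalent(int32_t a)
{
  assert(caddq_sshr_mls(a) == caddq_ushr_mla(a));
}
```

Both functions return `a + Q` for negative `a` and `a` unchanged otherwise; the variants differ only in whether the sign is extracted as -1 or as 1.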
It makes sense to wait for benchmarks first, but if we want this, the avx2 code should be written in ASM, not intrinsics.
In that case we will have to split this up into two PRs. I'll do AArch64, but we will need to find someone who can write AVX2 asm. Wouldn't it make more sense to get performance on par with the pqcrystals implementation first and then rewrite it in asm later or in parallel? The integration of the existing AVX2 intrinsics implementation is straightforward.
db2e505 to a443fd7
This commit is hoisted out from #325. It adds only the native implementation of poly_caddq taken from the official AVX2 implementation. Signed-off-by: Matthias J. Kannwischer <[email protected]>
a443fd7 to d45408b
Mac Mini (M1, 2020) benchmarks (opt)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 47851 cycles | 49574 cycles | 0.97 |
| ML-DSA-44 sign | 173145 cycles | 180472 cycles | 0.96 |
| ML-DSA-44 verify | 56651 cycles | 58524 cycles | 0.97 |
| ML-DSA-65 keypair | 83660 cycles | 86323 cycles | 0.97 |
| ML-DSA-65 sign | 284598 cycles | 298570 cycles | 0.95 |
| ML-DSA-65 verify | 91148 cycles | 93819 cycles | 0.97 |
| ML-DSA-87 keypair | 136114 cycles | 139768 cycles | 0.97 |
| ML-DSA-87 sign | 353910 cycles | 368794 cycles | 0.96 |
| ML-DSA-87 verify | 145304 cycles | 148973 cycles | 0.98 |
This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115056 cycles | 115045 cycles | 1.00 |
| ML-DSA-44 sign | 430540 cycles | 430523 cycles | 1.00 |
| ML-DSA-44 verify | 122149 cycles | 122136 cycles | 1.00 |
| ML-DSA-65 keypair | 197455 cycles | 197476 cycles | 1.00 |
| ML-DSA-65 sign | 701054 cycles | 701059 cycles | 1.00 |
| ML-DSA-65 verify | 197629 cycles | 197658 cycles | 1.00 |
| ML-DSA-87 keypair | 334609 cycles | 334653 cycles | 1.00 |
| ML-DSA-87 sign | 883871 cycles | 883903 cycles | 1.00 |
| ML-DSA-87 verify | 328482 cycles | 328478 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115259 cycles | 118636 cycles | 0.97 |
| ML-DSA-44 sign | 421458 cycles | 433976 cycles | 0.97 |
| ML-DSA-44 verify | 132918 cycles | 136098 cycles | 0.98 |
| ML-DSA-65 keypair | 200140 cycles | 204644 cycles | 0.98 |
| ML-DSA-65 sign | 698510 cycles | 722863 cycles | 0.97 |
| ML-DSA-65 verify | 214500 cycles | 219020 cycles | 0.98 |
| ML-DSA-87 keypair | 328037 cycles | 333765 cycles | 0.98 |
| ML-DSA-87 sign | 878355 cycles | 902010 cycles | 0.97 |
| ML-DSA-87 verify | 346241 cycles | 354166 cycles | 0.98 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 212904 cycles | 213088 cycles | 1.00 |
| ML-DSA-44 sign | 781158 cycles | 780956 cycles | 1.00 |
| ML-DSA-44 verify | 229596 cycles | 230268 cycles | 1.00 |
| ML-DSA-65 keypair | 380703 cycles | 380342 cycles | 1.00 |
| ML-DSA-65 sign | 1283469 cycles | 1291576 cycles | 0.99 |
| ML-DSA-65 verify | 372123 cycles | 371822 cycles | 1.00 |
| ML-DSA-87 keypair | 608948 cycles | 608922 cycles | 1.00 |
| ML-DSA-87 sign | 1642584 cycles | 1642217 cycles | 1.00 |
| ML-DSA-87 verify | 621386 cycles | 621406 cycles | 1.00 |
Intel Xeon 4th gen (c7i)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 35025 cycles | 36914 cycles | 0.95 |
| ML-DSA-44 sign | 143273 cycles | 150193 cycles | 0.95 |
| ML-DSA-44 verify | 44283 cycles | 45714 cycles | 0.97 |
| ML-DSA-65 keypair | 63469 cycles | 65371 cycles | 0.97 |
| ML-DSA-65 sign | 239729 cycles | 249917 cycles | 0.96 |
| ML-DSA-65 verify | 71212 cycles | 73553 cycles | 0.97 |
| ML-DSA-87 keypair | 96766 cycles | 98755 cycles | 0.98 |
| ML-DSA-87 sign | 289753 cycles | 297831 cycles | 0.97 |
| ML-DSA-87 verify | 107039 cycles | 109480 cycles | 0.98 |
Intel Xeon 4th gen (c7i) (no-opt)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 95141 cycles | 95116 cycles | 1.00 |
| ML-DSA-44 sign | 342977 cycles | 343293 cycles | 1.00 |
| ML-DSA-44 verify | 101248 cycles | 101310 cycles | 1.00 |
| ML-DSA-65 keypair | 164948 cycles | 164688 cycles | 1.00 |
| ML-DSA-65 sign | 572030 cycles | 571928 cycles | 1.00 |
| ML-DSA-65 verify | 163929 cycles | 163880 cycles | 1.00 |
| ML-DSA-87 keypair | 269043 cycles | 268822 cycles | 1.00 |
| ML-DSA-87 sign | 726084 cycles | 725612 cycles | 1.00 |
| ML-DSA-87 verify | 271795 cycles | 271457 cycles | 1.00 |
AMD EPYC 3rd gen (c6a)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 73678 cycles | 72429 cycles | 1.02 |
| ML-DSA-44 sign | 238042 cycles | 235885 cycles | 1.01 |
| ML-DSA-44 verify | 83424 cycles | 81935 cycles | 1.02 |
| ML-DSA-65 keypair | 122457 cycles | 127155 cycles | 0.96 |
| ML-DSA-65 sign | 372858 cycles | 392445 cycles | 0.95 |
| ML-DSA-65 verify | 131265 cycles | 135767 cycles | 0.97 |
| ML-DSA-87 keypair | 207061 cycles | 209507 cycles | 0.99 |
| ML-DSA-87 sign | 479289 cycles | 490112 cycles | 0.98 |
| ML-DSA-87 verify | 215926 cycles | 218227 cycles | 0.99 |
AMD EPYC 4th gen (c7a)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 42278 cycles | 43749 cycles | 0.97 |
| ML-DSA-44 sign | 160044 cycles | 168465 cycles | 0.95 |
| ML-DSA-44 verify | 53629 cycles | 55266 cycles | 0.97 |
| ML-DSA-65 keypair | 71741 cycles | 74727 cycles | 0.96 |
| ML-DSA-65 sign | 258565 cycles | 269433 cycles | 0.96 |
| ML-DSA-65 verify | 83358 cycles | 85290 cycles | 0.98 |
| ML-DSA-87 keypair | 111912 cycles | 114492 cycles | 0.98 |
| ML-DSA-87 sign | 309232 cycles | 319148 cycles | 0.97 |
| ML-DSA-87 verify | 128545 cycles | 129750 cycles | 0.99 |
Intel Xeon 3rd gen (c6i)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 57459 cycles | 59245 cycles | 0.97 |
| ML-DSA-44 sign | 224603 cycles | 231008 cycles | 0.97 |
| ML-DSA-44 verify | 72810 cycles | 74684 cycles | 0.97 |
| ML-DSA-65 keypair | 104183 cycles | 104580 cycles | 1.00 |
| ML-DSA-65 sign | 375420 cycles | 383421 cycles | 0.98 |
| ML-DSA-65 verify | 117504 cycles | 118758 cycles | 0.99 |
| ML-DSA-87 keypair | 157854 cycles | 162893 cycles | 0.97 |
| ML-DSA-87 sign | 443245 cycles | 464184 cycles | 0.95 |
| ML-DSA-87 verify | 176039 cycles | 180395 cycles | 0.98 |
Graviton3
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 73991 cycles | 76425 cycles | 0.97 |
| ML-DSA-44 sign | 258818 cycles | 269792 cycles | 0.96 |
| ML-DSA-44 verify | 86840 cycles | 89274 cycles | 0.97 |
| ML-DSA-65 keypair | 129767 cycles | 133670 cycles | 0.97 |
| ML-DSA-65 sign | 428121 cycles | 448377 cycles | 0.95 |
| ML-DSA-65 verify | 139976 cycles | 144053 cycles | 0.97 |
| ML-DSA-87 keypair | 210843 cycles | 216151 cycles | 0.98 |
| ML-DSA-87 sign | 537712 cycles | 558494 cycles | 0.96 |
| ML-DSA-87 verify | 225089 cycles | 230636 cycles | 0.98 |
AMD EPYC 3rd gen (c6a) (no-opt)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 135633 cycles | 135562 cycles | 1.00 |
| ML-DSA-44 sign | 544428 cycles | 542966 cycles | 1.00 |
| ML-DSA-44 verify | 148691 cycles | 148680 cycles | 1.00 |
| ML-DSA-65 keypair | 227822 cycles | 227876 cycles | 1.00 |
| ML-DSA-65 sign | 876183 cycles | 873284 cycles | 1.00 |
| ML-DSA-65 verify | 236729 cycles | 236062 cycles | 1.00 |
| ML-DSA-87 keypair | 374848 cycles | 375435 cycles | 1.00 |
| ML-DSA-87 sign | 1097932 cycles | 1099713 cycles | 1.00 |
| ML-DSA-87 verify | 386699 cycles | 387716 cycles | 1.00 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 282186 cycles | 289927 cycles | 0.97 |
| ML-DSA-44 sign | 1059056 cycles | 1093182 cycles | 0.97 |
| ML-DSA-44 verify | 333541 cycles | 342018 cycles | 0.98 |
| ML-DSA-65 keypair | 483586 cycles | 494445 cycles | 0.98 |
| ML-DSA-65 sign | 1741574 cycles | 1801085 cycles | 0.97 |
| ML-DSA-65 verify | 525769 cycles | 536077 cycles | 0.98 |
| ML-DSA-87 keypair | 813340 cycles | 841573 cycles | 0.97 |
| ML-DSA-87 sign | 2285185 cycles | 2406369 cycles | 0.95 |
| ML-DSA-87 verify | 867799 cycles | 899537 cycles | 0.96 |
AMD EPYC 4th gen (c7a) (no-opt)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 120986 cycles | 120436 cycles | 1.00 |
| ML-DSA-44 sign | 454793 cycles | 452210 cycles | 1.01 |
| ML-DSA-44 verify | 131296 cycles | 131572 cycles | 1.00 |
| ML-DSA-65 keypair | 205285 cycles | 204813 cycles | 1.00 |
| ML-DSA-65 sign | 737169 cycles | 741495 cycles | 0.99 |
| ML-DSA-65 verify | 210869 cycles | 210092 cycles | 1.00 |
| ML-DSA-87 keypair | 340074 cycles | 345097 cycles | 0.99 |
| ML-DSA-87 sign | 946189 cycles | 954462 cycles | 0.99 |
| ML-DSA-87 verify | 351857 cycles | 353756 cycles | 0.99 |
Intel Xeon 3rd gen (c6i) (no-opt)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 157285 cycles | 157128 cycles | 1.00 |
| ML-DSA-44 sign | 564128 cycles | 563242 cycles | 1.00 |
| ML-DSA-44 verify | 169109 cycles | 168817 cycles | 1.00 |
| ML-DSA-65 keypair | 269435 cycles | 268889 cycles | 1.00 |
| ML-DSA-65 sign | 928431 cycles | 929005 cycles | 1.00 |
| ML-DSA-65 verify | 273935 cycles | 274165 cycles | 1.00 |
| ML-DSA-87 keypair | 451404 cycles | 450904 cycles | 1.00 |
| ML-DSA-87 sign | 1181071 cycles | 1182837 cycles | 1.00 |
| ML-DSA-87 verify | 460399 cycles | 460036 cycles | 1.00 |
Graviton4
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 69485 cycles | 71384 cycles | 0.97 |
| ML-DSA-44 sign | 243406 cycles | 252411 cycles | 0.96 |
| ML-DSA-44 verify | 80510 cycles | 82570 cycles | 0.98 |
| ML-DSA-65 keypair | 123278 cycles | 126207 cycles | 0.98 |
| ML-DSA-65 sign | 402262 cycles | 418729 cycles | 0.96 |
| ML-DSA-65 verify | 130784 cycles | 133572 cycles | 0.98 |
| ML-DSA-87 keypair | 200823 cycles | 204935 cycles | 0.98 |
| ML-DSA-87 sign | 508031 cycles | 524564 cycles | 0.97 |
| ML-DSA-87 verify | 211930 cycles | 215419 cycles | 0.98 |
Graviton2
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 115972 cycles | 118893 cycles | 0.98 |
| ML-DSA-44 sign | 423405 cycles | 434340 cycles | 0.97 |
| ML-DSA-44 verify | 133762 cycles | 136351 cycles | 0.98 |
| ML-DSA-65 keypair | 200328 cycles | 204738 cycles | 0.98 |
| ML-DSA-65 sign | 698660 cycles | 723474 cycles | 0.97 |
| ML-DSA-65 verify | 214506 cycles | 219065 cycles | 0.98 |
| ML-DSA-87 keypair | 328454 cycles | 334351 cycles | 0.98 |
| ML-DSA-87 sign | 879954 cycles | 903543 cycles | 0.97 |
| ML-DSA-87 verify | 346835 cycles | 354365 cycles | 0.98 |
Graviton3 (no-opt)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 138587 cycles | 138687 cycles | 1.00 |
| ML-DSA-44 sign | 494937 cycles | 495442 cycles | 1.00 |
| ML-DSA-44 verify | 148834 cycles | 148734 cycles | 1.00 |
| ML-DSA-65 keypair | 242313 cycles | 242497 cycles | 1.00 |
| ML-DSA-65 sign | 809930 cycles | 809720 cycles | 1.00 |
| ML-DSA-65 verify | 240840 cycles | 241177 cycles | 1.00 |
| ML-DSA-87 keypair | 396223 cycles | 396469 cycles | 1.00 |
| ML-DSA-87 sign | 1031619 cycles | 1031355 cycles | 1.00 |
| ML-DSA-87 verify | 402491 cycles | 402148 cycles | 1.00 |
Graviton4 (no-opt)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 133152 cycles | 133059 cycles | 1.00 |
| ML-DSA-44 sign | 498192 cycles | 498254 cycles | 1.00 |
| ML-DSA-44 verify | 144942 cycles | 145034 cycles | 1.00 |
| ML-DSA-65 keypair | 225991 cycles | 226138 cycles | 1.00 |
| ML-DSA-65 sign | 814388 cycles | 813372 cycles | 1.00 |
| ML-DSA-65 verify | 231482 cycles | 231336 cycles | 1.00 |
| ML-DSA-87 keypair | 374634 cycles | 374386 cycles | 1.00 |
| ML-DSA-87 sign | 1019911 cycles | 1020865 cycles | 1.00 |
| ML-DSA-87 verify | 383528 cycles | 383441 cycles | 1.00 |
Graviton2 (no-opt)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 213348 cycles | 213315 cycles | 1.00 |
| ML-DSA-44 sign | 782920 cycles | 793744 cycles | 0.99 |
| ML-DSA-44 verify | 230036 cycles | 229936 cycles | 1.00 |
| ML-DSA-65 keypair | 381290 cycles | 380736 cycles | 1.00 |
| ML-DSA-65 sign | 1285487 cycles | 1284870 cycles | 1.00 |
| ML-DSA-65 verify | 372653 cycles | 372305 cycles | 1.00 |
| ML-DSA-87 keypair | 609534 cycles | 609775 cycles | 1.00 |
| ML-DSA-87 sign | 1645467 cycles | 1645982 cycles | 1.00 |
| ML-DSA-87 verify | 621701 cycles | 621716 cycles | 1.00 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 460565 cycles | 461887 cycles | 1.00 |
| ML-DSA-44 sign | 2209645 cycles | 2208687 cycles | 1.00 |
| ML-DSA-44 verify | 545435 cycles | 546011 cycles | 1.00 |
| ML-DSA-65 keypair | 774964 cycles | 772961 cycles | 1.00 |
| ML-DSA-65 sign | 3633431 cycles | 3613853 cycles | 1.01 |
| ML-DSA-65 verify | 846679 cycles | 846000 cycles | 1.00 |
| ML-DSA-87 keypair | 1250314 cycles | 1252087 cycles | 1.00 |
| ML-DSA-87 sign | 4526553 cycles | 4485312 cycles | 1.01 |
| ML-DSA-87 verify | 1363008 cycles | 1363580 cycles | 1.00 |
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 823259 cycles | 822837 cycles | 1.00 |
| ML-DSA-44 sign | 3331843 cycles | 3330856 cycles | 1.00 |
| ML-DSA-44 verify | 919486 cycles | 918759 cycles | 1.00 |
| ML-DSA-65 keypair | 1404693 cycles | 1401325 cycles | 1.00 |
| ML-DSA-65 sign | 5455368 cycles | 5449246 cycles | 1.00 |
| ML-DSA-65 verify | 1469178 cycles | 1466794 cycles | 1.00 |
| ML-DSA-87 keypair | 2297073 cycles | 2299513 cycles | 1.00 |
| ML-DSA-87 sign | 6797279 cycles | 6802332 cycles | 1.00 |
| ML-DSA-87 verify | 2399877 cycles | 2394302 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 219607 cycles | 259879 cycles | 0.85 |
| ML-DSA-44 sign | 709012 cycles | 796628 cycles | 0.89 |
| ML-DSA-44 verify | 243564 cycles | 257672 cycles | 0.95 |
| ML-DSA-65 keypair | 392968 cycles | 418565 cycles | 0.94 |
| ML-DSA-65 sign | 1182376 cycles | 1261720 cycles | 0.94 |
| ML-DSA-65 verify | 399468 cycles | 415437 cycles | 0.96 |
| ML-DSA-87 keypair | 665448 cycles | 668630 cycles | 1.00 |
| ML-DSA-87 sign | 1574717 cycles | 1597546 cycles | 0.99 |
| ML-DSA-87 verify | 663926 cycles | 670039 cycles | 0.99 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 302847 cycles | 311403 cycles | 0.97 |
| ML-DSA-44 sign | 1222085 cycles | 1172791 cycles | 1.04 |
| ML-DSA-44 verify | 338643 cycles | 325419 cycles | 1.04 |
| ML-DSA-65 keypair | 565947 cycles | 556523 cycles | 1.02 |
| ML-DSA-65 sign | 1990741 cycles | 1925569 cycles | 1.03 |
| ML-DSA-65 verify | 539294 cycles | 527641 cycles | 1.02 |
| ML-DSA-87 keypair | 872061 cycles | 867077 cycles | 1.01 |
| ML-DSA-87 sign | 2540499 cycles | 2461379 cycles | 1.03 |
| ML-DSA-87 verify | 902482 cycles | 883514 cycles | 1.02 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: d45408b | Previous: ebede4d | Ratio |
|---|---|---|---|
| ML-DSA-44 sign | 1222085 cycles | 1172791 cycles | 1.04 |
| ML-DSA-44 verify | 338643 cycles | 325419 cycles | 1.04 |
| ML-DSA-65 sign | 1990741 cycles | 1925569 cycles | 1.03 |
| ML-DSA-87 sign | 2540499 cycles | 2461379 cycles | 1.03 |
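Two of the flagged rows show a ratio of 1.03, which at first glance does not exceed a threshold of 1.03. A minimal C sketch of the check follows. It assumes the ratio is current cycles divided by previous cycles (consistent with the numbers in the tables) and that the comparison uses the unrounded value. The helper names are hypothetical and are not part of the benchmark tooling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Ratio as the tables appear to define it: current / previous cycles.
 * A value above 1.0 means the current commit is slower. */
static double cycle_ratio(uint64_t current, uint64_t previous)
{
  assert(previous != 0);
  return (double)current / (double)previous;
}

/* Hypothetical helper: true if the unrounded ratio exceeds the alert
 * threshold (assumed here to be a strict comparison). */
static bool exceeds_threshold(uint64_t current, uint64_t previous,
                              double threshold)
{
  return cycle_ratio(current, previous) > threshold;
}
```

Under these assumptions, ML-DSA-65 sign (1990741 vs. 1925569 cycles) has an unrounded ratio of about 1.034 and is flagged, while ML-DSA-87 verify (902482 vs. 883514 cycles, about 1.021) is not.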
poly_caddq assembly #396
poly_caddq #327
poly_decompose #411