Add native implementation of poly_decompose #411
base: main
Branch force-pushed from 76fe175 to 0f865ed.
Mac Mini (M1, 2020) benchmarks (opt)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 50481 cycles | 50494 cycles | 1.00 |
ML-DSA-44 sign | 205387 cycles | 222985 cycles | 0.92 |
ML-DSA-44 verify | 72847 cycles | 72852 cycles | 1.00 |
ML-DSA-65 keypair | 87374 cycles | 87368 cycles | 1.00 |
ML-DSA-65 sign | 330943 cycles | 356080 cycles | 0.93 |
ML-DSA-65 verify | 112682 cycles | 112690 cycles | 1.00 |
ML-DSA-87 keypair | 140125 cycles | 140131 cycles | 1.00 |
ML-DSA-87 sign | 400915 cycles | 425672 cycles | 0.94 |
ML-DSA-87 verify | 173325 cycles | 173200 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 116004 cycles | 116031 cycles | 1.00 |
ML-DSA-44 sign | 455057 cycles | 455193 cycles | 1.00 |
ML-DSA-44 verify | 136868 cycles | 136876 cycles | 1.00 |
ML-DSA-65 keypair | 198307 cycles | 198020 cycles | 1.00 |
ML-DSA-65 sign | 734932 cycles | 733214 cycles | 1.00 |
ML-DSA-65 verify | 217197 cycles | 216821 cycles | 1.00 |
ML-DSA-87 keypair | 335124 cycles | 335079 cycles | 1.00 |
ML-DSA-87 sign | 915171 cycles | 915119 cycles | 1.00 |
ML-DSA-87 verify | 353274 cycles | 353198 cycles | 1.00 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Mac Mini (M1, 2020) benchmarks (no-opt)'.
The benchmark result of this commit is worse than the previous benchmark result, exceeding the threshold of 1.03.
Benchmark suite | Current: aa59090 | Previous: 108d2d3 | Ratio |
---|---|---|---|
ML-DSA-87 keypair | 335086 cycles | 325071 cycles | 1.03 |
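The alert rule can be sketched in C as follows. This assumes, without confirmation from these comments, that github-action-benchmark divides the current cycle count by the previous one and flags a regression when the unrounded ratio is strictly greater than the threshold. Under that assumption an alert can print a ratio of 1.03 against a threshold of 1.03, because the tables round to two decimals. The helper names `bench_ratio` and `is_regression` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Ratio column: current / previous cycle count (below 1.00 means faster). */
static double bench_ratio(double current_cycles, double previous_cycles) {
  assert(previous_cycles > 0.0);
  return current_cycles / previous_cycles;
}

/* Assumed alert rule: flag a regression when the unrounded ratio is
   strictly greater than the threshold (1.03 in these alerts). */
static bool is_regression(double current_cycles, double previous_cycles,
                          double threshold) {
  return bench_ratio(current_cycles, previous_cycles) > threshold;
}
```

For example, the Mac Mini (no-opt) ML-DSA-87 keypair row (335086 vs 325071 cycles) has an unrounded ratio of about 1.031, so `is_regression(335086, 325071, 1.03)` returns true even though the table shows 1.03.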
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 120396 cycles | 120280 cycles | 1.00 |
ML-DSA-44 sign | 453178 cycles | 488378 cycles | 0.93 |
ML-DSA-44 verify | 145599 cycles | 145600 cycles | 1.00 |
ML-DSA-65 keypair | 207581 cycles | 207304 cycles | 1.00 |
ML-DSA-65 sign | 748100 cycles | 802623 cycles | 0.93 |
ML-DSA-65 verify | 232038 cycles | 231619 cycles | 1.00 |
ML-DSA-87 keypair | 336288 cycles | 336101 cycles | 1.00 |
ML-DSA-87 sign | 928838 cycles | 985664 cycles | 0.94 |
ML-DSA-87 verify | 370321 cycles | 370112 cycles | 1.00 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 215213 cycles | 214823 cycles | 1.00 |
ML-DSA-44 sign | 796662 cycles | 797106 cycles | 1.00 |
ML-DSA-44 verify | 239545 cycles | 239791 cycles | 1.00 |
ML-DSA-65 keypair | 383960 cycles | 383443 cycles | 1.00 |
ML-DSA-65 sign | 1318169 cycles | 1321371 cycles | 1.00 |
ML-DSA-65 verify | 385194 cycles | 384658 cycles | 1.00 |
ML-DSA-87 keypair | 612209 cycles | 611086 cycles | 1.00 |
ML-DSA-87 sign | 1662843 cycles | 1663540 cycles | 1.00 |
ML-DSA-87 verify | 637814 cycles | 637462 cycles | 1.00 |
Intel Xeon 4th gen (c7i)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 37269 cycles | 37391 cycles | 1.00 |
ML-DSA-44 sign | 156663 cycles | 169666 cycles | 0.92 |
ML-DSA-44 verify | 49964 cycles | 50078 cycles | 1.00 |
ML-DSA-65 keypair | 65965 cycles | 66540 cycles | 0.99 |
ML-DSA-65 sign | 259219 cycles | 279863 cycles | 0.93 |
ML-DSA-65 verify | 78658 cycles | 78832 cycles | 1.00 |
ML-DSA-87 keypair | 101091 cycles | 100953 cycles | 1.00 |
ML-DSA-87 sign | 309735 cycles | 326976 cycles | 0.95 |
ML-DSA-87 verify | 117706 cycles | 117129 cycles | 1.00 |
Intel Xeon 4th gen (c7i) (no-opt)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 96373 cycles | 96101 cycles | 1.00 |
ML-DSA-44 sign | 352305 cycles | 351827 cycles | 1.00 |
ML-DSA-44 verify | 105613 cycles | 105202 cycles | 1.00 |
ML-DSA-65 keypair | 163402 cycles | 163467 cycles | 1.00 |
ML-DSA-65 sign | 579140 cycles | 579623 cycles | 1.00 |
ML-DSA-65 verify | 169098 cycles | 168943 cycles | 1.00 |
ML-DSA-87 keypair | 274423 cycles | 273027 cycles | 1.01 |
ML-DSA-87 sign | 733987 cycles | 733850 cycles | 1.00 |
ML-DSA-87 verify | 281558 cycles | 281472 cycles | 1.00 |
AMD EPYC 3rd gen (c6a)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 77861 cycles | 74153 cycles | 1.05 |
ML-DSA-44 sign | 257974 cycles | 269188 cycles | 0.96 |
ML-DSA-44 verify | 93787 cycles | 88813 cycles | 1.06 |
ML-DSA-65 keypair | 126201 cycles | 126040 cycles | 1.00 |
ML-DSA-65 sign | 398767 cycles | 432983 cycles | 0.92 |
ML-DSA-65 verify | 142169 cycles | 141947 cycles | 1.00 |
ML-DSA-87 keypair | 210677 cycles | 212859 cycles | 0.99 |
ML-DSA-87 sign | 506966 cycles | 548118 cycles | 0.92 |
ML-DSA-87 verify | 229048 cycles | 231598 cycles | 0.99 |
Intel Xeon 3rd gen (c6i)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 60996 cycles | 60934 cycles | 1.00 |
ML-DSA-44 sign | 240511 cycles | 264765 cycles | 0.91 |
ML-DSA-44 verify | 80753 cycles | 80601 cycles | 1.00 |
ML-DSA-65 keypair | 108711 cycles | 106939 cycles | 1.02 |
ML-DSA-65 sign | 404832 cycles | 437635 cycles | 0.93 |
ML-DSA-65 verify | 129067 cycles | 127465 cycles | 1.01 |
ML-DSA-87 keypair | 163491 cycles | 166545 cycles | 0.98 |
ML-DSA-87 sign | 470318 cycles | 515721 cycles | 0.91 |
ML-DSA-87 verify | 189408 cycles | 192786 cycles | 0.98 |
AMD EPYC 4th gen (c7a)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 44285 cycles | 44398 cycles | 1.00 |
ML-DSA-44 sign | 177097 cycles | 197053 cycles | 0.90 |
ML-DSA-44 verify | 60289 cycles | 60543 cycles | 1.00 |
ML-DSA-65 keypair | 75868 cycles | 75922 cycles | 1.00 |
ML-DSA-65 sign | 282829 cycles | 319168 cycles | 0.89 |
ML-DSA-65 verify | 94066 cycles | 93010 cycles | 1.01 |
ML-DSA-87 keypair | 116110 cycles | 115966 cycles | 1.00 |
ML-DSA-87 sign | 335125 cycles | 369577 cycles | 0.91 |
ML-DSA-87 verify | 142087 cycles | 139480 cycles | 1.02 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a)'.
The benchmark result of this commit is worse than the previous benchmark result, exceeding the threshold of 1.03.
Benchmark suite | Current: 0f865ed | Previous: 108d2d3 | Ratio |
---|---|---|---|
ML-DSA-87 keypair | 119449 cycles | 115550 cycles | 1.03 |
ML-DSA-87 verify | 141848 cycles | 136834 cycles | 1.04 |
AMD EPYC 3rd gen (c6a) (no-opt)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 136442 cycles | 136765 cycles | 1.00 |
ML-DSA-44 sign | 552709 cycles | 555020 cycles | 1.00 |
ML-DSA-44 verify | 154569 cycles | 154891 cycles | 1.00 |
ML-DSA-65 keypair | 227676 cycles | 227905 cycles | 1.00 |
ML-DSA-65 sign | 891454 cycles | 892504 cycles | 1.00 |
ML-DSA-65 verify | 243542 cycles | 244076 cycles | 1.00 |
ML-DSA-87 keypair | 376652 cycles | 376770 cycles | 1.00 |
ML-DSA-87 sign | 1115251 cycles | 1115378 cycles | 1.00 |
ML-DSA-87 verify | 398878 cycles | 398605 cycles | 1.00 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 298656 cycles | 300219 cycles | 0.99 |
ML-DSA-44 sign | 1132000 cycles | 1236451 cycles | 0.92 |
ML-DSA-44 verify | 361791 cycles | 357732 cycles | 1.01 |
ML-DSA-65 keypair | 507102 cycles | 507848 cycles | 1.00 |
ML-DSA-65 sign | 1873378 cycles | 2011617 cycles | 0.93 |
ML-DSA-65 verify | 558065 cycles | 558683 cycles | 1.00 |
ML-DSA-87 keypair | 860140 cycles | 872050 cycles | 0.99 |
ML-DSA-87 sign | 2471030 cycles | 2624877 cycles | 0.94 |
ML-DSA-87 verify | 931913 cycles | 938240 cycles | 0.99 |
Graviton4
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 73090 cycles | 72904 cycles | 1.00 |
ML-DSA-44 sign | 262009 cycles | 282731 cycles | 0.93 |
ML-DSA-44 verify | 87170 cycles | 87111 cycles | 1.00 |
ML-DSA-65 keypair | 129023 cycles | 128483 cycles | 1.00 |
ML-DSA-65 sign | 431495 cycles | 460937 cycles | 0.94 |
ML-DSA-65 verify | 139521 cycles | 139107 cycles | 1.00 |
ML-DSA-87 keypair | 208590 cycles | 207855 cycles | 1.00 |
ML-DSA-87 sign | 536383 cycles | 563388 cycles | 0.95 |
ML-DSA-87 verify | 223108 cycles | 222542 cycles | 1.00 |
Intel Xeon 3rd gen (c6i) (no-opt)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 159416 cycles | 159431 cycles | 1.00 |
ML-DSA-44 sign | 576923 cycles | 574205 cycles | 1.00 |
ML-DSA-44 verify | 175501 cycles | 175122 cycles | 1.00 |
ML-DSA-65 keypair | 271891 cycles | 272121 cycles | 1.00 |
ML-DSA-65 sign | 945450 cycles | 950970 cycles | 0.99 |
ML-DSA-65 verify | 283115 cycles | 283406 cycles | 1.00 |
ML-DSA-87 keypair | 454217 cycles | 453040 cycles | 1.00 |
ML-DSA-87 sign | 1196506 cycles | 1194595 cycles | 1.00 |
ML-DSA-87 verify | 471158 cycles | 469649 cycles | 1.00 |
AMD EPYC 4th gen (c7a) (no-opt)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 123330 cycles | 121454 cycles | 1.02 |
ML-DSA-44 sign | 469492 cycles | 467420 cycles | 1.00 |
ML-DSA-44 verify | 139153 cycles | 136935 cycles | 1.02 |
ML-DSA-65 keypair | 207295 cycles | 206019 cycles | 1.01 |
ML-DSA-65 sign | 751087 cycles | 753924 cycles | 1.00 |
ML-DSA-65 verify | 216892 cycles | 217888 cycles | 1.00 |
ML-DSA-87 keypair | 341909 cycles | 341986 cycles | 1.00 |
ML-DSA-87 sign | 958782 cycles | 952984 cycles | 1.01 |
ML-DSA-87 verify | 358041 cycles | 357724 cycles | 1.00 |
Graviton4 (no-opt)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 134759 cycles | 134706 cycles | 1.00 |
ML-DSA-44 sign | 507755 cycles | 508655 cycles | 1.00 |
ML-DSA-44 verify | 149495 cycles | 149594 cycles | 1.00 |
ML-DSA-65 keypair | 228672 cycles | 228403 cycles | 1.00 |
ML-DSA-65 sign | 825573 cycles | 824225 cycles | 1.00 |
ML-DSA-65 verify | 237518 cycles | 237248 cycles | 1.00 |
ML-DSA-87 keypair | 377149 cycles | 377391 cycles | 1.00 |
ML-DSA-87 sign | 1031536 cycles | 1030599 cycles | 1.00 |
ML-DSA-87 verify | 391370 cycles | 390950 cycles | 1.00 |
Graviton3
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 78014 cycles | 77990 cycles | 1.00 |
ML-DSA-44 sign | 282225 cycles | 304149 cycles | 0.93 |
ML-DSA-44 verify | 95386 cycles | 95539 cycles | 1.00 |
ML-DSA-65 keypair | 135088 cycles | 134961 cycles | 1.00 |
ML-DSA-65 sign | 464959 cycles | 496716 cycles | 0.94 |
ML-DSA-65 verify | 151277 cycles | 151179 cycles | 1.00 |
ML-DSA-87 keypair | 217758 cycles | 217516 cycles | 1.00 |
ML-DSA-87 sign | 574031 cycles | 606523 cycles | 0.95 |
ML-DSA-87 verify | 239653 cycles | 239624 cycles | 1.00 |
Graviton2
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 120494 cycles | 120913 cycles | 1.00 |
ML-DSA-44 sign | 453588 cycles | 489433 cycles | 0.93 |
ML-DSA-44 verify | 145750 cycles | 146130 cycles | 1.00 |
ML-DSA-65 keypair | 207626 cycles | 207543 cycles | 1.00 |
ML-DSA-65 sign | 748654 cycles | 802552 cycles | 0.93 |
ML-DSA-65 verify | 232034 cycles | 231747 cycles | 1.00 |
ML-DSA-87 keypair | 336402 cycles | 336678 cycles | 1.00 |
ML-DSA-87 sign | 929978 cycles | 986239 cycles | 0.94 |
ML-DSA-87 verify | 370007 cycles | 370226 cycles | 1.00 |
Graviton3 (no-opt)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 139781 cycles | 139732 cycles | 1.00 |
ML-DSA-44 sign | 505653 cycles | 505480 cycles | 1.00 |
ML-DSA-44 verify | 154443 cycles | 154400 cycles | 1.00 |
ML-DSA-65 keypair | 245094 cycles | 245192 cycles | 1.00 |
ML-DSA-65 sign | 824884 cycles | 824410 cycles | 1.00 |
ML-DSA-65 verify | 248346 cycles | 248701 cycles | 1.00 |
ML-DSA-87 keypair | 397665 cycles | 397288 cycles | 1.00 |
ML-DSA-87 sign | 1044177 cycles | 1043636 cycles | 1.00 |
ML-DSA-87 verify | 411659 cycles | 411303 cycles | 1.00 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 469026 cycles | 468881 cycles | 1.00 |
ML-DSA-44 sign | 2238512 cycles | 2248871 cycles | 1.00 |
ML-DSA-44 verify | 565236 cycles | 561836 cycles | 1.01 |
ML-DSA-65 keypair | 782408 cycles | 783629 cycles | 1.00 |
ML-DSA-65 sign | 3656126 cycles | 3666008 cycles | 1.00 |
ML-DSA-65 verify | 871868 cycles | 868783 cycles | 1.00 |
ML-DSA-87 keypair | 1263276 cycles | 1260624 cycles | 1.00 |
ML-DSA-87 sign | 4518842 cycles | 4518541 cycles | 1.00 |
ML-DSA-87 verify | 1396696 cycles | 1389881 cycles | 1.00 |
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 828705 cycles | 828183 cycles | 1.00 |
ML-DSA-44 sign | 3365620 cycles | 3364011 cycles | 1.00 |
ML-DSA-44 verify | 932132 cycles | 931608 cycles | 1.00 |
ML-DSA-65 keypair | 1394312 cycles | 1394183 cycles | 1.00 |
ML-DSA-65 sign | 5485228 cycles | 5487573 cycles | 1.00 |
ML-DSA-65 verify | 1482004 cycles | 1481552 cycles | 1.00 |
ML-DSA-87 keypair | 2310465 cycles | 2311843 cycles | 1.00 |
ML-DSA-87 sign | 6903276 cycles | 6920584 cycles | 1.00 |
ML-DSA-87 verify | 2434780 cycles | 2439612 cycles | 1.00 |
Graviton2 (no-opt)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 215483 cycles | 215740 cycles | 1.00 |
ML-DSA-44 sign | 797478 cycles | 811168 cycles | 0.98 |
ML-DSA-44 verify | 239781 cycles | 240054 cycles | 1.00 |
ML-DSA-65 keypair | 384013 cycles | 383947 cycles | 1.00 |
ML-DSA-65 sign | 1314460 cycles | 1313699 cycles | 1.00 |
ML-DSA-65 verify | 385281 cycles | 385250 cycles | 1.00 |
ML-DSA-87 keypair | 612362 cycles | 611735 cycles | 1.00 |
ML-DSA-87 sign | 1664503 cycles | 1666380 cycles | 1.00 |
ML-DSA-87 verify | 637504 cycles | 637464 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 239349 cycles | 235072 cycles | 1.02 |
ML-DSA-44 sign | 804618 cycles | 805610 cycles | 1.00 |
ML-DSA-44 verify | 278308 cycles | 261977 cycles | 1.06 |
ML-DSA-65 keypair | 420853 cycles | 413165 cycles | 1.02 |
ML-DSA-65 sign | 1327698 cycles | 1302032 cycles | 1.02 |
ML-DSA-65 verify | 443842 cycles | 414961 cycles | 1.07 |
ML-DSA-87 keypair | 700951 cycles | 668773 cycles | 1.05 |
ML-DSA-87 sign | 1768601 cycles | 1711807 cycles | 1.03 |
ML-DSA-87 verify | 735622 cycles | 694958 cycles | 1.06 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 320793 cycles | 306364 cycles | 1.05 |
ML-DSA-44 sign | 1238813 cycles | 1229045 cycles | 1.01 |
ML-DSA-44 verify | 345021 cycles | 350761 cycles | 0.98 |
ML-DSA-65 keypair | 592326 cycles | 561430 cycles | 1.06 |
ML-DSA-65 sign | 2033937 cycles | 2011765 cycles | 1.01 |
ML-DSA-65 verify | 567523 cycles | 547519 cycles | 1.04 |
ML-DSA-87 keypair | 871908 cycles | 868065 cycles | 1.00 |
ML-DSA-87 sign | 2516653 cycles | 2537481 cycles | 0.99 |
ML-DSA-87 verify | 904549 cycles | 901328 cycles | 1.00 |
Branch force-pushed from a78441c to ae4922c.
Branch force-pushed from 5a50f49 to aa59090.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Graviton4'.
The benchmark result of this commit is worse than the previous benchmark result, exceeding the threshold of 1.03.
Benchmark suite | Current: aa59090 | Previous: 108d2d3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 77432 cycles | 73120 cycles | 1.06 |
ML-DSA-44 verify | 90923 cycles | 87104 cycles | 1.04 |
ML-DSA-65 keypair | 135321 cycles | 128241 cycles | 1.06 |
ML-DSA-65 verify | 146013 cycles | 138989 cycles | 1.05 |
ML-DSA-87 keypair | 221150 cycles | 207784 cycles | 1.06 |
ML-DSA-87 verify | 236143 cycles | 222452 cycles | 1.06 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Graviton4 (no-opt)'.
The benchmark result of this commit is worse than the previous benchmark result, exceeding the threshold of 1.03.
Benchmark suite | Current: aa59090 | Previous: 108d2d3 | Ratio |
---|---|---|---|
ML-DSA-65 sign | 845214 cycles | 819792 cycles | 1.03 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
The benchmark result of this commit is worse than the previous benchmark result, exceeding the threshold of 1.03.
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 320793 cycles | 306364 cycles | 1.05 |
ML-DSA-65 keypair | 592326 cycles | 561430 cycles | 1.06 |
ML-DSA-65 verify | 567523 cycles | 547519 cycles | 1.04 |
Branch force-pushed from aa59090 to 1fba7c3.
Branch force-pushed from 1fba7c3 to cc5b15c.
Branch force-pushed from cc5b15c to 7e2191f.
Branch force-pushed from 7e2191f to 795cc6e.
mldsa/poly.c (outdated)
/* TODO: proof */
mld_assert_bound(a->coeffs, MLDSA_N, 0, MLDSA_Q);
mld_poly_decompose_32_native(a1->coeffs, a0->coeffs, a->coeffs);
#else /* !None && MLD_USE_NATIVE_POLY_DECOMPOSE_32 && (MLDSA_MODE == 3 || \
Something is going wrong with autogen here: the first conditional does not seem to be recognized.
Fixed. Can you take another look please?
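For orientation, here is a minimal sketch of the compile-time dispatch the snippet appears to implement. It assumes `MLDSA_N = 256`, `MLDSA_Q = 8380417` and `GAMMA2 = (MLDSA_Q - 1) / 32`, the parameter set of ML-DSA-65 and ML-DSA-87, which the `_32` suffix and the `MLDSA_MODE == 3 ||` condition appear to target. The `poly` struct, `decompose_32` and `poly_decompose_sketch` are hypothetical names; the native function's argument order is copied from the snippet, and the guard is simplified to the single macro. The `#else` comment shows the kind of negated condition one would expect autogen to emit in place of `!None && ...`. The scalar fallback follows the pq-crystals reference decompose formula.

```c
#include <assert.h>
#include <stdint.h>

#define MLDSA_N 256
#define MLDSA_Q 8380417
#define GAMMA2 ((MLDSA_Q - 1) / 32) /* 261888: ML-DSA-65 and ML-DSA-87 */

typedef struct {
  int32_t coeffs[MLDSA_N];
} poly; /* hypothetical stand-in for the library's polynomial type */

/* Scalar decompose for GAMMA2 = (Q-1)/32 (pq-crystals reference formula). */
static int32_t decompose_32(int32_t *a0, int32_t a) {
  int32_t a1 = (a + 127) >> 7;        /* ceil(a / 128) */
  a1 = (a1 * 1025 + (1 << 21)) >> 22; /* rounded quotient, in [0, 16] */
  a1 &= 15;                           /* 16 wraps to 0 */
  *a0 = a - a1 * 2 * GAMMA2;
  /* subtract Q if a0 > (Q-1)/2; assumes arithmetic right shift (GCC/Clang) */
  *a0 -= (((MLDSA_Q - 1) / 2 - *a0) >> 31) & MLDSA_Q;
  return a1;
}

static void poly_decompose_sketch(poly *a1, poly *a0, const poly *a) {
  for (int i = 0; i < MLDSA_N; i++) {
    /* same precondition as mld_assert_bound(a->coeffs, MLDSA_N, 0, MLDSA_Q) */
    assert(a->coeffs[i] >= 0 && a->coeffs[i] < MLDSA_Q);
  }
#if defined(MLD_USE_NATIVE_POLY_DECOMPOSE_32)
  mld_poly_decompose_32_native(a1->coeffs, a0->coeffs, a->coeffs);
#else  /* !MLD_USE_NATIVE_POLY_DECOMPOSE_32 */
  for (int i = 0; i < MLDSA_N; i++) {
    a1->coeffs[i] = decompose_32(&a0->coeffs[i], a->coeffs[i]);
  }
#endif /* !MLD_USE_NATIVE_POLY_DECOMPOSE_32 */
}
```

Compiled without `MLD_USE_NATIVE_POLY_DECOMPOSE_32`, only the C loop is built. For example, a coefficient equal to `MLDSA_Q - 1` decomposes to `a1 = 0`, `a0 = -1`.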
This adds the AVX2 intrinsics implementation of poly_decompose from https://github.com/pq-crystals/dilithium/blob/master/avx2/rounding.c. Resolves #399.
Signed-off-by: Matthias J. Kannwischer <[email protected]>

This adds a native implementation of poly_decompose written from scratch. Resolves #397.
Signed-off-by: Matthias J. Kannwischer <[email protected]>
Branch force-pushed from 795cc6e to 52400e8.
// Step 2: Barrett reduction with rounding: round(temp * 1025 / 2^22)
// This computes: round(ceil(a/128) / 4092)
// Combined: a1 ≈ round(ceil(a/128) / 4092) ≈ floor(a / 523776)
Why does the second equality hold?
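One way to see what Step 2 computes, offered as a sketch rather than the author's answer: with 2 * GAMMA2 = 523776 = 128 * 4092 and 1025 * 4092 = 2^22 - 4, the factor 1025 / 2^22 is slightly below 1/4092, so `(t * 1025 + 2^21) >> 22` with `t = ceil(a/128)` rounds a / 523776 to the nearest integer, with ties going down. That is the quotient (a - (a mod± 523776)) / 523776 used by FIPS 204 Decompose, not floor(a / 523776): for a = 261889 Step 2 gives 1, while the floor gives 0. The check below assumes GAMMA2 = (Q - 1) / 32 and compares both quantities for every a in [0, Q); the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define Q 8380417
#define GAMMA2 ((Q - 1) / 32) /* 261888 */
#define ALPHA (2 * GAMMA2)    /* 523776 = 128 * 4092 */

/* Steps 1-2 of the snippet, before masking. */
static int32_t step2_quotient(int32_t a) {
  int32_t t = (a + 127) >> 7;            /* ceil(a / 128) */
  return (t * 1025 + (1 << 21)) >> 22;   /* result in [0, 16] */
}

/* Quotient implied by FIPS 204 Decompose before its q-1 special case:
   (a - r0) / ALPHA with r0 = a mod+- ALPHA in (-GAMMA2, GAMMA2]. */
static int32_t centred_quotient(int32_t a) {
  int32_t r0 = a % ALPHA; /* a >= 0, so r0 is in [0, ALPHA) */
  if (r0 > GAMMA2) {
    r0 -= ALPHA;
  }
  return (a - r0) / ALPHA;
}

/* Exhaustive comparison over every input a in [0, Q). */
static int32_t count_step2_mismatches(void) {
  int32_t mismatches = 0;
  for (int32_t a = 0; a < Q; a++) {
    if (step2_quotient(a) != centred_quotient(a)) {
      mismatches++;
    }
  }
  return mismatches;
}
```

If `count_step2_mismatches()` returns 0, the "≈" steps are in fact exact for this parameter set. Note that `step2_quotient(Q - 1)` is 16, which is why a masking or correction step has to follow.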
// which is equivalent to: (temp * 1025 + 2^21) >> 22.
sqrdmulh \a1\().4s, \a1\().4s, barrett_const.4s
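The `sqrdmulh` step can be modelled per 32-bit lane as saturate((2 * x * c + 2^31) >> 32). The value of `barrett_const` is not visible in this excerpt. Assuming it is 1025 * 2^9 = 524800 (a guess, not taken from the PR), the implicit doubling and the high-half extraction give (x * 1025 * 2^10 + 2^31) >> 32 = (x * 1025 + 2^21) >> 22 exactly, which matches the comment. The scalar model and `BARRETT_CONST_GUESS` below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the PR excerpt does not show barrett_const; 1025 * 2^9 is assumed. */
#define BARRETT_CONST_GUESS (1025 << 9)

/* Scalar model of one 32-bit lane of AArch64 SQRDMULH:
   saturate((2 * x * c + 2^31) >> 32). */
static int32_t sqrdmulh_s32(int32_t x, int32_t c) {
  if (x == INT32_MIN && c == INT32_MIN) {
    return INT32_MAX; /* the only input pair that saturates */
  }
  int64_t acc = 2 * (int64_t)x * c + ((int64_t)1 << 31);
  return (int32_t)(acc >> 32); /* arithmetic shift assumed for negative acc */
}

/* Compares the model with (t * 1025 + 2^21) >> 22 for every
   t = ceil(a / 128), a in [0, Q), i.e. t in [0, 65472]. */
static int32_t count_sqrdmulh_mismatches(void) {
  int32_t mismatches = 0;
  for (int32_t t = 0; t <= 65472; t++) {
    int32_t expected = (t * 1025 + (1 << 21)) >> 22;
    if (sqrdmulh_s32(t, BARRETT_CONST_GUESS) != expected) {
      mismatches++;
    }
  }
  return mismatches;
}
```

The multiplier is pre-shifted by 9 rather than 10 because SQRDMULH already doubles the product before taking the high 32 bits.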
// Step 3: Mask to valid range [0, 14] since (Q-1)/(2*GAMMA2) = 15
The comment suggests that 15 is excluded after the masking, but it isn't, given that `mask_15` is elementwise `0x0F`?
Yes, the comment is wrong. The valid range is [0, 15]. Will fix.
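Since (Q - 1) / (2 * GAMMA2) = 8380416 / 523776 = 16, Step 2 produces values in [0, 16], and masking with 0x0F keeps 0 to 15 and maps only 16 to 0. The sketch below (hypothetical helper names, GAMMA2 = (Q - 1) / 32 assumed) compares a reference-style decompose, which applies the same mask and then subtracts Q when a0 exceeds (Q - 1) / 2, against FIPS 204 Decompose written from its definition, including the case r+ - r0 = q - 1 where r1 is set to 0 and r0 is decremented. The wrap from 16 to 0 is what makes the two agree for inputs close to Q - 1.

```c
#include <assert.h>
#include <stdint.h>

#define Q 8380417
#define GAMMA2 ((Q - 1) / 32) /* 261888 */
#define ALPHA (2 * GAMMA2)    /* 523776 */

/* Reference-style decompose with the 0x0F mask. */
static int32_t decompose_masked(int32_t *a0, int32_t a) {
  assert(a >= 0 && a < Q);
  int32_t a1 = (a + 127) >> 7;
  a1 = (a1 * 1025 + (1 << 21)) >> 22; /* in [0, 16] */
  a1 &= 15;                           /* in [0, 15]; only 16 maps to 0 */
  *a0 = a - a1 * ALPHA;
  /* subtract Q if a0 > (Q-1)/2; assumes arithmetic right shift (GCC/Clang) */
  *a0 -= (((Q - 1) / 2 - *a0) >> 31) & Q;
  return a1;
}

/* FIPS 204 Decompose written directly from its definition. */
static int32_t decompose_fips204(int32_t *a0, int32_t a) {
  int32_t r0 = a % ALPHA;
  if (r0 > GAMMA2) {
    r0 -= ALPHA; /* r0 = a mod+- ALPHA, in (-GAMMA2, GAMMA2] */
  }
  if (a - r0 == Q - 1) { /* special case: r1 = 0, r0 = r0 - 1 */
    *a0 = r0 - 1;
    return 0;
  }
  *a0 = r0;
  return (a - r0) / ALPHA;
}

/* Exhaustive comparison of both outputs over every a in [0, Q). */
static int32_t count_decompose_mismatches(void) {
  int32_t mismatches = 0;
  for (int32_t a = 0; a < Q; a++) {
    int32_t a0_masked, a0_spec;
    int32_t a1_masked = decompose_masked(&a0_masked, a);
    int32_t a1_spec = decompose_fips204(&a0_spec, a);
    if (a1_masked != a1_spec || a0_masked != a0_spec) {
      mismatches++;
    }
  }
  return mismatches;
}
```

For example, `a = Q - 1 - GAMMA2` decomposes to `a1 = 15`, `a0 = GAMMA2`, so 15 is reachable, while `a = Q - GAMMA2` falls into the wrap case and gives `a1 = 0`, `a0 = -GAMMA2`.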
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 3rd gen (c6a)'.
The benchmark result of this commit is worse than the previous benchmark result, exceeding the threshold of 1.03.
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 77861 cycles | 74153 cycles | 1.05 |
ML-DSA-44 verify | 93787 cycles | 88813 cycles | 1.06 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
The benchmark result of this commit is worse than the previous benchmark result, exceeding the threshold of 1.03.
Benchmark suite | Current: 52400e8 | Previous: 6a9e7f3 | Ratio |
---|---|---|---|
ML-DSA-44 verify | 278308 cycles | 261977 cycles | 1.06 |
ML-DSA-65 verify | 443842 cycles | 414961 cycles | 1.07 |
ML-DSA-87 keypair | 700951 cycles | 668773 cycles | 1.05 |
ML-DSA-87 sign | 1768601 cycles | 1711807 cycles | 1.03 |
ML-DSA-87 verify | 735622 cycles | 694958 cycles | 1.06 |
poly_decompose assembly #399
poly_decompose assembly #397