Implement poly_use_hint on top of poly_decompose #417
base: main
This adds the AVX2 intrinsics implementation of poly_decompose from https://github.com/pq-crystals/dilithium/blob/master/avx2/rounding.c. Resolves #399. Signed-off-by: Matthias J. Kannwischer <[email protected]>
This adds a native implementation of poly_decompose written from scratch. Resolves #397. Signed-off-by: Matthias J. Kannwischer <[email protected]>
This eliminates the helper function use_hint and instead inlines it into poly_use_hint so that the native poly_decompose code can be reused. The disadvantage is that it requires an additional polynomial on the stack. Resolves #400. Resolves #398. Signed-off-by: Matthias J. Kannwischer <[email protected]>
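For readers unfamiliar with the change, the following is a minimal sketch of the idea described in the commit message, not the code from this PR. It assumes the ML-DSA-65/87 parameter GAMMA2 = (Q-1)/32, uses a plain (non-constant-time) C stand-in for poly_decompose, and the struct layout and function signatures are assumptions made for illustration. It shows where the additional stack polynomial comes from: the low parts a0 must be kept for the whole polynomial before the hints are applied.

```c
#include <stdint.h>

#define N 256
#define Q 8380417
#define GAMMA2 ((Q - 1) / 32) /* assumption: ML-DSA-65/87 parameter set */

typedef struct
{
  int32_t coeffs[N];
} poly;

/* Plain C stand-in for a (possibly native) poly_decompose.
 * For each coefficient a in [0, Q), computes a1 and a0 such that
 * a = a1 * 2 * GAMMA2 + a0 (mod Q) with -GAMMA2 < a0 <= GAMMA2,
 * mapping the corner case a - a0 == Q - 1 to a1 = 0, a0 = a0 - 1. */
static void poly_decompose(poly *a1, poly *a0, const poly *a)
{
  for (int i = 0; i < N; i++)
  {
    int32_t r = a->coeffs[i];
    int32_t r0 = r % (2 * GAMMA2);
    if (r0 > GAMMA2)
    {
      r0 -= 2 * GAMMA2;
    }
    if (r - r0 == Q - 1)
    {
      a1->coeffs[i] = 0;
      a0->coeffs[i] = r0 - 1;
    }
    else
    {
      a1->coeffs[i] = (r - r0) / (2 * GAMMA2);
      a0->coeffs[i] = r0;
    }
  }
}

/* poly_use_hint with the per-coefficient use_hint logic inlined.
 * The high parts are written directly into b; the low parts need
 * a temporary polynomial a0 on the stack. */
static void poly_use_hint(poly *b, const poly *a, const poly *h)
{
  poly a0; /* the additional polynomial mentioned in the commit message */

  poly_decompose(b, &a0, a);

  for (int i = 0; i < N; i++)
  {
    if (h->coeffs[i] == 0)
    {
      continue; /* no hint: keep the high part unchanged */
    }
    if (a0.coeffs[i] > 0)
    {
      b->coeffs[i] = (b->coeffs[i] + 1) & 15; /* (a1 + 1) mod 16 */
    }
    else
    {
      b->coeffs[i] = (b->coeffs[i] - 1) & 15; /* (a1 - 1) mod 16 */
    }
  }
}
```

With this structure, any optimized poly_decompose can be dropped in without touching the hint loop; the cost is one extra N-coefficient buffer (1 KiB for 32-bit coefficients) per call.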
b330170 to 30d236c
Mac Mini (M1, 2020) benchmarks (opt)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 50500 cycles | 50491 cycles | 1.00 |
ML-DSA-44 sign | 205382 cycles | 222970 cycles | 0.92 |
ML-DSA-44 verify | 71838 cycles | 72843 cycles | 0.99 |
ML-DSA-65 keypair | 87381 cycles | 87390 cycles | 1.00 |
ML-DSA-65 sign | 330994 cycles | 356076 cycles | 0.93 |
ML-DSA-65 verify | 111562 cycles | 112665 cycles | 0.99 |
ML-DSA-87 keypair | 140123 cycles | 140111 cycles | 1.00 |
ML-DSA-87 sign | 400435 cycles | 425586 cycles | 0.94 |
ML-DSA-87 verify | 171789 cycles | 173177 cycles | 0.99 |
This comment was automatically generated by workflow using github-action-benchmark.
Mac Mini (M1, 2020) benchmarks (no-opt)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 115992 cycles | 116006 cycles | 1.00 |
ML-DSA-44 sign | 455097 cycles | 455029 cycles | 1.00 |
ML-DSA-44 verify | 139943 cycles | 136872 cycles | 1.02 |
ML-DSA-65 keypair | 197959 cycles | 197989 cycles | 1.00 |
ML-DSA-65 sign | 733264 cycles | 733207 cycles | 1.00 |
ML-DSA-65 verify | 220850 cycles | 216803 cycles | 1.02 |
ML-DSA-87 keypair | 335256 cycles | 335091 cycles | 1.00 |
ML-DSA-87 sign | 915276 cycles | 915023 cycles | 1.00 |
ML-DSA-87 verify | 358855 cycles | 353226 cycles | 1.02 |
Intel Xeon 4th gen (c7i)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 37849 cycles | 37381 cycles | 1.01 |
ML-DSA-44 sign | 156256 cycles | 169592 cycles | 0.92 |
ML-DSA-44 verify | 47881 cycles | 50107 cycles | 0.96 |
ML-DSA-65 keypair | 66644 cycles | 65875 cycles | 1.01 |
ML-DSA-65 sign | 262073 cycles | 279226 cycles | 0.94 |
ML-DSA-65 verify | 75847 cycles | 78656 cycles | 0.96 |
ML-DSA-87 keypair | 99769 cycles | 100966 cycles | 0.99 |
ML-DSA-87 sign | 302956 cycles | 326836 cycles | 0.93 |
ML-DSA-87 verify | 111504 cycles | 117910 cycles | 0.95 |
AMD EPYC 4th gen (c7a)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 44319 cycles | 44805 cycles | 0.99 |
ML-DSA-44 sign | 176844 cycles | 197282 cycles | 0.90 |
ML-DSA-44 verify | 57118 cycles | 60605 cycles | 0.94 |
ML-DSA-65 keypair | 75944 cycles | 85544 cycles | 0.89 |
ML-DSA-65 sign | 282633 cycles | 329012 cycles | 0.86 |
ML-DSA-65 verify | 88237 cycles | 101933 cycles | 0.87 |
ML-DSA-87 keypair | 116142 cycles | 115872 cycles | 1.00 |
ML-DSA-87 sign | 335251 cycles | 369088 cycles | 0.91 |
ML-DSA-87 verify | 130685 cycles | 138380 cycles | 0.94 |
Intel Xeon 4th gen (c7i) (no-opt)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 96173 cycles | 96162 cycles | 1.00 |
ML-DSA-44 sign | 351427 cycles | 351587 cycles | 1.00 |
ML-DSA-44 verify | 106204 cycles | 105379 cycles | 1.01 |
ML-DSA-65 keypair | 163642 cycles | 164036 cycles | 1.00 |
ML-DSA-65 sign | 580111 cycles | 583065 cycles | 0.99 |
ML-DSA-65 verify | 171360 cycles | 169132 cycles | 1.01 |
ML-DSA-87 keypair | 273379 cycles | 273210 cycles | 1.00 |
ML-DSA-87 sign | 733534 cycles | 733446 cycles | 1.00 |
ML-DSA-87 verify | 281281 cycles | 280884 cycles | 1.00 |
Intel Xeon 3rd gen (c6i)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 61006 cycles | 61211 cycles | 1.00 |
ML-DSA-44 sign | 241339 cycles | 264347 cycles | 0.91 |
ML-DSA-44 verify | 76803 cycles | 80506 cycles | 0.95 |
ML-DSA-65 keypair | 107425 cycles | 107017 cycles | 1.00 |
ML-DSA-65 sign | 400256 cycles | 436791 cycles | 0.92 |
ML-DSA-65 verify | 122508 cycles | 127415 cycles | 0.96 |
ML-DSA-87 keypair | 166031 cycles | 165409 cycles | 1.00 |
ML-DSA-87 sign | 479942 cycles | 516438 cycles | 0.93 |
ML-DSA-87 verify | 184026 cycles | 190577 cycles | 0.97 |
AMD EPYC 3rd gen (c6a)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 77797 cycles | 74255 cycles | 1.05 |
ML-DSA-44 sign | 257720 cycles | 269307 cycles | 0.96 |
ML-DSA-44 verify | 89884 cycles | 89061 cycles | 1.01 |
ML-DSA-65 keypair | 126917 cycles | 126527 cycles | 1.00 |
ML-DSA-65 sign | 400004 cycles | 433751 cycles | 0.92 |
ML-DSA-65 verify | 137095 cycles | 142344 cycles | 0.96 |
ML-DSA-87 keypair | 210105 cycles | 210098 cycles | 1.00 |
ML-DSA-87 sign | 506180 cycles | 543857 cycles | 0.93 |
ML-DSA-87 verify | 223266 cycles | 230154 cycles | 0.97 |
SpacemiT K1 8 (Banana Pi F3) benchmarks (no-opt)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 826958 cycles | 830867 cycles | 1.00 |
ML-DSA-44 sign | 3365565 cycles | 3361137 cycles | 1.00 |
ML-DSA-44 verify | 934455 cycles | 930654 cycles | 1.00 |
ML-DSA-65 keypair | 1393322 cycles | 1394406 cycles | 1.00 |
ML-DSA-65 sign | 5487332 cycles | 5490767 cycles | 1.00 |
ML-DSA-65 verify | 1484934 cycles | 1482041 cycles | 1.00 |
ML-DSA-87 keypair | 2305366 cycles | 2310541 cycles | 1.00 |
ML-DSA-87 sign | 6883188 cycles | 6926078 cycles | 0.99 |
ML-DSA-87 verify | 2430628 cycles | 2437273 cycles | 1.00 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 3rd gen (c6a)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 77797 cycles | 74255 cycles | 1.05 |
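
The alert logic can be sketched as follows. This is an illustration only, under the assumption that the Ratio column is the current cycle count divided by the previous one and that an alert fires when the unrounded ratio exceeds the threshold; the function names are hypothetical and this is not the code of github-action-benchmark.

```c
#include <stdbool.h>
#include <stdint.h>

/* Ratio as shown in the tables: current cycles / previous cycles.
 * Values above 1.0 mean the current commit is slower. */
static double cycle_ratio(uint64_t current, uint64_t previous)
{
  return (double)current / (double)previous;
}

/* Hypothetical alert check: flag a result when the unrounded ratio
 * exceeds the configured threshold (1.03 in the alert above). */
static bool is_regression(uint64_t current, uint64_t previous,
                          double threshold)
{
  return cycle_ratio(current, previous) > threshold;
}
```

For the flagged row, 77797 / 74255 is roughly 1.048, which is displayed as 1.05 and exceeds 1.03.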
Graviton4
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 73057 cycles | 72955 cycles | 1.00 |
ML-DSA-44 sign | 262119 cycles | 282599 cycles | 0.93 |
ML-DSA-44 verify | 85637 cycles | 87098 cycles | 0.98 |
ML-DSA-65 keypair | 128897 cycles | 128338 cycles | 1.00 |
ML-DSA-65 sign | 431334 cycles | 461044 cycles | 0.94 |
ML-DSA-65 verify | 138078 cycles | 139034 cycles | 0.99 |
ML-DSA-87 keypair | 208233 cycles | 208000 cycles | 1.00 |
ML-DSA-87 sign | 536103 cycles | 563267 cycles | 0.95 |
ML-DSA-87 verify | 220559 cycles | 222676 cycles | 0.99 |
AMD EPYC 4th gen (c7a) (no-opt)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 121614 cycles | 121992 cycles | 1.00 |
ML-DSA-44 sign | 466651 cycles | 467100 cycles | 1.00 |
ML-DSA-44 verify | 138107 cycles | 137172 cycles | 1.01 |
ML-DSA-65 keypair | 205647 cycles | 205879 cycles | 1.00 |
ML-DSA-65 sign | 752379 cycles | 752229 cycles | 1.00 |
ML-DSA-65 verify | 218566 cycles | 216901 cycles | 1.01 |
ML-DSA-87 keypair | 340764 cycles | 341355 cycles | 1.00 |
ML-DSA-87 sign | 955524 cycles | 953098 cycles | 1.00 |
ML-DSA-87 verify | 360409 cycles | 357806 cycles | 1.01 |
Intel Xeon 3rd gen (c6i) (no-opt)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 159081 cycles | 159435 cycles | 1.00 |
ML-DSA-44 sign | 574012 cycles | 574575 cycles | 1.00 |
ML-DSA-44 verify | 176683 cycles | 175326 cycles | 1.01 |
ML-DSA-65 keypair | 271339 cycles | 272743 cycles | 0.99 |
ML-DSA-65 sign | 950325 cycles | 948311 cycles | 1.00 |
ML-DSA-65 verify | 286775 cycles | 284273 cycles | 1.01 |
ML-DSA-87 keypair | 452772 cycles | 452203 cycles | 1.00 |
ML-DSA-87 sign | 1193487 cycles | 1194217 cycles | 1.00 |
ML-DSA-87 verify | 471296 cycles | 469663 cycles | 1.00 |
Graviton3
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 78057 cycles | 78015 cycles | 1.00 |
ML-DSA-44 sign | 282099 cycles | 304192 cycles | 0.93 |
ML-DSA-44 verify | 92841 cycles | 95531 cycles | 0.97 |
ML-DSA-65 keypair | 135025 cycles | 134970 cycles | 1.00 |
ML-DSA-65 sign | 464808 cycles | 496694 cycles | 0.94 |
ML-DSA-65 verify | 149724 cycles | 151189 cycles | 0.99 |
ML-DSA-87 keypair | 217783 cycles | 217547 cycles | 1.00 |
ML-DSA-87 sign | 573887 cycles | 606490 cycles | 0.95 |
ML-DSA-87 verify | 237677 cycles | 239637 cycles | 0.99 |
AMD EPYC 3rd gen (c6a) (no-opt)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 136457 cycles | 136784 cycles | 1.00 |
ML-DSA-44 sign | 554395 cycles | 554088 cycles | 1.00 |
ML-DSA-44 verify | 156645 cycles | 155245 cycles | 1.01 |
ML-DSA-65 keypair | 228109 cycles | 227723 cycles | 1.00 |
ML-DSA-65 sign | 893154 cycles | 894394 cycles | 1.00 |
ML-DSA-65 verify | 245855 cycles | 244248 cycles | 1.01 |
ML-DSA-87 keypair | 375905 cycles | 377140 cycles | 1.00 |
ML-DSA-87 sign | 1112907 cycles | 1115587 cycles | 1.00 |
ML-DSA-87 verify | 400199 cycles | 398881 cycles | 1.00 |
Graviton2
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 120487 cycles | 120805 cycles | 1.00 |
ML-DSA-44 sign | 453632 cycles | 489107 cycles | 0.93 |
ML-DSA-44 verify | 142359 cycles | 146029 cycles | 0.97 |
ML-DSA-65 keypair | 207537 cycles | 207549 cycles | 1.00 |
ML-DSA-65 sign | 749222 cycles | 802620 cycles | 0.93 |
ML-DSA-65 verify | 229852 cycles | 231722 cycles | 0.99 |
ML-DSA-87 keypair | 336755 cycles | 336647 cycles | 1.00 |
ML-DSA-87 sign | 930353 cycles | 986391 cycles | 0.94 |
ML-DSA-87 verify | 365481 cycles | 370216 cycles | 0.99 |
Graviton4 (no-opt)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 134640 cycles | 134737 cycles | 1.00 |
ML-DSA-44 sign | 520155 cycles | 508680 cycles | 1.02 |
ML-DSA-44 verify | 152799 cycles | 149623 cycles | 1.02 |
ML-DSA-65 keypair | 228468 cycles | 228452 cycles | 1.00 |
ML-DSA-65 sign | 825053 cycles | 824132 cycles | 1.00 |
ML-DSA-65 verify | 240390 cycles | 237291 cycles | 1.01 |
ML-DSA-87 keypair | 376971 cycles | 377483 cycles | 1.00 |
ML-DSA-87 sign | 1029692 cycles | 1030717 cycles | 1.00 |
ML-DSA-87 verify | 395171 cycles | 391020 cycles | 1.01 |
Graviton3 (no-opt)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 139780 cycles | 139749 cycles | 1.00 |
ML-DSA-44 sign | 512932 cycles | 505375 cycles | 1.01 |
ML-DSA-44 verify | 156718 cycles | 154405 cycles | 1.01 |
ML-DSA-65 keypair | 245178 cycles | 244944 cycles | 1.00 |
ML-DSA-65 sign | 824538 cycles | 824048 cycles | 1.00 |
ML-DSA-65 verify | 252977 cycles | 248646 cycles | 1.02 |
ML-DSA-87 keypair | 397778 cycles | 397303 cycles | 1.00 |
ML-DSA-87 sign | 1044062 cycles | 1043765 cycles | 1.00 |
ML-DSA-87 verify | 417220 cycles | 411283 cycles | 1.01 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 120417 cycles | 120271 cycles | 1.00 |
ML-DSA-44 sign | 453396 cycles | 488352 cycles | 0.93 |
ML-DSA-44 verify | 142184 cycles | 145585 cycles | 0.98 |
ML-DSA-65 keypair | 207481 cycles | 207309 cycles | 1.00 |
ML-DSA-65 sign | 748396 cycles | 802690 cycles | 0.93 |
ML-DSA-65 verify | 229905 cycles | 231615 cycles | 0.99 |
ML-DSA-87 keypair | 336582 cycles | 336114 cycles | 1.00 |
ML-DSA-87 sign | 929423 cycles | 985677 cycles | 0.94 |
ML-DSA-87 verify | 365760 cycles | 370072 cycles | 0.99 |
Graviton2 (no-opt)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 215251 cycles | 215510 cycles | 1.00 |
ML-DSA-44 sign | 799024 cycles | 809958 cycles | 0.99 |
ML-DSA-44 verify | 244130 cycles | 239793 cycles | 1.02 |
ML-DSA-65 keypair | 383492 cycles | 383922 cycles | 1.00 |
ML-DSA-65 sign | 1310912 cycles | 1313723 cycles | 1.00 |
ML-DSA-65 verify | 391832 cycles | 385228 cycles | 1.02 |
ML-DSA-87 keypair | 611660 cycles | 611830 cycles | 1.00 |
ML-DSA-87 sign | 1665709 cycles | 1666283 cycles | 1.00 |
ML-DSA-87 verify | 648641 cycles | 637481 cycles | 1.02 |
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 214965 cycles | 214904 cycles | 1.00 |
ML-DSA-44 sign | 797783 cycles | 797156 cycles | 1.00 |
ML-DSA-44 verify | 243977 cycles | 239869 cycles | 1.02 |
ML-DSA-65 keypair | 383157 cycles | 383600 cycles | 1.00 |
ML-DSA-65 sign | 1318623 cycles | 1321544 cycles | 1.00 |
ML-DSA-65 verify | 391639 cycles | 384834 cycles | 1.02 |
ML-DSA-87 keypair | 610983 cycles | 611160 cycles | 1.00 |
ML-DSA-87 sign | 1663256 cycles | 1663574 cycles | 1.00 |
ML-DSA-87 verify | 648213 cycles | 637385 cycles | 1.02 |
Hmmmm, seems to work okay on some Intel platforms, but not great on AArch64.
Arm Cortex-A55 (Snapdragon 888) benchmarks (opt)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 295501 cycles | 299821 cycles | 0.99 |
ML-DSA-44 sign | 1136287 cycles | 1229073 cycles | 0.92 |
ML-DSA-44 verify | 342899 cycles | 361928 cycles | 0.95 |
ML-DSA-65 keypair | 502192 cycles | 503702 cycles | 1.00 |
ML-DSA-65 sign | 1858520 cycles | 1996981 cycles | 0.93 |
ML-DSA-65 verify | 540919 cycles | 559218 cycles | 0.97 |
ML-DSA-87 keypair | 853068 cycles | 850138 cycles | 1.00 |
ML-DSA-87 sign | 2426368 cycles | 2557081 cycles | 0.95 |
ML-DSA-87 verify | 905590 cycles | 924273 cycles | 0.98 |
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 470070 cycles | 469100 cycles | 1.00 |
ML-DSA-44 sign | 2254019 cycles | 2237875 cycles | 1.01 |
ML-DSA-44 verify | 571416 cycles | 564986 cycles | 1.01 |
ML-DSA-65 keypair | 787327 cycles | 789950 cycles | 1.00 |
ML-DSA-65 sign | 3674530 cycles | 3700139 cycles | 0.99 |
ML-DSA-65 verify | 878677 cycles | 872036 cycles | 1.01 |
ML-DSA-87 keypair | 1262444 cycles | 1268690 cycles | 1.00 |
ML-DSA-87 sign | 4489950 cycles | 4543181 cycles | 0.99 |
ML-DSA-87 verify | 1403237 cycles | 1397372 cycles | 1.00 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 229696 cycles | 236241 cycles | 0.97 |
ML-DSA-44 sign | 777476 cycles | 809081 cycles | 0.96 |
ML-DSA-44 verify | 255039 cycles | 267657 cycles | 0.95 |
ML-DSA-65 keypair | 417139 cycles | 404121 cycles | 1.03 |
ML-DSA-65 sign | 1309627 cycles | 1315060 cycles | 1.00 |
ML-DSA-65 verify | 436970 cycles | 416690 cycles | 1.05 |
ML-DSA-87 keypair | 670883 cycles | 659871 cycles | 1.02 |
ML-DSA-87 sign | 1598694 cycles | 1679421 cycles | 0.95 |
ML-DSA-87 verify | 687899 cycles | 677911 cycles | 1.01 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-65 keypair | 417139 cycles | 404121 cycles | 1.03 |
ML-DSA-65 verify | 436970 cycles | 416690 cycles | 1.05 |
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 keypair | 314316 cycles | 320637 cycles | 0.98 |
ML-DSA-44 sign | 1232665 cycles | 1259345 cycles | 0.98 |
ML-DSA-44 verify | 363930 cycles | 352392 cycles | 1.03 |
ML-DSA-65 keypair | 577868 cycles | 586954 cycles | 0.98 |
ML-DSA-65 sign | 1982825 cycles | 2044076 cycles | 0.97 |
ML-DSA-65 verify | 563056 cycles | 569356 cycles | 0.99 |
ML-DSA-87 keypair | 911438 cycles | 889842 cycles | 1.02 |
ML-DSA-87 sign | 2621391 cycles | 2589004 cycles | 1.01 |
ML-DSA-87 verify | 953245 cycles | 923943 cycles | 1.03 |
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
Benchmark suite | Current: 30d236c | Previous: 1736882 | Ratio |
---|---|---|---|
ML-DSA-44 verify | 363930 cycles | 352392 cycles | 1.03 |
ML-DSA-87 verify | 953245 cycles | 923943 cycles | 1.03 |
poly_decompose #411
poly_use_hint assembly #400
poly_use_hint assembly #398