AVX2: Add native implementation of poly_reduce
#333
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
| Benchmark suite | Current: d40c2ae | Previous: d526633 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 475221 cycles | 473829 cycles | 1.00 |
| ML-DSA-44 sign | 2137395 cycles | 2129055 cycles | 1.00 |
| ML-DSA-44 verify | 550827 cycles | 549719 cycles | 1.00 |
| ML-DSA-65 keypair | 798321 cycles | 796338 cycles | 1.00 |
| ML-DSA-65 sign | 3497650 cycles | 3500697 cycles | 1.00 |
| ML-DSA-65 verify | 857239 cycles | 850367 cycles | 1.01 |
| ML-DSA-87 keypair | 1284817 cycles | 1287684 cycles | 1.00 |
| ML-DSA-87 sign | 4352515 cycles | 4361993 cycles | 1.00 |
| ML-DSA-87 verify | 1363181 cycles | 1362495 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
| Benchmark suite | Current: d40c2ae | Previous: d526633 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 133970 cycles | 133863 cycles | 1.00 |
| ML-DSA-44 sign | 450491 cycles | 450656 cycles | 1.00 |
| ML-DSA-44 verify | 148661 cycles | 148702 cycles | 1.00 |
| ML-DSA-65 keypair | 248849 cycles | 249150 cycles | 1.00 |
| ML-DSA-65 sign | 744981 cycles | 745171 cycles | 1.00 |
| ML-DSA-65 verify | 241090 cycles | 241228 cycles | 1.00 |
| ML-DSA-87 keypair | 377298 cycles | 377711 cycles | 1.00 |
| ML-DSA-87 sign | 938559 cycles | 938337 cycles | 1.00 |
| ML-DSA-87 verify | 391364 cycles | 391344 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton2 (no-opt)
| Benchmark suite | Current: d40c2ae | Previous: d526633 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 216360 cycles | 216245 cycles | 1.00 |
| ML-DSA-44 sign | 756876 cycles | 757775 cycles | 1.00 |
| ML-DSA-44 verify | 235917 cycles | 236117 cycles | 1.00 |
| ML-DSA-65 keypair | 383159 cycles | 382484 cycles | 1.00 |
| ML-DSA-65 sign | 1225077 cycles | 1225563 cycles | 1.00 |
| ML-DSA-65 verify | 378659 cycles | 378464 cycles | 1.00 |
| ML-DSA-87 keypair | 627656 cycles | 637392 cycles | 0.98 |
| ML-DSA-87 sign | 1580665 cycles | 1580228 cycles | 1.00 |
| ML-DSA-87 verify | 629959 cycles | 628801 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
| Benchmark suite | Current: d40c2ae | Previous: d526633 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 215675 cycles | 215699 cycles | 1.00 |
| ML-DSA-44 sign | 753450 cycles | 753215 cycles | 1.00 |
| ML-DSA-44 verify | 235296 cycles | 235275 cycles | 1.00 |
| ML-DSA-65 keypair | 382578 cycles | 382118 cycles | 1.00 |
| ML-DSA-65 sign | 1224828 cycles | 1225083 cycles | 1.00 |
| ML-DSA-65 verify | 378228 cycles | 378070 cycles | 1.00 |
| ML-DSA-87 keypair | 613771 cycles | 613770 cycles | 1.00 |
| ML-DSA-87 sign | 1581655 cycles | 1621229 cycles | 0.98 |
| ML-DSA-87 verify | 629308 cycles | 628641 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
| Benchmark suite | Current: d40c2ae | Previous: d526633 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 240874 cycles | 242260 cycles | 0.99 |
| ML-DSA-44 sign | 767307 cycles | 771195 cycles | 0.99 |
| ML-DSA-44 verify | 259098 cycles | 257032 cycles | 1.01 |
| ML-DSA-65 keypair | 466074 cycles | 464781 cycles | 1.00 |
| ML-DSA-65 sign | 1245666 cycles | 1274872 cycles | 0.98 |
| ML-DSA-65 verify | 426751 cycles | 427241 cycles | 1.00 |
| ML-DSA-87 keypair | 695766 cycles | 696748 cycles | 1.00 |
| ML-DSA-87 sign | 1659413 cycles | 1648764 cycles | 1.01 |
| ML-DSA-87 verify | 700218 cycles | 692610 cycles | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
| Benchmark suite | Current: d40c2ae | Previous: d526633 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 313665 cycles | 319973 cycles | 0.98 |
| ML-DSA-44 sign | 1182564 cycles | 1201861 cycles | 0.98 |
| ML-DSA-44 verify | 337721 cycles | 349183 cycles | 0.97 |
| ML-DSA-65 keypair | 553895 cycles | 588738 cycles | 0.94 |
| ML-DSA-65 sign | 1875692 cycles | 1972685 cycles | 0.95 |
| ML-DSA-65 verify | 543355 cycles | 574047 cycles | 0.95 |
| ML-DSA-87 keypair | 860198 cycles | 893436 cycles | 0.96 |
| ML-DSA-87 sign | 2377351 cycles | 2511766 cycles | 0.95 |
| ML-DSA-87 verify | 891297 cycles | 918147 cycles | 0.97 |
This comment was automatically generated by workflow using github-action-benchmark.
Signed-off-by: Jake Massimo <[email protected]>
Signed-off-by: Jake Massimo <[email protected]>
d40c2ae to a686318
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (opt)
| Benchmark suite | Current: a686318 | Previous: 2c8f312 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 128470 cycles | 128662 cycles | 1.00 |
| ML-DSA-44 sign | 419395 cycles | 419335 cycles | 1.00 |
| ML-DSA-44 verify | 142277 cycles | 142486 cycles | 1.00 |
| ML-DSA-65 keypair | 239931 cycles | 240198 cycles | 1.00 |
| ML-DSA-65 sign | 694661 cycles | 694792 cycles | 1.00 |
| ML-DSA-65 verify | 231020 cycles | 231005 cycles | 1.00 |
| ML-DSA-87 keypair | 361925 cycles | 361975 cycles | 1.00 |
| ML-DSA-87 sign | 878939 cycles | 879257 cycles | 1.00 |
| ML-DSA-87 verify | 374809 cycles | 375200 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A76 (Raspberry Pi 5) benchmarks (no-opt)
| Benchmark suite | Current: a686318 | Previous: 2c8f312 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 209942 cycles | 209811 cycles | 1.00 |
| ML-DSA-44 sign | 721618 cycles | 721462 cycles | 1.00 |
| ML-DSA-44 verify | 228707 cycles | 228512 cycles | 1.00 |
| ML-DSA-65 keypair | 376004 cycles | 375810 cycles | 1.00 |
| ML-DSA-65 sign | 1185956 cycles | 1185979 cycles | 1.00 |
| ML-DSA-65 verify | 370458 cycles | 370610 cycles | 1.00 |
| ML-DSA-87 keypair | 596185 cycles | 595808 cycles | 1.00 |
| ML-DSA-87 sign | 1514147 cycles | 1514568 cycles | 1.00 |
| ML-DSA-87 verify | 614250 cycles | 613735 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'AMD EPYC 4th gen (c7a) (no-opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: a686318 | Previous: 2c8f312 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 134350 cycles | 119172 cycles | 1.13 |
| ML-DSA-44 sign | 473745 cycles | 418564 cycles | 1.13 |
| ML-DSA-44 verify | 148381 cycles | 131691 cycles | 1.13 |
| ML-DSA-65 keypair | 225136 cycles | 200755 cycles | 1.12 |
| ML-DSA-65 sign | 753226 cycles | 676135 cycles | 1.11 |
| ML-DSA-65 verify | 229951 cycles | 206045 cycles | 1.12 |
| ML-DSA-87 keypair | 374536 cycles | 335161 cycles | 1.12 |
| ML-DSA-87 sign | 974253 cycles | 869596 cycles | 1.12 |
| ML-DSA-87 verify | 383457 cycles | 341973 cycles | 1.12 |
This comment was automatically generated by workflow using github-action-benchmark.
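For readers checking the alert by hand: the Ratio column appears to be the current cycle count divided by the previous one, with an alert raised once it exceeds 1.03. The sketch below reproduces that check under this assumption. The names `bench_ratio`, `is_regression` and `ALERT_THRESHOLD` are made up for illustration and are not part of github-action-benchmark, whose exact comparison may differ.

```c
#include <stdbool.h>
#include <stdint.h>

/* Threshold quoted in the alert text above. */
#define ALERT_THRESHOLD 1.03

/* Assumed definition of the Ratio column: current / previous.
 * For cycle counts, lower is better, so a ratio above 1 means slower. */
static double bench_ratio(uint64_t current, uint64_t previous)
{
  return (double)current / (double)previous;
}

/* Hypothetical helper: flags a result whose ratio exceeds the threshold. */
static bool is_regression(uint64_t current, uint64_t previous)
{
  return bench_ratio(current, previous) > ALERT_THRESHOLD;
}
```

With the ML-DSA-44 keypair row, `bench_ratio(134350, 119172)` is about 1.127, which is reported as 1.13 and is above the threshold.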
Arm Cortex-A55 (Snapdragon 888) benchmarks (no-opt)
| Benchmark suite | Current: a686318 | Previous: 2c8f312 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 452496 cycles | 451876 cycles | 1.00 |
| ML-DSA-44 sign | 2016313 cycles | 2006209 cycles | 1.01 |
| ML-DSA-44 verify | 527259 cycles | 525659 cycles | 1.00 |
| ML-DSA-65 keypair | 761517 cycles | 759548 cycles | 1.00 |
| ML-DSA-65 sign | 3336200 cycles | 3318976 cycles | 1.01 |
| ML-DSA-65 verify | 818823 cycles | 817753 cycles | 1.00 |
| ML-DSA-87 keypair | 1233521 cycles | 1223157 cycles | 1.01 |
| ML-DSA-87 sign | 4156316 cycles | 4111725 cycles | 1.01 |
| ML-DSA-87 verify | 1320488 cycles | 1312412 cycles | 1.01 |
This comment was automatically generated by workflow using github-action-benchmark.
Graviton2 (no-opt)
| Benchmark suite | Current: a686318 | Previous: 2c8f312 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 210257 cycles | 210419 cycles | 1.00 |
| ML-DSA-44 sign | 722131 cycles | 722578 cycles | 1.00 |
| ML-DSA-44 verify | 235220 cycles | 238793 cycles | 0.99 |
| ML-DSA-65 keypair | 377358 cycles | 377835 cycles | 1.00 |
| ML-DSA-65 sign | 1187582 cycles | 1187372 cycles | 1.00 |
| ML-DSA-65 verify | 370964 cycles | 371286 cycles | 1.00 |
| ML-DSA-87 keypair | 596674 cycles | 595539 cycles | 1.00 |
| ML-DSA-87 sign | 1517412 cycles | 1517135 cycles | 1.00 |
| ML-DSA-87 verify | 614578 cycles | 613408 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)
| Benchmark suite | Current: a686318 | Previous: 2c8f312 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 241685 cycles | 237197 cycles | 1.02 |
| ML-DSA-44 sign | 708877 cycles | 703842 cycles | 1.01 |
| ML-DSA-44 verify | 253673 cycles | 240763 cycles | 1.05 |
| ML-DSA-65 keypair | 447265 cycles | 463657 cycles | 0.96 |
| ML-DSA-65 sign | 1153429 cycles | 1186377 cycles | 0.97 |
| ML-DSA-65 verify | 406068 cycles | 418257 cycles | 0.97 |
| ML-DSA-87 keypair | 674462 cycles | 696260 cycles | 0.97 |
| ML-DSA-87 sign | 1545844 cycles | 1539476 cycles | 1.00 |
| ML-DSA-87 verify | 687922 cycles | 690657 cycles | 1.00 |
This comment was automatically generated by workflow using github-action-benchmark.
⚠️ Performance Alert ⚠️
Possible performance regression was detected for benchmark 'Arm Cortex-A72 (Raspberry Pi 4) benchmarks (opt)'.
Benchmark result of this commit is worse than the previous benchmark result exceeding threshold 1.03.
| Benchmark suite | Current: a686318 | Previous: 2c8f312 | Ratio |
|---|---|---|---|
| ML-DSA-44 verify | 253673 cycles | 240763 cycles | 1.05 |
This comment was automatically generated by workflow using github-action-benchmark.
Arm Cortex-A72 (Raspberry Pi 4) benchmarks (no-opt)
| Benchmark suite | Current: a686318 | Previous: 2c8f312 | Ratio |
|---|---|---|---|
| ML-DSA-44 keypair | 292565 cycles | 310373 cycles | 0.94 |
| ML-DSA-44 sign | 1090927 cycles | 1132919 cycles | 0.96 |
| ML-DSA-44 verify | 319009 cycles | 332037 cycles | 0.96 |
| ML-DSA-65 keypair | 545770 cycles | 559733 cycles | 0.98 |
| ML-DSA-65 sign | 1821019 cycles | 1850373 cycles | 0.98 |
| ML-DSA-65 verify | 525103 cycles | 536497 cycles | 0.98 |
| ML-DSA-87 keypair | 838288 cycles | 842080 cycles | 1.00 |
| ML-DSA-87 sign | 2279646 cycles | 2348501 cycles | 0.97 |
| ML-DSA-87 verify | 853547 cycles | 880866 cycles | 0.97 |
This comment was automatically generated by workflow using github-action-benchmark.
This extends #325 to add an x86 AVX2 implementation, in preparation for formal verification with HOL Light.
Result of `make run_bench_components CYCLES=PMU`: `poly_reduce cycles=197`
Currently built on top of the benchmarking PRs #315 and #325 to measure differences with C-AVX2.
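For context, here is a minimal sketch of what a coefficient-wise `poly_reduce` computes and how it maps onto AVX2 lanes. It assumes the Dilithium-reference-style `reduce32` (shift by 23 with modulus Q = 8380417) and is not the code added by this PR. The names `poly_reduce_c` and `poly_reduce_avx2_sketch` are hypothetical, and the intrinsics path is only compiled when the compiler enables AVX2.

```c
#include <stdint.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

#define N 256     /* coefficients per ML-DSA polynomial */
#define Q 8380417 /* ML-DSA modulus */

/* Returns r congruent to a mod Q with |r| < Q.
 * Requires a <= 2^31 - 2^22 - 1 and assumes >> on negative
 * values is an arithmetic shift (true on common compilers). */
static int32_t reduce32(int32_t a)
{
  int32_t t = (a + (1 << 22)) >> 23;
  return a - t * Q;
}

/* Scalar version: reduce every coefficient in place. */
static void poly_reduce_c(int32_t a[N])
{
  for (int i = 0; i < N; i++)
  {
    a[i] = reduce32(a[i]);
  }
}

#if defined(__AVX2__)
/* Same computation, eight 32-bit coefficients per iteration. */
static void poly_reduce_avx2_sketch(int32_t a[N])
{
  const __m256i q = _mm256_set1_epi32(Q);
  const __m256i off = _mm256_set1_epi32(1 << 22);
  for (int i = 0; i < N; i += 8)
  {
    __m256i f = _mm256_loadu_si256((const __m256i *)&a[i]);
    __m256i t = _mm256_srai_epi32(_mm256_add_epi32(f, off), 23);
    t = _mm256_mullo_epi32(t, q);
    f = _mm256_sub_epi32(f, t);
    _mm256_storeu_si256((__m256i *)&a[i], f);
  }
}
#endif
```

Both versions perform the same add, shift, multiply and subtract per coefficient, so on in-range inputs they should produce identical outputs.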