@rymnc
Member

@rymnc rymnc commented Dec 27, 2024

Warning

This is an experimental PR that is not meant to be merged. Its purpose is to start a conversation about either using SIMD once it lands in stable, or using a different library to access arch-specific SIMD instructions. In the zkVM benchmarks, large `meq`s cost a lot of execution cycles and proving time. This enhancement should improve things there.

Uses CPU intrinsics (NEON, AVX2, AVX512F) to improve memory-compare performance. Measured improvements of up to 40% on NEON and 20-30% with AVX2.

The AVX2 and AVX512F paths can probably be optimised further in how they handle masking.
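For readers unfamiliar with the approach, here is a minimal sketch of an intrinsics-based memory compare. It is not the code from this PR: the function names `mem_eq`, `eq_avx2`, and `eq_scalar` are hypothetical, only the AVX2 path is shown (a NEON or AVX512F path would follow the same pattern), and the tail is handled with a plain slice comparison rather than a masked load.

```rust
/// Scalar fallback: the standard slice comparison.
fn eq_scalar(a: &[u8], b: &[u8]) -> bool {
    a == b
}

/// Hypothetical AVX2 path: compares 32 bytes per iteration.
/// Callers must ensure AVX2 is available and that both slices have equal length.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn eq_avx2(a: &[u8], b: &[u8]) -> bool {
    use std::arch::x86_64::{
        __m256i, _mm256_cmpeq_epi8, _mm256_loadu_si256, _mm256_movemask_epi8,
    };

    debug_assert_eq!(a.len(), b.len());
    let len = a.len();
    let mut i = 0;
    while i + 32 <= len {
        // SAFETY: i + 32 <= len, so both unaligned 32-byte loads stay in bounds.
        let all_equal = unsafe {
            let va = _mm256_loadu_si256(a.as_ptr().add(i) as *const __m256i);
            let vb = _mm256_loadu_si256(b.as_ptr().add(i) as *const __m256i);
            // cmpeq sets each byte lane to 0xFF where the inputs match;
            // movemask packs the 32 lane sign bits into an i32 (-1 means all set).
            _mm256_movemask_epi8(_mm256_cmpeq_epi8(va, vb)) == -1
        };
        if !all_equal {
            return false;
        }
        i += 32;
    }
    // Remaining tail of fewer than 32 bytes: plain comparison instead of a masked load.
    a[i..] == b[i..]
}

/// Hypothetical entry point: picks the AVX2 path at runtime when available.
pub fn mem_eq(a: &[u8], b: &[u8]) -> bool {
    if a.len() != b.len() {
        return false;
    }
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was checked just above and the lengths match.
            return unsafe { eq_avx2(a, b) };
        }
    }
    eq_scalar(a, b)
}
```

With this sketch, `mem_eq(&x, &y)` behaves like `x == y` on every target. On non-x86_64 machines, or when AVX2 is missing, it simply falls through to the scalar comparison.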

Benchmarks, using divan as the benchmarking library:

meq_performance_divan_plain  fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ meq_performance                   │               │               │               │         │
   ├─ 100              40.31 ns      │ 61.66 µs      │ 41.31 ns      │ 928.8 ns      │ 100     │ 100
   ├─ 2000             50.74 ns      │ 51.72 ns      │ 51.39 ns      │ 51.26 ns      │ 100     │ 12800
   ├─ 4000             80.03 ns      │ 112.5 ns      │ 81.34 ns      │ 82.78 ns      │ 100     │ 6400
   ├─ 8000             126.9 ns      │ 133.4 ns      │ 129.5 ns      │ 129.5 ns      │ 100     │ 3200
   ├─ 16000            220.6 ns      │ 1.218 µs      │ 228.4 ns      │ 237.8 ns      │ 100     │ 1600
   ╰─ 32000            413.3 ns      │ 582.6 ns      │ 423.7 ns      │ 424.8 ns      │ 100     │ 1600



meq_performance_divan_optimized  fastest       │ slowest       │ median        │ mean          │ samples │ iters
╰─ meq_performance                   │               │               │               │         │
   ├─ 100              40.17 ns      │ 57.95 µs      │ 41.17 ns      │ 872 ns        │ 100     │ 100
   ├─ 2000             36.28 ns      │ 40.19 ns      │ 36.93 ns      │ 37.37 ns      │ 100     │ 12800
   ├─ 4000             49.3 ns       │ 53.86 ns      │ 49.62 ns      │ 49.68 ns      │ 100     │ 12800
   ├─ 8000             75.34 ns      │ 87.06 ns      │ 77.94 ns      │ 77.41 ns      │ 100     │ 6400
   ├─ 16000            129.3 ns      │ 947.1 ns      │ 131.9 ns      │ 140.1 ns      │ 100     │ 1600
   ╰─ 32000            230.9 ns      │ 384.5 ns      │ 236.1 ns      │ 236.7 ns      │ 100     │ 1600
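To make the two tables easier to compare, the relative reduction in median time can be computed as (plain - optimized) / plain. The sketch below does that for the median columns. The values are copied from the tables, while the names `MEDIANS_NS`, `reduction_pct`, and `median_reductions` are made up for illustration. The tables do not say which machine or instruction set produced them, so the results are only a reading of these particular runs.

```rust
/// (input size as labelled in the tables, plain median in ns, optimized median in ns),
/// copied from the two divan tables.
const MEDIANS_NS: [(u32, f64, f64); 6] = [
    (100, 41.31, 41.17),
    (2000, 51.39, 36.93),
    (4000, 81.34, 49.62),
    (8000, 129.5, 77.94),
    (16000, 228.4, 131.9),
    (32000, 423.7, 236.1),
];

/// Percentage reduction in time going from the plain to the optimized run.
fn reduction_pct(plain_ns: f64, optimized_ns: f64) -> f64 {
    (plain_ns - optimized_ns) / plain_ns * 100.0
}

/// Reduction in median time for every row of the tables.
fn median_reductions() -> Vec<(u32, f64)> {
    MEDIANS_NS
        .iter()
        .map(|&(size, plain, optimized)| (size, reduction_pct(plain, optimized)))
        .collect()
}
```

By this measure the size-100 row is essentially unchanged, and the larger sizes show reductions ranging from roughly 28% (2000) to roughly 44% (32000).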

Checklist

  • Breaking changes are clearly marked as such in the PR description and changelog
  • New behavior is reflected in tests
  • If the performance characteristics of an instruction change, update gas costs as well or make a follow-up PR for that
  • The specification matches the implemented behavior (link update PR if changes are needed)

Before requesting review

  • I have reviewed the code myself
  • I have created follow-up issues caused by this PR and linked them here

After merging, notify other teams

[Add or remove entries as needed]

@rymnc rymnc added the "no changelog" label (Skips the CI changelog check) Dec 27, 2024
@rymnc rymnc force-pushed the chore/use-simd-for-meq branch from 50db594 to 16255b4 Compare December 29, 2024 18:27
@rymnc rymnc force-pushed the chore/use-simd-for-meq branch from 41b7550 to f1b162f Compare December 31, 2024 12:29
@rymnc rymnc changed the title from "feat(meq): use portable_simd to improve performance of meq" to "feat(meq): use simd to improve performance of meq" Dec 31, 2024
@rymnc rymnc force-pushed the chore/use-simd-for-meq branch 6 times, most recently from a867507 to f1b162f Compare January 1, 2025 23:19
@xgreenx
Collaborator

xgreenx commented Sep 15, 2025

@rymnc What do we want to do with this PR?

@rymnc
Member Author

rymnc commented Sep 15, 2025

I'm happy to land it when SIMD becomes stable :)

@rymnc
Member Author

rymnc commented Sep 15, 2025

Also, it is just meant to demonstrate how we can get better performance at the VM level for memory opcodes.
