@@ -17,7 +17,47 @@ Algorithms implemented:
* `nonsimd`: vertical sum over lanes with a reduce at the end using Rust arrays
* `naive`: sum using rust iterators
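
To make the `nonsimd` and `naive` entries concrete, here is a minimal sketch of what the two descriptions could look like in plain Rust. The function names, the `LANES` constant and its value of 16 are assumptions for illustration; the repository's actual implementation is not shown in this diff and may differ.

```rust
/// Number of independent partial sums kept in flight.
/// The value 16 is an assumption for this sketch.
const LANES: usize = 16;

/// Naive sum: a single running total produced by the iterator adapter.
pub fn naive_sum(values: &[f32]) -> f32 {
    values.iter().sum()
}

/// Vertical sum: accumulate LANES partial sums side by side in a Rust
/// array, then reduce the array to one value at the end.
pub fn nonsimd_sum(values: &[f32]) -> f32 {
    let chunks = values.chunks_exact(LANES);
    // Elements that do not fill a whole chunk are handled separately.
    let remainder = chunks.remainder();

    let mut acc = [0.0f32; LANES];
    for chunk in chunks {
        // Lane i of every chunk is added to lane i of the accumulator.
        for (a, v) in acc.iter_mut().zip(chunk) {
            *a += *v;
        }
    }

    // Final horizontal reduce over the lanes, plus the leftover tail.
    let lanes_total: f32 = acc.iter().sum();
    let tail_total: f32 = remainder.iter().sum();
    lanes_total + tail_total
}
```

Because the lanes carry no dependency on one another, a compiler is free to vectorize the inner loop, which is one plausible reason for the gap between `nonsimd_sum` and `naive_sum` in the results. Note that the two functions add in a different order, so their results can differ by floating-point rounding on general inputs.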

- ## Bench results on my computer
+ ## Bench results on native
+
+ Command:
+
+ ```
+ RUSTFLAGS="-C target-cpu=native" cargo bench -- "2\^20"
+ ```
+
+ ### Sum of values
+
+ ```
+ core_simd_sum 2^20 f32 [156.96 us 158.06 us 159.40 us]
+ packed_simd_sum 2^20 f32 [184.17 us 184.47 us 184.85 us]
+ nonsimd_sum 2^20 f32 [175.05 us 176.26 us 177.95 us]
+ naive_sum 2^20 f32 [1.6636 ms 1.6700 ms 1.6778 ms]
+ ```
+
+ ### Sum of nullable values (`Vec<bool>`)
+
+ ```
+ core_simd_sum null 2^20 f32 [2.3610 ms 2.3713 ms 2.3831 ms]
+ packed_simd_sum null 2^20 f32 [1.5737 ms 1.5869 ms 1.6022 ms]
+ nonsimd_sum null 2^20 f32 [1.8009 ms 1.8133 ms 1.8276 ms]
+ naive_sum null 2^20 f32 [1.6418 ms 1.6520 ms 1.6660 ms]
+ ```
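
For readers unfamiliar with the `Vec<bool>` variant, the sketch below shows one way a nullable sum over a boolean validity mask could be written. It assumes one `bool` per value, with `true` meaning "not null", and it treats null entries as contributing nothing; the name `naive_sum_nullable` and these conventions are assumptions, not taken from the benchmarked code.

```rust
/// Sums the entries of `values` whose flag in `validity` is `true`.
/// Hypothetical helper: one bool per value, `true` means "not null".
pub fn naive_sum_nullable(values: &[f32], validity: &[bool]) -> f32 {
    assert_eq!(values.len(), validity.len(), "mask must match values");

    let mut total = 0.0f32;
    for (v, &is_valid) in values.iter().zip(validity) {
        // Null entries are skipped rather than added as zero.
        if is_valid {
            total += *v;
        }
    }
    total
}
```

A mask stored this way costs one byte per value, so the loop reads noticeably more memory than the non-null sum does.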
+
+ ### Sum of nullable values (`Bitmap`)
+
+ ```
+ core_simd_sum bitmap 2^20 f32 [174.24 us 175.10 us 176.21 us]
+ nonsimd_sum bitmap 2^20 f32 [541.78 us 545.16 us 549.09 us]
+ naive_sum bitmap 2^20 f32 [1.6740 ms 1.6922 ms 1.7149 ms]
+ ```
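
The `Bitmap` variant packs validity into one bit per value instead of one `bool` per value. The sketch below is a rough illustration of a lane-wise sum over such a bitmap, assuming a `&[u8]` buffer with least-significant-bit-first ordering (bit `i % 8` of byte `i / 8` describes value `i`). The names `get_bit` and `bitmap_sum` and the bit layout are assumptions for this sketch and may not match the `Bitmap` type used in the benchmarks.

```rust
/// Returns whether bit `i` is set in an LSB-first packed bitmap.
/// Hypothetical helper for this sketch.
fn get_bit(bitmap: &[u8], i: usize) -> bool {
    (bitmap[i / 8] >> (i % 8)) & 1 == 1
}

/// Sums the values whose validity bit is set, eight lanes at a time.
pub fn bitmap_sum(values: &[f32], bitmap: &[u8]) -> f32 {
    assert!(bitmap.len() * 8 >= values.len(), "bitmap too short");

    let chunks = values.chunks_exact(8);
    let remainder = chunks.remainder();

    // One validity byte covers exactly one chunk of eight values.
    let mut acc = [0.0f32; 8];
    for (chunk, &byte) in chunks.zip(bitmap) {
        for (lane, (a, v)) in acc.iter_mut().zip(chunk).enumerate() {
            let valid = (byte >> lane) & 1 == 1;
            // Select the value or zero instead of branching around the add.
            *a += if valid { *v } else { 0.0 };
        }
    }

    // Reduce the lanes, then handle the tail one bit at a time.
    let mut total: f32 = acc.iter().sum();
    let base = values.len() - remainder.len();
    for (j, v) in remainder.iter().enumerate() {
        if get_bit(bitmap, base + j) {
            total += *v;
        }
    }
    total
}
```

With this layout a single byte masks a whole group of eight lanes, which is the kind of access pattern that maps naturally onto SIMD mask operations.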
+
+ ## Bench results on default
+
+ Command:
+
+ ```
+ cargo bench -- "2\^20"
+ ```

### Sum of values

@@ -45,10 +85,36 @@ nonsimd_sum bitmap 2^20 f32 [454.78 us 462.08 us 471.82 us]
naive_sum bitmap 2^20 f32 [1.7633 ms 1.7736 ms 1.7855 ms]
```

- ### Conclusions so far:
+ ### Conditions

- * for non-null sums, it is advantageous (by 10%) to use `packed` or `core`
- * for sums with nulls, it is advantageous (by 50%) to use arrays
+ ```
+ $ lscpu
+ Architecture: x86_64
+ CPU op-mode(s): 32-bit, 64-bit
+ Byte Order: Little Endian
+ CPU(s): 4
+ On-line CPU(s) list: 0-3
+ Thread(s) per core: 2
+ Core(s) per socket: 2
+ Socket(s): 1
+ NUMA node(s): 1
+ Vendor ID: GenuineIntel
+ CPU family: 6
+ Model: 85
+ Model name: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+ Stepping: 4
+ CPU MHz: 2095.077
+ BogoMIPS: 4190.15
+ Virtualization: VT-x
+ Hypervisor vendor: Microsoft
+ Virtualization type: full
+ L1d cache: 32K
+ L1i cache: 32K
+ L2 cache: 1024K
+ L3 cache: 36608K
+ NUMA node0 CPU(s): 0-3
+ Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti tpr_shadow vnmi ept vpid fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear
+ ```

## License
