Commit 009d419

Added results for "target-cpu=native"
1 parent b3f0bb5 commit 009d419

4 files changed, +73 -7 lines changed

README.md

Lines changed: 70 additions & 4 deletions
@@ -17,7 +17,47 @@ Algorithms implemented:
 * `nonsimd`: vertical sum over lanes with a reduce at the end using Rust arrays
 * `naive`: sum using rust iterators
 
-## Bench results on my computer
+## Bench results on native
+
+Command:
+
+```
+RUSTFLAGS="-C target-cpu=native" cargo bench -- "2\^20"
+```
+
+### Sum of values
+
+```
+core_simd_sum 2^20 f32 [156.96 us 158.06 us 159.40 us]
+packed_simd_sum 2^20 f32 [184.17 us 184.47 us 184.85 us]
+nonsimd_sum 2^20 f32 [175.05 us 176.26 us 177.95 us]
+naive_sum 2^20 f32 [1.6636 ms 1.6700 ms 1.6778 ms]
+```
+
+### Sum of nullable values (`Vec<bool>`)
+
+```
+core_simd_sum null 2^20 f32 [2.3610 ms 2.3713 ms 2.3831 ms]
+packed_simd_sum null 2^20 f32 [1.5737 ms 1.5869 ms 1.6022 ms]
+nonsimd_sum null 2^20 f32 [1.8009 ms 1.8133 ms 1.8276 ms]
+naive_sum null 2^20 f32 [1.6418 ms 1.6520 ms 1.6660 ms]
+```
+
+### Sum of nullable values (`Bitmap`)
+
+```
+core_simd_sum bitmap 2^20 f32 [174.24 us 175.10 us 176.21 us]
+nonsimd_sum bitmap 2^20 f32 [541.78 us 545.16 us 549.09 us]
+naive_sum bitmap 2^20 f32 [1.6740 ms 1.6922 ms 1.7149 ms]
+```
+
+## Bench results on default
+
+Command:
+
+```
+cargo bench -- "2\^20"
+```
 
 ### Sum of values
 
@@ -45,10 +85,36 @@ nonsimd_sum bitmap 2^20 f32 [454.78 us 462.08 us 471.82 us]
 naive_sum bitmap 2^20 f32 [1.7633 ms 1.7736 ms 1.7855 ms]
 ```
 
-### Conclusions so far:
+### Conditions
 
-* for non-null sums, it is advantageous (by 10%) to use `packed` or `core`
-* for sums with nulls, it is advantageous (by 50%) to use arrays
+```
+$ lscpu
+Architecture: x86_64
+CPU op-mode(s): 32-bit, 64-bit
+Byte Order: Little Endian
+CPU(s): 4
+On-line CPU(s) list: 0-3
+Thread(s) per core: 2
+Core(s) per socket: 2
+Socket(s): 1
+NUMA node(s): 1
+Vendor ID: GenuineIntel
+CPU family: 6
+Model: 85
+Model name: Intel(R) Xeon(R) Platinum 8171M CPU @ 2.60GHz
+Stepping: 4
+CPU MHz: 2095.077
+BogoMIPS: 4190.15
+Virtualization: VT-x
+Hypervisor vendor: Microsoft
+Virtualization type: full
+L1d cache: 32K
+L1i cache: 32K
+L2 cache: 1024K
+L3 cache: 36608K
+NUMA node0 CPU(s): 0-3
+Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq vmx ssse3 fma cx16 pcid sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti tpr_shadow vnmi ept vpid fsgsbase bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt avx512cd avx512bw avx512vl xsaveopt xsavec xsaves md_clear
+```
 
 ## License
 
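
For orientation, the `naive` and `nonsimd` variants named in the README's algorithm list can be sketched roughly as follows. This is a minimal sketch and not the repository's code: the function names simply mirror the benchmark labels, and the lane count `LANES = 16` is an assumption chosen for illustration.

```rust
/// Number of independent accumulators. 16 is an assumption for this sketch,
/// not a value taken from the repository.
const LANES: usize = 16;

/// `naive`: sum using Rust iterators, one addition after another.
pub fn naive_sum(values: &[f32]) -> f32 {
    values.iter().sum()
}

/// `nonsimd`: vertical sum over lanes held in a plain Rust array,
/// with a single horizontal reduce at the end.
pub fn nonsimd_sum(values: &[f32]) -> f32 {
    let chunks = values.chunks_exact(LANES);
    // Elements that do not fill a whole chunk are summed separately.
    let remainder = chunks.remainder();

    let mut acc = [0.0f32; LANES];
    for chunk in chunks {
        // Vertical step: lane i of the accumulator only ever sees lane i of a chunk.
        for (a, v) in acc.iter_mut().zip(chunk) {
            *a += *v;
        }
    }

    // Horizontal reduce over the lanes, then add the tail.
    let lanes_total: f32 = acc.iter().sum();
    let tail_total: f32 = remainder.iter().sum();
    lanes_total + tail_total
}
```

Keeping `LANES` independent accumulators removes the serial dependency between additions, which is typically what lets the compiler emit vector instructions. It also changes the order of the floating-point additions, so the result can differ slightly from `naive_sum` on non-integer data.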

benches/sum.rs

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ use criterion::{criterion_group, criterion_main, Criterion};
 use simd_benches::sum::*;
 
 fn close(l: f32, r: f32) {
-    assert!((l - r).abs() < l * 0.0001);
+    assert!((l - r).abs() < l * 0.001);
 }
 
 fn add_benchmark(c: &mut Criterion) {
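
A note on the `close` helper changed here and in the two files below: the right-hand side `l * 0.001` is negative whenever `l` is negative, so the assertion can never pass for a negative sum, and the strict `<` also rejects two sums that are both exactly zero. If the benchmarks are ever fed such data, a symmetric relative check along these lines may be more robust. This is a hedged sketch, not part of the commit; `is_close` and its `rel_tol` parameter are hypothetical names introduced for illustration.

```rust
/// Returns true when `l` and `r` agree to within the relative tolerance `rel_tol`.
/// Hypothetical helper: not part of the commit.
pub fn is_close(l: f32, r: f32, rel_tol: f32) -> bool {
    // Scale by the larger magnitude so the check is symmetric and sign-independent.
    let scale = l.abs().max(r.abs());
    (l - r).abs() <= scale * rel_tol
}

/// Drop-in variant of the benches' `close`, using the 0.001 tolerance from this commit.
pub fn close(l: f32, r: f32) {
    assert!(
        is_close(l, r, 0.001),
        "{} and {} differ by more than 0.1%",
        l,
        r
    );
}
```

The tolerance itself is kept at the commit's value; only the scaling and the comparison operator differ from the committed helper.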

benches/sum_nulls.rs

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@ use criterion::{criterion_group, criterion_main, Criterion};
 use simd_benches::sum_nulls::*;
 
 fn close(l: f32, r: f32) {
-    assert!((l - r).abs() < l * 0.0001);
+    assert!((l - r).abs() < l * 0.001);
 }
 
 fn add_benchmark(c: &mut Criterion) {

benches/sum_nulls_bitmap.rs

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ use simd_benches::bitmap_ops;
 use simd_benches::sum_nulls_bitmap::*;
 
 fn close(l: f32, r: f32) {
-    assert!((l - r).abs() < l * 0.0001);
+    assert!((l - r).abs() < l * 0.001);
 }
 
 fn add_benchmark(c: &mut Criterion) {
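
The README results distinguish two ways of marking nulls, `Vec<bool>` and `Bitmap`. The difference can be sketched as below, assuming a bit-packed validity mask with least-significant-bit-first ordering within each byte. That ordering, and every function name here, is an assumption for illustration and not taken from the repository's `bitmap_ops` module.

```rust
/// Packs one validity flag per value into bytes, LSB-first within each byte.
/// The bit order is an assumption of this sketch.
pub fn pack_bitmap(validity: &[bool]) -> Vec<u8> {
    let mut bytes = vec![0u8; (validity.len() + 7) / 8];
    for (i, valid) in validity.iter().enumerate() {
        if *valid {
            bytes[i / 8] |= 1 << (i % 8);
        }
    }
    bytes
}

/// Sum of the values whose flag is `true` (`Vec<bool>` style: one byte per value).
pub fn sum_with_bools(values: &[f32], validity: &[bool]) -> f32 {
    values
        .iter()
        .zip(validity)
        .filter(|(_, valid)| **valid)
        .map(|(v, _)| *v)
        .sum()
}

/// Sum of the values whose bit is set (`Bitmap` style: one bit per value).
/// `bitmap` must hold at least `values.len()` bits.
pub fn sum_with_bitmap(values: &[f32], bitmap: &[u8]) -> f32 {
    values
        .iter()
        .enumerate()
        .filter(|&(i, _)| ((bitmap[i / 8] >> (i % 8)) & 1) == 1)
        .map(|(_, v)| *v)
        .sum()
}
```

For 2^20 values, the `Vec<bool>` mask occupies 1 MiB while the packed bitmap occupies 128 KiB. Both functions above are scalar and are meant only to show the two layouts, not to reproduce the benchmarked kernels.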
