Commit 2696951
ARROW-11300: [Rust][DataFusion] Further performance improvements on hash aggregation with small groups
Based on #9234, this PR improves the situation described in https://issues.apache.org/jira/browse/ARROW-11300.
Currently we call `take` on the arrays for each group. This works, but it creates and allocates a lot of small `Array`s when each group contains only a small number of rows.
This improves the results on the group by queries on db-benchmark:
PR:
```
q1 took 32 ms
q2 took 422 ms
q3 took 3468 ms
q4 took 44 ms
q5 took 3166 ms
q7 took 3081 ms
```
#9234 (the results differ from that PR's description because partitioning is now enabled and a custom allocator is used)
```
q1 took 34 ms
q2 took 389 ms
q3 took 4590 ms
q4 took 47 ms
q5 took 5152 ms
q7 took 3941 ms
```
The PR changes the algorithm to:
* Collect the row indices, and the offsets that delimit each group's range of indices, for all keys that occur in the batch.
* `take` the arrays based on those indices in one go (so it requires only one bigger allocation for each array).
* Use `slice` based on the offsets to take each group's values from the arrays and pass them to the accumulators.
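
The three steps above can be sketched in plain Rust. This is only an illustration under stated assumptions, not the code from this PR: `Vec<i64>` and slices stand in for Arrow arrays, the `take` function here is a hypothetical stand-in for the Arrow `take` kernel, a range over the gathered vector stands in for `slice`, and `sum_by_key` is a made-up name with a running sum playing the role of an accumulator.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a `take` kernel: gathers `values[i]` for each
/// index into one newly allocated vector.
fn take(values: &[i64], indices: &[usize]) -> Vec<i64> {
    indices.iter().map(|&i| values[i]).collect()
}

/// Illustrative only: folds one batch into per-key running sums.
fn sum_by_key(keys: &[u32], values: &[i64], sums: &mut HashMap<u32, i64>) {
    // Record which rows belong to each key, remembering first-seen key order.
    let mut rows_per_key: HashMap<u32, Vec<usize>> = HashMap::new();
    let mut key_order: Vec<u32> = Vec::new();
    for (row, &k) in keys.iter().enumerate() {
        if !rows_per_key.contains_key(&k) {
            key_order.push(k);
        }
        rows_per_key.entry(k).or_default().push(row);
    }

    // Step 1: flatten the row indices of all groups into one vector and
    // record the offsets where each group starts and ends.
    let mut indices: Vec<usize> = Vec::with_capacity(keys.len());
    let mut offsets: Vec<usize> = vec![0];
    for k in &key_order {
        indices.extend_from_slice(&rows_per_key[k]);
        offsets.push(indices.len());
    }

    // Step 2: a single `take` for the whole batch, so one allocation
    // instead of one small allocation per group.
    let gathered = take(values, &indices);

    // Step 3: hand each group a slice of the gathered values.
    for (g, k) in key_order.iter().enumerate() {
        let group = &gathered[offsets[g]..offsets[g + 1]];
        *sums.entry(*k).or_insert(0) += group.iter().sum::<i64>();
    }
}
```

Calling `sum_by_key` once per batch with the same `sums` map accumulates across batches. The per-group slices borrow from the single gathered vector, which is the point of the change: the number of allocations no longer grows with the number of groups.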
Closes #9271 from Dandandan/hash_agg_few_rows
Authored-by: Heres, Daniel <[email protected]>
Signed-off-by: Jorge C. Leitao <[email protected]>
1 parent 61b0cb1
1 file changed, +79 -48 lines changed