@r1viollet (Collaborator) commented Sep 18, 2023

What does this PR do?

Implement a radix tree strategy using bitsets

Motivation

Ensure we account for addresses accurately.
Check @richardstartin's idea.

Results

Performance numbers look great.
With the reader thread, handling samples costs some CPU:

Benchmark                                                Time             CPU   Iterations
BM_ShortLived_NoTracking/process_time/real_time     866340 ns      3245594 ns          758
BM_ShortLived_Tracking/process_time/real_time       999061 ns      4036038 ns          613
BM_LongLived_NoTracking/process_time                340981 ns       679761 ns         1074
BM_LongLived_Tracking/process_time                  376116 ns      1359466 ns          548

Without the reader thread:

Benchmark                                                Time             CPU   Iterations
BM_ShortLived_NoTracking/process_time/real_time     471468 ns       987956 ns         1366
BM_ShortLived_Tracking/process_time/real_time       502947 ns      1171483 ns         1000
BM_LongLived_NoTracking/process_time                338881 ns       185030 ns         4789
BM_LongLived_Tracking/process_time                  330957 ns       281179 ns         2535

Add a bitset to track the addresses that are kept for heap profiling.
Fix the missing TLS storage initializations
- Removal of the lock
- Minor fixes around the allocation code paths
Adjust the usage of the address bitset.
- Adjust the hashing logic
Remove the dependency between bitset and sampling period.
Simplify the hashing logic.
Re-test the collision rate of the hash we use.
Improve the bitset set and unset flow to have a single atomic operation.
- Enforce power of two on size of bitset
- Change the LOG_ONCE API to take the call site into account
- Naming updates
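For readers following the commits above, here is a minimal sketch of what a hashed bitset with a single atomic operation per set or unset could look like. It is not the code from this PR: the class name `FlatAddressBitset`, the shift-based hash, and the 16-byte alignment are assumptions made for illustration. The power-of-two size lets a mask replace the modulo.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical sketch, not the class from this PR.
class FlatAddressBitset {
public:
  // size_in_bits must be a power of two (and at least 64) so that
  // "hash & mask" selects a bit without a division.
  explicit FlatAddressBitset(size_t size_in_bits)
      : _mask(size_in_bits - 1),
        _words(std::make_unique<std::atomic<uint64_t>[]>(size_in_bits / 64)) {
    assert(size_in_bits >= 64 && (size_in_bits & (size_in_bits - 1)) == 0);
    for (size_t i = 0; i < size_in_bits / 64; ++i) {
      _words[i].store(0, std::memory_order_relaxed);
    }
  }

  // Sets the bit with one fetch_or. Returns true if the bit was newly set,
  // false if it was already set (same address or a hash collision).
  bool add(uintptr_t addr) {
    const size_t bit = hash(addr) & _mask;
    const uint64_t flag = uint64_t{1} << (bit % 64);
    const uint64_t prev =
        _words[bit / 64].fetch_or(flag, std::memory_order_acq_rel);
    return (prev & flag) == 0;
  }

  // Clears the bit with one fetch_and. Returns true if the bit was set.
  bool remove(uintptr_t addr) {
    const size_t bit = hash(addr) & _mask;
    const uint64_t flag = uint64_t{1} << (bit % 64);
    const uint64_t prev =
        _words[bit / 64].fetch_and(~flag, std::memory_order_acq_rel);
    return (prev & flag) != 0;
  }

private:
  // Assumption: allocations are 16-byte aligned, so the low 4 bits carry
  // no information. The real hash may differ.
  static size_t hash(uintptr_t addr) { return static_cast<size_t>(addr >> 4); }

  size_t _mask;
  std::unique_ptr<std::atomic<uint64_t>[]> _words;
};
```

With this shape, the return value of `add` tells the caller whether a collision happened, which is what allows a sample to still be pushed in that case.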
Improve the unit test for the LOG_ONCE API
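One way a log-once macro can depend on the place it is called from is to give every expansion its own static flag. The sketch below illustrates that idea only; `SKETCH_LOG_ONCE` and `sketch_emitted` are hypothetical names, and the actual LOG_ONCE in this repository may be implemented differently.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>

// Counts emitted messages so that the behavior can be checked in a test.
// Hypothetical helper, for illustration only.
static std::atomic<int> sketch_emitted{0};

// Each expansion of the macro declares its own static flag, so "once"
// is tracked per call site rather than globally.
#define SKETCH_LOG_ONCE(...)                                              \
  do {                                                                    \
    static std::atomic<bool> sketch_logged{false};                        \
    if (!sketch_logged.exchange(true, std::memory_order_relaxed)) {       \
      sketch_emitted.fetch_add(1, std::memory_order_relaxed);             \
      std::fprintf(stderr, __VA_ARGS__);                                  \
    }                                                                     \
  } while (0)
```

Calling the macro in a loop logs a single message for that site, while a second site elsewhere in the code still gets its own first message.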
While refactoring, I introduced a regression in how indexes were computed
Rename local variable to avoid conflict with function name.
Ensure an allocation sample is pushed even if we encounter a collision in the bitset.
Implement a tree-like bitset
This ensures we account for all possible addresses without collisions
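To make the tree-like bitset idea concrete, here is a rough sketch of a three-level radix tree whose leaves are bitsets. It is an illustration under stated assumptions, not the implementation in this PR: the class name `TreeAddressBitset`, the 48-bit address width, the 16-byte alignment, and the 14/14/16 bit split are all made up for the example. Every address maps to its own bit, so there is no hashing and no collision.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical sketch: 44 significant bits split as 14 (top) + 14 (mid)
// + 16 (leaf). Mid nodes and leaves are allocated lazily.
class TreeAddressBitset {
public:
  TreeAddressBitset() : _top(std::make_unique<Top>()) {}
  TreeAddressBitset(const TreeAddressBitset &) = delete;
  TreeAddressBitset &operator=(const TreeAddressBitset &) = delete;

  ~TreeAddressBitset() {
    for (auto &mid_slot : _top->mids) {
      Mid *mid = mid_slot.load(std::memory_order_relaxed);
      if (!mid) continue;
      for (auto &leaf_slot : mid->leaves) {
        delete leaf_slot.load(std::memory_order_relaxed);
      }
      delete mid;
    }
  }

  // Returns true if the address was not tracked before.
  bool add(uintptr_t addr) {
    const uint64_t k = key(addr);
    Mid *mid = get_or_create(_top->mids[k >> (kMidBits + kLeafBits)]);
    Leaf *leaf = get_or_create(mid->leaves[(k >> kLeafBits) & kMidMask]);
    const uint64_t bit = k & kLeafMask;
    const uint64_t flag = uint64_t{1} << (bit % 64);
    return (leaf->words[bit / 64].fetch_or(flag, std::memory_order_acq_rel) &
            flag) == 0;
  }

  // Returns true if the address was tracked. Never allocates.
  bool remove(uintptr_t addr) {
    const uint64_t k = key(addr);
    Mid *mid = _top->mids[k >> (kMidBits + kLeafBits)].load(
        std::memory_order_acquire);
    if (!mid) return false;
    Leaf *leaf =
        mid->leaves[(k >> kLeafBits) & kMidMask].load(std::memory_order_acquire);
    if (!leaf) return false;
    const uint64_t bit = k & kLeafMask;
    const uint64_t flag = uint64_t{1} << (bit % 64);
    return (leaf->words[bit / 64].fetch_and(~flag, std::memory_order_acq_rel) &
            flag) != 0;
  }

private:
  static constexpr unsigned kAlignBits = 4; // assumed 16-byte alignment
  static constexpr unsigned kLeafBits = 16;
  static constexpr unsigned kMidBits = 14;
  static constexpr unsigned kTopBits = 14;
  static constexpr uint64_t kLeafMask = (uint64_t{1} << kLeafBits) - 1;
  static constexpr uint64_t kMidMask = (uint64_t{1} << kMidBits) - 1;

  struct Leaf {
    std::array<std::atomic<uint64_t>, (size_t{1} << kLeafBits) / 64> words{};
  };
  struct Mid {
    std::array<std::atomic<Leaf *>, size_t{1} << kMidBits> leaves{};
  };
  struct Top {
    std::array<std::atomic<Mid *>, size_t{1} << kTopBits> mids{};
  };

  static uint64_t key(uintptr_t addr) {
    const uint64_t a = static_cast<uint64_t>(addr);
    // Addresses wider than the assumed 48 bits would alias.
    assert((a >> (kAlignBits + kLeafBits + kMidBits + kTopBits)) == 0);
    return a >> kAlignBits;
  }

  // Installs a new node with a compare-exchange; the loser of a race
  // frees its node and uses the winner's.
  template <typename Node>
  static Node *get_or_create(std::atomic<Node *> &slot) {
    Node *node = slot.load(std::memory_order_acquire);
    if (node) return node;
    Node *fresh = new Node();
    if (slot.compare_exchange_strong(node, fresh, std::memory_order_acq_rel,
                                     std::memory_order_acquire)) {
      return fresh;
    }
    delete fresh;
    return node;
  }

  std::unique_ptr<Top> _top;
};
```

The trade-off compared with a hashed bitset is memory: nodes are created on demand and, in this sketch, never released before destruction, so a real implementation would likely want a bound on how many intermediate nodes it keeps. Calling `new` from inside an allocation hook also needs care to avoid re-entering the profiler.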
@r1viollet r1viollet force-pushed the r1viollet/perf_dealloc_tree_bitset branch from 6c86d29 to 8b6bf27 Compare September 21, 2023 08:07
- Fix the logic
- Ensure we have a bounded number of mid level elements
@r1viollet r1viollet force-pushed the r1viollet/perf_dealloc_tree_bitset branch from 8b6bf27 to 0dc37a2 Compare September 22, 2023 13:37
@r1viollet r1viollet force-pushed the r1viollet/perf_dealloc_code_path branch from a52c80c to aa4e705 Compare October 9, 2023 11:26
Base automatically changed from r1viollet/perf_dealloc_code_path to main October 12, 2023 16:42