Background batch data movement - refactoring and DML based alternate path #9
base: bg-improvements-no-pool-rebal
Conversation
One large commit for all the CI and code coverage work; the change history:
- Run tests on CI
- Run long tests (navy/bench) every day on CI
- Run CI on prebuilt docker image
- Run only the centos build on CI
- Update docker file used in CI; Centos8 is EOL
- Disable failing clang-format-check
- Add extra param to build-package.sh
- Add scripts for rebuilding/pushing docker images (taken from pmem/dev-utils-kit@30794c3)
- Extend CI to rebuild docker automatically
- Update build-cachelib-docker.yml
- Do not use shallow clone, to make sure the Docker rebuild logic works correctly
- Add required packages to install Intel ittapi
- Update CI to use intel/CacheLib repo (pmem#17)
- Add multi-tier navy benchmark and run it on CI; fix navy multi-tier config for NUMA bindings
- Add code coverage support in CacheLib
- Add libdml to CentOS docker image (pmem#53)
- Only exclude the allocator-test-NavySetupTest and shm-test-test_page_size tests
- Add perf and numactl to docker packages
to utilize combined locking.
This includes printing:
- allocSize
- allocated memory size
- memory usage fraction
…art 2) fix for compressed ptr (upstream) -> compress from false to true
for different pool sizes. We also use getPoolSize(pid) to get the total size from all pools across allocators. It also fixes the tiering sizes (pulls in changes from the rebased issue75 commit that did not make it into the upstream commits). Rebased to use ramCacheSize.
for the compressed ptr changes that were introduced upstream. Includes later cosmetic changes added by sounak (9cb5c29).
Fix for rolling stats on multi-tier (to be followed by the multi-tier rolling stats implementation in the following commit).
Hot queue iterator for 2Q. Starts at the Hot queue and moves on to the Warm queue once the Hot queue is exhausted. Useful for promotion semantics when using 2Q replacement. Rebased onto develop and added some tests. A sketch of the iteration order follows.
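Purely as an illustration of that iteration order (not the PR's actual MM2Q code), a minimal sketch assuming simple std::list queues; Hot2QIterator and its member names are hypothetical:

```cpp
#include <list>

// Iterates the Hot queue first and falls through to the Warm queue once
// Hot is exhausted, so promotion scans see the hottest items first.
template <typename Item>
class Hot2QIterator {
 public:
  Hot2QIterator(std::list<Item>& hot, std::list<Item>& warm)
      : hot_(hot), warm_(warm), it_(hot_.begin()), inHot_(true) {
    skipToWarmIfExhausted();
  }

  bool done() const { return !inHot_ && it_ == warm_.end(); }
  Item& operator*() { return *it_; }

  Hot2QIterator& operator++() {
    ++it_;
    skipToWarmIfExhausted();
    return *this;
  }

 private:
  void skipToWarmIfExhausted() {
    if (inHot_ && it_ == hot_.end()) {
      inHot_ = false;          // Hot drained: continue from the head of Warm
      it_ = warm_.begin();
    }
  }

  std::list<Item>& hot_;
  std::list<Item>& warm_;
  typename std::list<Item>::iterator it_;
  bool inHot_;
};
```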
- Transparent item movement
- Multi-tier combined locking with exclusive bit (pmem#38), with incRef refactored to support returning the result of markMoving (fails if the item is already moving or the exclusive bit is set)
- Added tests (updated for NUMA bindings, post combined locking) for transparent item movement
- Updated configs for NUMA bindings
This would lead to deadlock (e.g. in forEachChainedItem) if the child is moving (e.g. marked by the Slab Release thread). Instead, use the moving bit only to prevent freeing the item, and do all synchronization on the parent.
Background data movement using periodic workers. Attempts to evict/promote items per the given thresholds for each class. This reduces p99 latency, since there is a higher chance that an allocation slot is free in the tier we are allocating in; a sketch of the idea follows. Also fixes a race in promotion where releaseBackToAllocator was being called before wakeUpWaiters, and reinserts to the MM container on a failed promotion.
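As an illustration only (not code from this PR), a minimal sketch of such a periodic background evictor, assuming hypothetical pollStats/evictBatch callbacks and a per-class free-slot threshold:

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Wakes up on an interval, polls per-class occupancy, and evicts a small
// batch wherever free slots fall below a threshold, so later allocations
// find a free slot without paying the eviction cost on the critical path.
class BackgroundEvictor {
 public:
  struct ClassStats {
    std::size_t freeSlots;   // currently free allocation slots
    std::size_t totalSlots;  // capacity of the class
  };

  BackgroundEvictor(std::chrono::milliseconds interval,
                    double freeThreshold,
                    std::function<std::vector<ClassStats>()> pollStats,
                    std::function<void(std::size_t cid, std::size_t n)> evictBatch)
      : interval_(interval),
        freeThreshold_(freeThreshold),
        pollStats_(std::move(pollStats)),
        evictBatch_(std::move(evictBatch)) {}

  void start() {
    worker_ = std::thread([this] {
      while (running_.load()) {
        auto stats = pollStats_();
        for (std::size_t cid = 0; cid < stats.size(); ++cid) {
          double freeFrac = stats[cid].totalSlots == 0
              ? 1.0
              : static_cast<double>(stats[cid].freeSlots) / stats[cid].totalSlots;
          if (freeFrac < freeThreshold_) {
            evictBatch_(cid, /*n=*/64);  // evict a small batch to the next tier
          }
        }
        std::this_thread::sleep_for(interval_);
      }
    });
  }

  void stop() {
    running_.store(false);
    if (worker_.joinable()) worker_.join();
  }

 private:
  std::chrono::milliseconds interval_;
  double freeThreshold_;
  std::function<std::vector<ClassStats>()> pollStats_;
  std::function<void(std::size_t, std::size_t)> evictBatch_;
  std::atomic<bool> running_{true};
  std::thread worker_;
};
```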
…move to fail - updated the slab release logic for move failure, but there is still an issue with slab movement; currently investigating.
The assumption for moving items was that once an item is unmarked, no one can add new waiters for it. However, since incrementing the item ref count was not done under the MoveMap lock, there was a race: the item could have been unmarked right after incRef returned incFailedMoving. A minimal sketch of the interleaving and the fix follows.
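A hedged sketch of the race described above; the names (tryIncRefUnlessMoving, enqueueWaiter, moveMapLock) are illustrative, not CacheLib's actual internals:

```cpp
#include <functional>
#include <mutex>

// Buggy interleaving:
//   1. reader: incRef() returns incFailedMoving (item is marked moving)
//   2. mover:  unmarkMoving() + wakeUpWaiters()         <-- races in here
//   3. reader: takes the MoveMap lock, adds itself as a waiter -> stranded
// Fix: do the ref-count check under the MoveMap lock, so step 2 cannot
// interleave between the check and the enqueue.
std::mutex moveMapLock;

bool waitIfMoving(std::function<bool()> tryIncRefUnlessMoving,
                  std::function<void()> enqueueWaiter) {
  std::lock_guard<std::mutex> g(moveMapLock);
  if (tryIncRefUnlessMoving()) {
    return false;  // got a ref; item is not (or no longer) moving
  }
  // Safe under the lock: unmark + wakeUpWaiters also take this lock,
  // so no waiter can be added after the final wakeup.
  enqueueWaiter();
  return true;
}
```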
* Fix issue with token creation
* Do not increment evictFail* stats if evictFailConcurrentFill was incremented
Updated the docker gcc version to 12.
Co-authored-by: Matt Rae <[email protected]>
- We first check whether an item is expired under the mmContainer lock and, if so, mark it for eviction so it is recycled back up to allocateInternalTier.
instead of always inserting into the topmost tier
* Chained item movement between tiers - currently we sync on the parent item for moving. Updated tests accordingly; note that we can no longer swap the parent item if a chained item is being moved for slab release.
* Added some debug checks around the chained item check.
* Fix slab release behavior if no moveCb is set.
Track latency of per-item eviction/promotion between memory tiers.
* Set dependencies to working versions and use dependencies from the build context instead of downloading cachelib:develop during the build step. This makes sure that dependencies are always built in the proper versions.
* Fix CacheStats size
if (!handler.valid()) {
  auto status = handler.get();
  XDCHECK(handler.valid()) << dmlErrStr(status);
  throw std::runtime_error(folly::sformat(
How often do we expect this to happen? I'm wondering if it would make sense to return an error code here instead of throwing an exception.
I have not seen this error when DML is configured correctly. Theoretically, it is possible to hit it if we oversubscribe the DSA work queues, but that is more of a config issue: the DSA vs CPU batch split ratio has to be tuned to prevent this scenario. If, in the near future, I add a self-tuning mechanism for the DSA vs CPU batch split, then it makes sense to use this error to adjust the ratio and proceed rather than throw a runtime exception; a sketch of that error-code alternative follows.
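A hedged sketch of the reviewer's suggestion, assuming the Intel DML C++ result types; checkedWait is a hypothetical helper, not part of this PR:

```cpp
#include <optional>

#include <dml/dml.hpp>

// Returns the DML status on failure instead of throwing, so the caller can
// fall back to the CPU memmove path and/or lower the DSA batch ratio.
template <typename Handler>
std::optional<dml::status_code> checkedWait(Handler& handler) {
  auto result = handler.get();  // waits for the submitted operation to finish
  if (result.status != dml::status_code::ok) {
    return result.status;  // caller decides: retry on CPU, adjust ratio, ...
  }
  return std::nullopt;  // success
}
```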
Force-pushed from c5976af to 59a71ee.
auto dmlBatchRatio = isLarge ? config_.largeItemBatchEvictDsaUsageFraction
                             : config_.smallItemBatchEvictDsaUsageFraction;
size_t dmlBatchSize =
    (config_.dsaEnabled && evictionData.size() >= config_.minBatchSizeForDsaUsage) ?
If there is no DSA device, do we use the regular DML software path?
The implicit assumption is that if the dsaEnabled flag is set to true, there is at least one DSA device enabled.
handler = dml::submit<dml::hardware>(dml::batch, sequence);
In the code currently, if the dsaEnabled flag is wrongly set to true without any DSA device enabled, it triggers a runtime exception. I can set it to dml::automatic, which would take care of this scenario, but we would lose explicit control over whether a request goes to software or hardware. A minimal sketch of that alternative follows.
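For illustration, a sketch of the dml::automatic alternative using the Intel DML C++ API for a single mem_move (the PR submits a whole batch sequence; copyWithAutomaticPath is a hypothetical helper):

```cpp
#include <cstdint>
#include <vector>

#include <dml/dml.hpp>

// dml::automatic lets the library pick the hardware path when a DSA device
// is available and silently fall back to the software path otherwise, at
// the cost of explicit control over where each request runs.
bool copyWithAutomaticPath(const std::vector<std::uint8_t>& src,
                           std::vector<std::uint8_t>& dst) {
  auto handler = dml::submit<dml::automatic>(
      dml::mem_move,
      dml::make_view(src.data(), src.size()),
      dml::make_view(dst.data(), dst.size()));
  auto result = handler.get();
  return result.status == dml::status_code::ok;
}
```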
//
// @param oldItem      Reference to the item being moved
// @param newItemHdl   Reference to the handle of the new item being moved into
// @return true if the containers were updated successfully.
Really, all that is left to do, and all this function does, is one thing: update the access container. You could add that to the comment, or change the function name from book keeper to completeAccessContainerUpdate() or something.
Changed the name of the book keeper function to completeAccessContainerUpdate().
It looks really good, thanks. I left two minor comments; once you respond I can merge this.
Force-pushed from 59a71ee to eb23469.
Force-pushed from 36129e4 to c67df92.
Force-pushed from c67df92 to 57eade8.