Background batch data movement - refactoring and DML based alternate path #9
base: bg-improvements-no-pool-rebal
Conversation
One large commit for all the CI and code coverage work; the change history:
- Run tests on CI
- Run long tests (navy/bench) every day on CI
- Run CI on prebuilt docker image
- Run only the centos build on CI
- Update docker file used in CI; Centos8 is EOL
- Disable failing clang-format-check
- Add extra param to build-package.sh
- Add scripts for rebuilding/pushing docker images (taken from pmem/dev-utils-kit@30794c3)
- Extend CI to rebuild docker automatically
- Update build-cachelib-docker.yml
- Do not use shallow clone, to make sure the Docker rebuild logic works correctly
- Add required packages to install Intel ittapi
- Update CI to use intel/CacheLib repo (pmem#17)
- Add multi-tier navy benchmark and run it on CI; fix navy multi-tier config for NUMA bindings
- Add code coverage support in CacheLib
- Add libdml to CentOS docker image (pmem#53)
- Only exclude the allocator-test-NavySetupTest and shm-test-test_page_size tests
- Add perf and numactl to docker packages
to utilize combined locking.
This includes printing:
- allocSize
- allocated memory size
- memory usage fraction
…art 2) fix for compressed ptr (upstream) -> compress from false to true
for different pool sizes. We also use getPoolSize(pid) to get the total size from all pools across allocators. It also fixes the tiering sizes (pulls in changes from the rebased issue75 commit that did not make it into the upstream commits). Rebased to use ramCacheSize.
for the compressed ptr changes that were introduced upstream. Includes later cosmetic changes added by sounak (9cb5c29).
Fix for rolling stats on multi-tier (to be followed by the multi-tier rolling stats implementation in the following commit).
Hot queue iterator for 2Q. Starts at the Hot queue and moves on to the Warm queue once the Hot queue is exhausted. Useful for promotion semantics when using 2Q replacement. Rebased onto develop and added some tests. A sketch of the iteration order follows.
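Purely as an illustration of that iteration order (not the PR's actual MM2Q code), a minimal sketch assuming simple std::list queues; Hot2QIterator and its member names are hypothetical:

```cpp
#include <list>

// Iterates the Hot queue first and falls through to the Warm queue once
// Hot is exhausted, so promotion scans see the hottest items first.
template <typename Item>
class Hot2QIterator {
 public:
  Hot2QIterator(std::list<Item>& hot, std::list<Item>& warm)
      : hot_(hot), warm_(warm), it_(hot_.begin()), inHot_(true) {
    skipToWarmIfExhausted();
  }

  bool done() const { return !inHot_ && it_ == warm_.end(); }
  Item& operator*() { return *it_; }

  Hot2QIterator& operator++() {
    ++it_;
    skipToWarmIfExhausted();
    return *this;
  }

 private:
  void skipToWarmIfExhausted() {
    if (inHot_ && it_ == hot_.end()) {
      inHot_ = false;          // Hot drained: continue from the head of Warm
      it_ = warm_.begin();
    }
  }

  std::list<Item>& hot_;
  std::list<Item>& warm_;
  typename std::list<Item>::iterator it_;
  bool inHot_;
};
```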
- Transparent item movement
- Multi-tier combined locking with exclusive bit (pmem#38), with incRef refactored to support returning the result of markMoving (fails if the item is already moving or the exclusive bit is set)
- Added tests (updated for NUMA bindings, post combined locking) for transparent item movement
- Updated configs for NUMA bindings
This would lead to deadlock (e.g. in forEachChainedItem) if the child is moving (e.g. marked by the Slab Release thread). Instead, use the moving bit only to prevent freeing the item, and do all synchronization on the parent.
Background data movement using periodic workers. Attempts to evict/promote items per the given thresholds for each class. This reduces p99 latency, since there is a higher chance that an allocation slot is free in the tier we are allocating in; a sketch of the idea follows. Also fixes a race in promotion where releaseBackToAllocator was being called before wakeUpWaiters, and reinserts to the MM container on a failed promotion.
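As an illustration only (not code from this PR), a minimal sketch of such a periodic background evictor, assuming hypothetical pollStats/evictBatch callbacks and a per-class free-slot threshold:

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Wakes up on an interval, polls per-class occupancy, and evicts a small
// batch wherever free slots fall below a threshold, so later allocations
// find a free slot without paying the eviction cost on the critical path.
class BackgroundEvictor {
 public:
  struct ClassStats {
    std::size_t freeSlots;   // currently free allocation slots
    std::size_t totalSlots;  // capacity of the class
  };

  BackgroundEvictor(std::chrono::milliseconds interval,
                    double freeThreshold,
                    std::function<std::vector<ClassStats>()> pollStats,
                    std::function<void(std::size_t cid, std::size_t n)> evictBatch)
      : interval_(interval),
        freeThreshold_(freeThreshold),
        pollStats_(std::move(pollStats)),
        evictBatch_(std::move(evictBatch)) {}

  void start() {
    worker_ = std::thread([this] {
      while (running_.load()) {
        auto stats = pollStats_();
        for (std::size_t cid = 0; cid < stats.size(); ++cid) {
          double freeFrac = stats[cid].totalSlots == 0
              ? 1.0
              : static_cast<double>(stats[cid].freeSlots) / stats[cid].totalSlots;
          if (freeFrac < freeThreshold_) {
            evictBatch_(cid, /*n=*/64);  // evict a small batch to the next tier
          }
        }
        std::this_thread::sleep_for(interval_);
      }
    });
  }

  void stop() {
    running_.store(false);
    if (worker_.joinable()) worker_.join();
  }

 private:
  std::chrono::milliseconds interval_;
  double freeThreshold_;
  std::function<std::vector<ClassStats>()> pollStats_;
  std::function<void(std::size_t, std::size_t)> evictBatch_;
  std::atomic<bool> running_{true};
  std::thread worker_;
};
```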
…move to fail - updated the slab release logic for move failure, but there is still an issue with slab movement; currently investigating.
The assumption for moving items was that once an item is unmarked, no one can add new waiters for it. However, since incrementing the item ref count was not done under the MoveMap lock, there was a race: the item could have been unmarked right after incRef returned incFailedMoving. A minimal sketch of the interleaving and the fix follows.
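A hedged sketch of the race described above; the names (tryIncRefUnlessMoving, enqueueWaiter, moveMapLock) are illustrative, not CacheLib's actual internals:

```cpp
#include <functional>
#include <mutex>

// Buggy interleaving:
//   1. reader: incRef() returns incFailedMoving (item is marked moving)
//   2. mover:  unmarkMoving() + wakeUpWaiters()         <-- races in here
//   3. reader: takes the MoveMap lock, adds itself as a waiter -> stranded
// Fix: do the ref-count check under the MoveMap lock, so step 2 cannot
// interleave between the check and the enqueue.
std::mutex moveMapLock;

bool waitIfMoving(std::function<bool()> tryIncRefUnlessMoving,
                  std::function<void()> enqueueWaiter) {
  std::lock_guard<std::mutex> g(moveMapLock);
  if (tryIncRefUnlessMoving()) {
    return false;  // got a ref; item is not (or no longer) moving
  }
  // Safe under the lock: unmark + wakeUpWaiters also take this lock,
  // so no waiter can be added after the final wakeup.
  enqueueWaiter();
  return true;
}
```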
* Fix issue with token creation
* Do not increment evictFail* stats if evictFailConcurrentFill was incremented
Updated the docker gcc version to 12.
Co-authored-by: Matt Rae <[email protected]>
- We first check whether an item is expired under the mmContainer lock and, if so, mark it for eviction so it is recycled back up to allocateInternalTier.
instead of always inserting into the topmost tier
* Chained item movement between tiers - currently we sync on the parent item for moving. Updated tests accordingly; note that we can no longer swap the parent item if a chained item is being moved for slab release.
* Added some debug checks around the chained item check.
* Fix slab release behavior if no moveCb is set.
Track latency of per-item eviction/promotion between memory tiers.
* Set dependencies to working versions and use dependencies from the build context instead of downloading cachelib:develop during the build step. This makes sure that dependencies are always built in the proper versions.
* Fix CacheStats size
if (!handler.valid()) {
  auto status = handler.get();
  XDCHECK(handler.valid()) << dmlErrStr(status);
  throw std::runtime_error(folly::sformat(
How often do we expect this to happen? I'm wondering if it would make sense to return an error code here instead of throwing an exception.
I have not seen this error when DML is configured correctly. Theoretically, it is possible to hit it if we oversubscribe the DSA work queues, but that is more of a config issue: the DSA vs CPU batch split ratio has to be tuned to prevent this scenario. If, in the near future, I add a self-tuning mechanism for the DSA vs CPU batch split, then it makes sense to use this error to adjust the ratio and proceed rather than throw a runtime exception; a sketch of that error-code alternative follows.
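A hedged sketch of the reviewer's suggestion, assuming the Intel DML C++ result types; checkedWait is a hypothetical helper, not part of this PR:

```cpp
#include <optional>

#include <dml/dml.hpp>

// Returns the DML status on failure instead of throwing, so the caller can
// fall back to the CPU memmove path and/or lower the DSA batch ratio.
template <typename Handler>
std::optional<dml::status_code> checkedWait(Handler& handler) {
  auto result = handler.get();  // waits for the submitted operation to finish
  if (result.status != dml::status_code::ok) {
    return result.status;  // caller decides: retry on CPU, adjust ratio, ...
  }
  return std::nullopt;  // success
}
```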
Force-pushed from c5976af to 59a71ee.
auto dmlBatchRatio = isLarge ? config_.largeItemBatchEvictDsaUsageFraction
                             : config_.smallItemBatchEvictDsaUsageFraction;
size_t dmlBatchSize =
    (config_.dsaEnabled && evictionData.size() >= config_.minBatchSizeForDsaUsage) ?
If there is no DSA device, do we use the regular DML software path?
The implicit assumption is that if the dsaEnabled flag is set to true, there is at least one DSA device enabled.
handler = dml::submit<dml::hardware>(dml::batch, sequence);
In the code currently, if the dsaEnabled flag is wrongly set to true without any DSA device enabled, it triggers a runtime exception. I can set it to dml::automatic, which would take care of this scenario, but we would lose explicit control over whether a request goes to software or hardware. A minimal sketch of that alternative follows.
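For illustration, a sketch of the dml::automatic alternative using the Intel DML C++ API for a single mem_move (the PR submits a whole batch sequence; copyWithAutomaticPath is a hypothetical helper):

```cpp
#include <cstdint>
#include <vector>

#include <dml/dml.hpp>

// dml::automatic lets the library pick the hardware path when a DSA device
// is available and silently fall back to the software path otherwise, at
// the cost of explicit control over where each request runs.
bool copyWithAutomaticPath(const std::vector<std::uint8_t>& src,
                           std::vector<std::uint8_t>& dst) {
  auto handler = dml::submit<dml::automatic>(
      dml::mem_move,
      dml::make_view(src.data(), src.size()),
      dml::make_view(dst.data(), dst.size()));
  auto result = handler.get();
  return result.status == dml::status_code::ok;
}
```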
//
// @param oldItem      Reference to the item being moved
// @param newItemHdl   Reference to the handle of the new item being moved into
// @return true if the containers were updated successfully.
Really, all that is left to do, and all this function does, is one thing: update the access container. You could add that to the comment, or change the function name from book keeper to completeAccessContainerUpdate() or something.
Changed the name of the book keeper function to completeAccessContainerUpdate().
It looks really good, thanks. I left two minor comments; once you respond I can merge this.
Force-pushed from 59a71ee to eb23469.
Force-pushed from 36129e4 to c67df92.
Force-pushed from c67df92 to 57eade8.