
Conversation

@tbensonatl
Collaborator

@tbensonatl tbensonatl commented Nov 11, 2025

MatX caches (e.g., cuFFT plans or cuBLAS handles) and allocations (user or internal allocations via matxAlloc) are stored in static data structures. These data structures are destroyed during program exit by the corresponding destructors. However, due to the ordering of static destructors and atexit handlers, resources may be freed after the CUDA context or some other dependent resource has been destroyed, which can result in a segmentation fault during program exit.

The helper function ClearCachesAndAllocations() can be called prior to program exit to free resources allocated by MatX. This prevents conflicts with other static destructors and atexit handlers and thus allows a clean shutdown. The function may also be useful at any other time the user wishes to free resources allocated via MatX.

There are two other helpers, ClearCaches() and FreeAllocations(), which deallocate data associated with the caches (plans, handles, workspaces, etc.) and allocations made via matxAlloc(), respectively.

Signed-off-by: Thomas Benson <[email protected]>
@copy-pr-bot

copy-pr-bot bot commented Nov 11, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@tbensonatl
Collaborator Author

/build

@greptile-apps

greptile-apps bot commented Nov 11, 2025

Greptile Overview

Greptile Summary

Added three new helper functions (ClearCaches(), FreeAllocations(), and ClearCachesAndAllocations()) to manually free MatX resources before program exit, addressing static destructor ordering issues that can cause segmentation faults.

Key Changes:

  • cache.h: Implemented ClearAll() method that frees all cache types (transform caches, stream allocations, LTOIR cache) with proper mutex protection
  • allocator.h: Refactored locking by moving memory_mtx acquisition from deallocate_internal() to public deallocate() methods; added free_all() helper
  • Test coverage validates memory is properly freed by checking GPU memory before/after cleanup
  • Examples and test runner demonstrate proper usage pattern

Implementation Notes:

  • Lock ordering is consistent: stream_alloc_mutex → memory_mtx (no reverse paths found)
  • Uses recursive mutexes to prevent same-thread deadlocks
  • ClearAll() calls matxFree() on stream allocations; these are removed from the allocation map, so FreeAllocations() does not double-free them
  • Registry insertion in GetCacheIdFromType() on every call is inefficient but safe (as noted in existing comment)

Confidence Score: 4/5

  • This PR is safe to merge with low risk - the implementation correctly handles resource cleanup with proper locking and prevents double-free issues
  • Score reflects solid implementation with comprehensive test coverage and correct thread safety. Minor concern about lock ordering consistency across the codebase, though no actual deadlock paths were identified in this review
  • Pay close attention to include/matx/core/cache.h - verify that the lock ordering (stream_alloc_mutex before memory_mtx) is consistent throughout the entire codebase to prevent potential deadlocks

Important Files Changed

File Analysis

Filename Score Overview
include/matx/core/cache.h 4/5 Added ClearAll() method, CacheRegistry(), and helper functions ClearCaches()/ClearCachesAndAllocations(). Implementation correctly handles cache cleanup with proper locking, though registry insertion on every call is inefficient but safe.
include/matx/core/allocator.h 5/5 Refactored deallocate_internal() to require caller hold mutex, moved lock acquisition to public deallocate() methods. Added free_all() helper and public FreeAllocations() function. Changes maintain thread safety correctly.
test/00_misc/ClearCacheTests.cu 4/5 New test validates that ClearCachesAndAllocations() frees memory by checking GPU memory before/after. Test correctly exercises both cache (cuBLAS) and manual allocations, verifying at least 8 MiB freed.

Sequence Diagram

sequenceDiagram
    participant User
    participant ClearCachesAndAllocations
    participant ClearCaches
    participant matxCache_t
    participant FreeAllocations
    participant MemTracker
    participant CUDA

    User->>ClearCachesAndAllocations: Call cleanup before exit
    ClearCachesAndAllocations->>ClearCaches: Clear all caches
    ClearCaches->>matxCache_t: ClearAll()
    
    Note over matxCache_t: Lock cache_mtx
    loop For each cache ID
        matxCache_t->>matxCache_t: Call CacheFreeHelper.free()
        matxCache_t->>matxCache_t: Clear cache entries (destructors)
    end
    
    Note over matxCache_t: Lock stream_alloc_mutex
    loop For each stream allocation
        matxCache_t->>MemTracker: matxFree(ptr)
        Note over MemTracker: Lock memory_mtx
        MemTracker->>CUDA: cudaFreeAsync()
        MemTracker->>MemTracker: Erase from allocationMap
    end
    
    Note over matxCache_t: Lock ltoir_mutex
    matxCache_t->>matxCache_t: Clear LTOIR cache
    
    ClearCaches-->>ClearCachesAndAllocations: Return
    
    ClearCachesAndAllocations->>FreeAllocations: Free remaining allocations
    FreeAllocations->>MemTracker: free_all()
    Note over MemTracker: Lock memory_mtx
    loop While allocationMap not empty
        MemTracker->>MemTracker: deallocate_internal()
        MemTracker->>CUDA: cudaFree()/cudaFreeHost()
        MemTracker->>MemTracker: Erase from allocationMap
    end
    
    FreeAllocations-->>ClearCachesAndAllocations: Return
    ClearCachesAndAllocations-->>User: Cleanup complete


@greptile-apps greptile-apps bot left a comment


4 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@cliffburdick
Collaborator

Do we need any tests for this?

@tbensonatl
Collaborator Author

> Do we need any tests for this?

I added a clear call to the post-test code in test/main.cu, so currently each ctest executable will clear the caches after running all of the GTests. I considered adding the same to either each GTest individually or the test exit handler. If we do that, then we will not be able to run tests within the same executable concurrently. One option would be to create a new CTest with a single GTest case that implicitly creates several plans/allocations and then clears everything with a free memory check before and after. The risk there is that the free memory stats will be device-wide, so other concurrent CTests will impact the stats. I'll check if CTest supports flagging a test as one that needs to run when no other tests are running as that would work around these issues.
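For reference, CTest does support serializing a test: the RUN_SERIAL test property prevents a test from running concurrently with any other test, so device-wide free-memory statistics would not be perturbed by other CTests. A sketch (the test name here is hypothetical):

```cmake
# RUN_SERIAL ensures no other ctest jobs run concurrently with this one,
# so device-wide free-memory checks are not perturbed by other tests.
add_test(NAME clear_cache_test COMMAND clear_cache_test)
set_tests_properties(clear_cache_test PROPERTIES RUN_SERIAL ON)
```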

@cliffburdick
Collaborator

> Do we need any tests for this?
>
> I added a clear call to the post-test code in test/main.cu, so currently each ctest executable will clear the caches after running all of the GTests. I considered adding the same to either each GTest individually or the test exit handler. If we do that, then we will not be able to run tests within the same executable concurrently. One option would be to create a new CTest with a single GTest case that implicitly creates several plans/allocations and then clears everything with a free memory check before and after. The risk there is that the free memory stats will be device-wide, so other concurrent CTests will impact the stats. I'll check if CTest supports flagging a test as one that needs to run when no other tests are running as that would work around these issues.

I think in some ways not clearing the cache on tests is a feature since it has exposed problems with caching in the past that clearing would hide.

@tbensonatl
Collaborator Author

/build


@greptile-apps greptile-apps bot left a comment


6 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

@cliffburdick
Collaborator

/build

3 similar comments
@cliffburdick
Collaborator

/build

@cliffburdick
Collaborator

/build

@cliffburdick
Collaborator

/build

@cliffburdick cliffburdick merged commit 4b43915 into main Nov 14, 2025
1 check passed
@cliffburdick cliffburdick deleted the add-helper-functions-to-clear-caches-allocs branch November 14, 2025 17:00