
Conversation

@xiaguan (Collaborator) commented Jul 15, 2025

This PR eliminates redundant memcpy invocations in the non-zero-copy get interfaces, yielding a further performance improvement.

- remove SliceGuard and SliceBuffer RAII classes
- replace SimpleAllocator with offset_allocator::Allocator
- use AllocationHandle for automatic memory management (see the sketch below)
- simplify buffer allocation with single contiguous allocations
- add offset_allocator implementation in new directory
- update CMakeLists to include new allocator source files
- add comprehensive test suite for offset_allocator
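
To make the intended pattern concrete, here is a minimal sketch, assuming (as the tests added in this PR exercise) that offset_allocator::Allocator::allocate() returns a std::optional holding an AllocationHandle that releases its range on destruction. The helper name, header path, and surrounding types are illustrative only, not the actual store_py.cpp code:

// #include "offset_allocator/offset_allocator.h"  // header path assumed
#include <cstdint>
#include <optional>

// Hypothetical helper: reserve one contiguous buffer for a whole object,
// replacing the old SliceGuard/SliceBuffer path that tracked each slice
// separately (and copied it an extra time on the non-zero-copy get path).
auto reserve_buffer(offset_allocator::Allocator& allocator, uint32_t total_size) {
    // allocate() is assumed to return std::optional<AllocationHandle>.
    auto handle = allocator.allocate(total_size);
    // On failure the empty optional is passed through unchanged. On success
    // the caller copies the payload into this buffer exactly once, and the
    // handle returns its range to the allocator when it is destroyed.
    return handle;
}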

Signed-off-by: Jinyang Su <[email protected]>
@xiaguan force-pushed the offset_allocator branch from 5401a16 to 8cf9c00 on July 15, 2025 09:44
@xiaguan (Collaborator, Author) commented Jul 15, 2025

Moreover, I notice that we currently lack a dedicated C++ test suite for store_py.cpp; so far we have relied on Python-level test coverage.

@xiaguan requested review from ykwd and stmatengss and removed the review request for ykwd on July 15, 2025 12:07
@ykwd (Collaborator) commented Jul 16, 2025

Do we have some memory utilization benchmark for this allocator?

@xiaguan (Collaborator, Author) commented Jul 16, 2025

Do we have some memory utilization benchmark for this allocator?

This allocator eagerly merges released memory at deallocation time; as long as the region is entirely free, the largest contiguous span remains directly allocatable.

There's a test for this case:

// Test continuous allocation and deallocation with random sizes
TEST_F(OffsetAllocatorTest, ContinuousRandomAllocationDeallocation) {
    std::random_device rd;
    std::mt19937 gen(rd());
    std::uniform_int_distribution<uint32_t> size_dist(1,
                                                      1024 * 64);  // 1 B to 64 KB

    const int max_iterations = 20000;

    // Allocate and deallocate random sizes
    for (int i = 0; i < max_iterations; ++i) {
        uint32_t size = size_dist(gen);
        auto handle = allocator->allocate(size);
        EXPECT_TRUE(handle.has_value()) << "Failed to allocate size: " << size;
        // The allocation is released automatically when the handle goes out of scope
    }

    // With every handle released, the full 1 MiB region should again be
    // allocatable as a single contiguous block.
    auto full_space_handle = allocator->allocate(1024 * 1024);
    ASSERT_TRUE(full_space_handle.has_value());
    EXPECT_EQ(full_space_handle->size(), 1024 * 1024);
}

@ykwd (Collaborator) commented Jul 16, 2025

I wonder what the memory utilization ratio will be after a long series of allocations and evictions of different-sized objects. Will there be many holes in the memory space that prevent merging and make allocations of large objects fail?

@xiaguan (Collaborator, Author) commented Jul 16, 2025

I updated the test; the result is:

Largest allocatable block after random allocations: 1006632960 bytes
Utilization: 0.9375
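
For reference, 1006632960 bytes is 960 MiB, i.e. 0.9375 of a 1 GiB region. The sketch below shows roughly the kind of probe that yields such numbers; the OffsetAllocatorLargeArenaTest fixture (assumed to wrap a 1 GiB arena), the churn pattern, and the binary search are all assumptions, not the actual updated test:

// Hypothetical fixture: same shape as OffsetAllocatorTest above, but assumed
// to be configured with a 1 GiB arena.
TEST_F(OffsetAllocatorLargeArenaTest, UtilizationAfterRandomChurn) {
    constexpr uint64_t kArenaSize = 1ull << 30;  // 1 GiB (assumed)
    std::mt19937 gen(std::random_device{}());
    std::uniform_int_distribution<uint32_t> size_dist(1, 1024 * 64);

    // Mix short-lived and longer-lived allocations so holes can form.
    std::vector<decltype(allocator->allocate(1u))> live;
    for (int i = 0; i < 20000; ++i) {
        auto handle = allocator->allocate(size_dist(gen));
        if (handle.has_value() && i % 4 == 0) {
            live.push_back(std::move(handle));  // keep roughly a quarter alive
        }
        // Everything else is released as soon as handle goes out of scope.
    }
    live.clear();  // evict the long-lived allocations as well

    // Binary search for the largest single allocation that still succeeds;
    // each probe is released again before the next iteration.
    uint64_t lo = 0, hi = kArenaSize;
    while (lo < hi) {
        uint64_t mid = lo + (hi - lo + 1) / 2;
        auto probe = allocator->allocate(static_cast<uint32_t>(mid));
        if (probe.has_value()) {
            lo = mid;
        } else {
            hi = mid - 1;
        }
    }
    std::cout << "Largest allocatable block after random allocations: " << lo
              << " bytes" << std::endl;
    std::cout << "Utilization: " << static_cast<double>(lo) / kArenaSize
              << std::endl;
}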
