Skip to content

Conversation

jonathanpwang
Copy link
Contributor

@jonathanpwang jonathanpwang commented Oct 22, 2025

In some edge case where right after we start build_async on the memory merkle subtrees, if the program panics, then the order of drop could be that we drop the initial_memory buffers on the default stream first, while the build_async kernels are still running and using those buffers. This leads to a deadloop. I fixed it by just forcing the drop to drop subtrees first (which should sync their special streams) before dropping initial_memory.

compare: https://github.com/axiom-crypto/openvm-reth-benchmark/actions/runs/18733111153

@Copilot Copilot AI review requested due to automatic review settings October 22, 2025 23:51
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR addresses a race condition in CUDA memory management where panic-induced early drops could cause initial memory buffers to be freed while async merkle tree build operations were still executing on separate streams, resulting in a deadlock.

Key Changes:

  • Implements explicit Drop trait for PersistentMemoryInventoryGPU to enforce proper cleanup order
  • Ensures merkle subtree streams are synchronized before dropping initial memory buffers

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@github-actions
Copy link

group app.proof_time_ms app.cycles app.cells_used leaf.proof_time_ms leaf.cycles leaf.cells_used
verify_fibair (-3 [-1.3%]) 236 322,610 2,058,654 - - -
fibonacci (-15 [-1.5%]) 984 1,500,209 2,100,402 - - -
regex (-58 [-2.5%]) 2,282 4,137,502 17,695,216 - - -
ecrecover (-6 [-0.8%]) 710 122,859 2,263,820 - - -
pairing (-49 [-3.4%]) 1,410 1,745,742 25,468,210 - - -

Commit: c723cf7

Benchmark Workflow

Copy link
Contributor

@Golovanov399 Golovanov399 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean if it works then I guess

fn drop(&mut self) {
// Drop merkle subtrees first so their individual streams synchronize before dropping the
// initial memory buffers. This prevents buffers from dropping before build_async completes.
self.merkle_tree.drop_subtrees();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IS this enough? Strange. I don't see a sync in MemoryMerkleSubTree drop. There is no drop impl there

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just the CudaStream drop implementation. When you remove a subtree, the subtree drops, which drops the stream, which forces a sync.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, nice

@jonathanpwang jonathanpwang merged commit a76a9d0 into main Oct 23, 2025
46 of 47 checks passed
@jonathanpwang jonathanpwang deleted the fix/cu-memory-drop branch October 23, 2025 16:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants