Allocate multiple threads to column reconstruction task #7789

eserilev · 2025-07-24T13:22:02Z

Issue Addressed

A POC that allocates 4 threads to column reconstruction tasks. This should also ensure that new tasks don't use the additional threads while reconstruction is running.

eserilev · 2025-07-24T13:22:58Z

@jimmygchen I know in the issue you mentioned that maybe oversubscription isn't the best idea, but I just wanted to throw out this POC in case you found the direction useful. I haven't had a chance to test yet, but if you think this is useful I can go ahead and clean this up a bit and start testing.

EDIT: I just read some of the comments in the other PR, probably should have done that before opening this lol. But I think this POC is maybe in the spirit of this:

For heavy tasks that requires rayon, allow WorkTypes to acquire more than 1 worker (N)
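To make the "acquire more than 1 worker (N)" idea concrete, here is a minimal sketch of a worker budget in which a heavy task reserves several slots at once. `WorkerBudget` and its methods are hypothetical names for illustration, not the actual BeaconProcessor types, and the sketch assumes a single manager owns the budget.

```rust
/// Hypothetical worker budget; not the actual Lighthouse BeaconProcessor types.
struct WorkerBudget {
    /// Upper bound on workers available to the processor.
    max_workers: usize,
    /// Workers currently reserved by running tasks.
    in_use: usize,
}

impl WorkerBudget {
    fn new(max_workers: usize) -> Self {
        Self { max_workers, in_use: 0 }
    }

    /// Try to reserve `n` workers for one task. Returns `false` and leaves
    /// the budget untouched if fewer than `n` workers are idle.
    fn try_acquire(&mut self, n: usize) -> bool {
        if self.in_use + n <= self.max_workers {
            self.in_use += n;
            true
        } else {
            false
        }
    }

    /// Return `n` workers to the budget once the task has finished.
    fn release(&mut self, n: usize) {
        self.in_use = self.in_use.saturating_sub(n);
    }

    /// Number of workers that new tasks may still claim.
    fn idle(&self) -> usize {
        self.max_workers - self.in_use
    }
}
```

With a budget of 8, a reconstruction task that reserves 4 leaves only 4 slots for everything else until it calls `release(4)`, which is the "new tasks don't use the additional threads" behaviour described in the PR summary.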

jimmygchen · 2025-08-11T07:39:36Z

btw the 4 threads was a bit arbitrary (could be a good starting point though)
For reconstruction, each blob takes roughly 150ms, so:

Blobs	CPU time (ms)	4 threads (s)	8 threads (s)	16 threads (s)
48	7200	1.80	0.90	0.45
72	10800	2.70	1.35	0.68

There's probably no harm in using more threads if they are available to the beacon processor (max_workers). Maybe we could consider something like max(4, max_workers / 2)?
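The arithmetic behind the table, and the suggested thread count, can be sketched as follows. This assumes the roughly 150 ms per blob figure quoted above and perfect parallel scaling; the function names are hypothetical.

```rust
/// Approximate CPU cost per blob in milliseconds (the figure quoted above).
const MS_PER_BLOB: u64 = 150;

/// Idealised wall-clock estimate in seconds, assuming the work splits
/// evenly across `threads` with no scheduling overhead.
fn estimated_seconds(blobs: u64, threads: u64) -> f64 {
    (blobs * MS_PER_BLOB) as f64 / threads as f64 / 1000.0
}

/// Hypothetical helper implementing the suggested `max(4, max_workers / 2)`.
fn reconstruction_threads(max_workers: usize) -> usize {
    std::cmp::max(4, max_workers / 2)
}
```

For example, `estimated_seconds(48, 4)` gives 1.8 and `reconstruction_threads(16)` gives 8. Note that for `max_workers` below 4 the formula still returns 4, so an implementation that must stay within the budget might additionally clamp the result to `max_workers`.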

eserilev · 2025-08-13T22:13:12Z

I hit this case a few times while running a node on devnet-3

lighthouse/beacon_node/beacon_processor/src/lib.rs

Lines 1290 to 1296 in a2f2028

    
           None => { 
        
               warn!( 
        
                   msg = "no new work and cannot spawn worker", 
        
                   "Unexpected gossip processor condition" 
        
               ); 
        
               None 
        
           }

It's interesting because the metrics themselves don't show that we ever max out our worker threads, but this case can only be reached if there are no new work events and we are unable to spawn a new worker. This must mean we have hit a situation where we have over-allocated threads to the reconstruction task. So I think this PR is doing what it's supposed to do, though I'm unsure if we are achieving any real performance gains.
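A simplified sketch of the reasoning above, assuming the manager decides what to do from two facts: whether a new work event arrived, and whether any worker slot is still free. The `Action` enum and `decide` function are hypothetical and only mirror the shape of the match arm quoted above, not the actual code in lib.rs.

```rust
/// Hypothetical outcomes of one manager iteration; names are illustrative.
#[derive(Debug, PartialEq)]
enum Action {
    /// New work arrived and a worker is free: run it immediately.
    SpawnForEvent,
    /// No new work, but a worker is free: pull from the queues.
    SpawnFromQueue,
    /// New work arrived but no worker is free: queue it.
    Enqueue,
    /// No new work and no free worker: the branch that logs the warning.
    Unexpected,
}

/// `reserved_workers` counts slots held by tasks, including the extra
/// slots a multi-threaded task such as reconstruction has reserved.
fn decide(has_new_work: bool, reserved_workers: usize, max_workers: usize) -> Action {
    let can_spawn = reserved_workers < max_workers;
    match (has_new_work, can_spawn) {
        (true, true) => Action::SpawnForEvent,
        (false, true) => Action::SpawnFromQueue,
        (true, false) => Action::Enqueue,
        (false, false) => Action::Unexpected,
    }
}
```

Under these assumptions, with `max_workers` of 8, five running tasks where one of them reserves 4 slots already account for all 8 slots, so `decide(false, 8, 8)` lands in `Unexpected` even though a metric that counts tasks would report only five busy workers.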

This is a snapshot of some of the metrics on that same node:
https://snapshots.raintank.io/dashboard/snapshot/CsDrVt7tVj74LLNO6J5uqqGK5gI5UBQi

Note that I was running this node with prepare-all-payloads, subscribe-all-subnets, slasher, subscribe-all-data-column-subnets, and import-all-attestations in an attempt to "over extend" my node

Not planning on spending any more time on this in the near-term, just wanted to share this info in case someone finds it helpful

Oversubscribe during column reconstruction

e53354b

eserilev added the do-not-merge, optimization (Something to make Lighthouse run more efficiently.), das (Data Availability Sampling), and hardening labels Jul 24, 2025

michaelsproul mentioned this pull request Jul 29, 2025

Potential CPU oversubscription in BeaconProcessor due to unscoped rayon usage #7719

Open

resolve merge conflicts

a2f2028
