Run ABC passes in parallel #5266

rocallahan · 2025-08-04T04:06:51Z

What are the reasons/motivation for this change?

For large circuits AbcPass can run ABC hundreds or thousands of times, once per unique clkdomain_t. Some of those ABC runs take a while. Running those ABCs in parallel is possible, because the cells assigned to each ABC run are disjoint. This PR improves the runtime of AbcPass on one of our large circuits by 5x (which translates into a 3x speedup of synthesis end-to-end).

Explain how this is achieved.

This builds on PR #5239.

Reading and writing RTLIL are not thread-safe for various reasons, and fixing that would be difficult. So for now we stick with reading and writing RTLIL on the main thread. We split up the per-clkdomain work into a "prepare" phase which builds the gate netlist and removes the corresponding cells from the module, a "run ABC" phase which actually runs ABC, and an "extract" phase which processes the results of an ABC run to create fresh cells in the module. Everything happens on the main thread other than the "run ABC" phases. For parallelism, we create a set of worker threads which pull work from a concurrent queue fed by the main thread.
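The three-phase split can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ConcurrentQueue`, `WorkItem`, and `run_pipeline` are hypothetical names, and plain integers stand in for gate netlists. The sketch also joins all workers before the "extract" step for simplicity, whereas the actual pass can overlap main-thread work with running ABC jobs.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical blocking queue; the queue type used by the PR is not shown in this thread.
template <typename T>
class ConcurrentQueue {
public:
    void push(T item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!closed_);
            items_.push_back(std::move(item));
        }
        ready_.notify_one();
    }
    // Signals that no more items will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }
    // Blocks until an item is available; returns nullopt once closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !items_.empty(); });
        if (items_.empty())
            return std::nullopt;
        T item = std::move(items_.front());
        items_.pop_front();
        return item;
    }
private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::deque<T> items_;
    bool closed_ = false;
};

// Stand-in for one clock domain's work: `input` is produced by "prepare",
// `output` is written by "run ABC" and consumed by "extract".
struct WorkItem {
    int input = 0;
    int output = 0;
};

// Runs `run_abc` on worker threads; "prepare" and "extract" stay on the calling thread.
std::vector<int> run_pipeline(const std::vector<int> &domains, int num_workers,
                              const std::function<int(int)> &run_abc) {
    assert(num_workers > 0);
    std::vector<WorkItem> items(domains.size());  // sized up front so pointers stay valid
    ConcurrentQueue<WorkItem *> queue;
    std::vector<std::thread> workers;
    for (int i = 0; i < num_workers; i++)
        workers.emplace_back([&] {
            while (auto item = queue.pop())
                (*item)->output = run_abc((*item)->input);  // "run ABC" phase
        });
    for (std::size_t i = 0; i < domains.size(); i++) {
        items[i].input = domains[i];  // "prepare" phase, main thread only
        queue.push(&items[i]);
    }
    queue.close();
    for (auto &worker : workers)
        worker.join();
    std::vector<int> results;
    for (const auto &item : items)
        results.push_back(item.output);  // "extract" phase, main thread only
    return results;
}
```

Because each work item is touched by exactly one worker between `push` and `join`, the only shared mutable state is the queue itself, which mirrors the disjoint-cells argument above.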

To simplify things and also provide a small performance boost, the writing of lutdefs.txt and stdcells.genlib is factored out so it happens just once per pass instead of once per ABC run.

Writing thread-safe code in C++ is very scary, especially in a large existing project like Yosys not designed for multithreading. Fortunately the "run ABC" phase is not large and mostly self-contained so the risk of this PR may be acceptable.

One thread safety problem I had to tackle was logging. Yosys::log() is not thread-safe and making it thread-safe would be very invasive. The obvious approach of putting locks around everything would slow down the single-threaded case and not scale well for parallel threads, plus the desired behavior of some of the logging functions w.r.t. concurrent logging calls is not clear. So I've created a DeferredLogs class which exposes a log() function which simply captures the logs for a particular work item into a buffer. Eventually, back on the main thread, those deferred logs are printed via the standard log() function. If log timing is enabled then the timestamps are not meaningful; we can fix that by extending the logging API so callers can pass in previously captured timestamps, but I'd prefer to do that after my logging PR #5243 has been merged.
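As a rough illustration of the deferred-logging idea (the class in the PR may look quite different), a per-work-item buffer could be as small as this. The `flush` and `size` members and the `Sink` parameter are assumptions made for the sketch; only the `DeferredLogs` name and its `log()` function come from the description above.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>
#include <utility>
#include <vector>

// Sketch only: one instance per work item, so no locking is needed.
class DeferredLogs {
public:
    // printf-style capture into a private buffer instead of the global logger.
    void log(const char *format, ...) {
        va_list args;
        va_start(args, format);
        va_list args_copy;
        va_copy(args_copy, args);
        int size = std::vsnprintf(nullptr, 0, format, args);  // measure first
        va_end(args);
        if (size < 0) {
            va_end(args_copy);
            return;
        }
        std::string message(static_cast<std::size_t>(size), '\0');
        std::vsnprintf(message.data(), message.size() + 1, format, args_copy);
        va_end(args_copy);
        messages_.push_back(std::move(message));
    }
    // Call on the main thread; `sink` would forward to the regular logger.
    template <typename Sink>
    void flush(Sink &&sink) {
        for (const std::string &message : messages_)
            sink(message);
        messages_.clear();
    }
    std::size_t size() const { return messages_.size(); }
private:
    std::vector<std::string> messages_;
};
```

A worker would call `logs.log("...")` while its ABC run is in progress, and the main thread would later call `logs.flush(...)` with a callable that passes each message to the standard `log()` function.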

If applicable, please suggest to reviewers how they can test the change.

The existing Yosys test suite exercises this code fairly well. If we take this PR and especially if we carry on down the road of adding more parallelism, it would be good to run the Yosys test suite with TSAN regularly.

KrystalDelusion · 2025-08-04T08:19:00Z

WASI doesn't have threading support, so you'll need to add a way to downgrade to not using threads. There is already a DISABLE_ABC_THREADS make option that disables pthread when building ABC, which may still make sense here, but it may also be better to have an ENABLE_THREADS feature option which can be shared with anything else that ends up implementing threads.

phsauter · 2025-08-04T10:45:08Z

> To simplify things and also provide a small performance boost, the writing of lutdefs.txt and stdcells.genlib is factored out so it happens just once per pass instead of once per ABC run.

If you want to go even further: ABC always converts liberty libraries to its internal genlib format, since it does not work with full liberty data. So you could factor this part out as well, so that liberty files are also processed only once. This provides very significant time savings if you have a larger (commercial) library.

phsauter · 2025-08-04T10:56:21Z

I will test this on a few larger designs to see what it does to memory usage, but I actually think this should be fine. In my experience with more 'trivial' multithreading of ABC using xargs, it doesn't noticeably increase peak memory usage.

Another thing to consider: it might be interesting to sort the extracted netlists by size and always start with the largest ones, as they will likely take the longest in ABC. If they are queued too late, they could limit the time actually spent running in parallel.
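A minimal sketch of that ordering, assuming each pending run carries some size estimate (`PendingRun` and `cell_count` are hypothetical names, not from the PR):

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical work item; `cell_count` stands in for whatever size estimate is available.
struct PendingRun {
    std::string name;
    std::size_t cell_count;
};

// Orders runs largest-first so the jobs expected to take longest are queued earliest.
// stable_sort keeps the original order among equally sized runs.
void sort_largest_first(std::vector<PendingRun> &runs) {
    std::stable_sort(runs.begin(), runs.end(),
                     [](const PendingRun &a, const PendingRun &b) {
                         return a.cell_count > b.cell_count;
                     });
}
```

This is the usual longest-job-first heuristic: it does not guarantee an optimal schedule, but it avoids the case where one large job starts last and runs alone while the other workers sit idle.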

rocallahan · 2025-08-04T12:07:55Z

> Another thing to consider: it might be interesting to sort the extracted netlists by size and always start with the largest ones, as they will likely take the longest in ABC. If they are queued too late, they could limit the time actually spent running in parallel.

That's a great idea. Could and probably should be done as a separate PR after this lands.

whitequark · 2025-08-04T15:18:15Z

> WASI doesn't have threading support,

WASI does: you need to build for the wasm32-wasip1-threads target instead of wasm32-wasip1. Since Yosys is single-threaded I didn't bother adding a compile-time option, but it may be worth checking whether threads are supported using #ifdef _REENTRANT.

There is a good reason to keep threading support optional: it requires more hostcalls from the runtime, and at least in the browser, it requires SharedArrayBuffer support, which means you need to have some quite annoying workarounds, and deploying the Wasm build from e.g. GitHub Pages becomes very tricky as it doesn't send the right CORS headers.
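One way such an optional-threads switch could look is sketched below. `YOSYS_ENABLE_THREADS` is the macro name that comes up later in this thread, but how the build system defines it is not shown here, and `run_tasks` is a hypothetical helper. On a WASI build the macro could presumably be derived from `_REENTRANT` as suggested above.

```cpp
#include <functional>
#include <vector>

#ifdef YOSYS_ENABLE_THREADS
#include <thread>
#endif

// Runs every task: in parallel when threads are compiled in, sequentially otherwise.
// Callers must only pass tasks that do not share mutable state.
inline void run_tasks(const std::vector<std::function<void()>> &tasks) {
#ifdef YOSYS_ENABLE_THREADS
    std::vector<std::thread> threads;
    for (const auto &task : tasks)
        threads.emplace_back(task);
    for (auto &thread : threads)
        thread.join();
#else
    // Fallback for targets without threads: same results, no concurrency.
    for (const auto &task : tasks)
        task();
#endif
}
```

Keeping the sequential path behind the same function means callers do not need their own `#ifdef` blocks.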

rocallahan · 2025-08-05T09:46:19Z

There is a performance issue I need to investigate so it's not ready for review right now.

I'm not sure how to set YOSYS_ENABLE_THREADS in the Windows build. It should build OK on Windows with threads disabled.

KrystalDelusion · 2025-08-06T00:12:43Z

> I'm not sure how to set YOSYS_ENABLE_THREADS in the Windows build. It should build OK on Windows with threads disabled.

I think you need to modify the .vcxproj file, similar to how the cpp standard is overridden:

yosys/misc/create_vcxsrc.sh, line 33 in da01e17:

    sed -i 's,</AdditionalIncludeDirectories>,</AdditionalIncludeDirectories>\n      <LanguageStandard>stdcpp17</LanguageStandard>\n      <AdditionalOptions>/Zc:__cplusplus %(AdditionalOptions)</AdditionalOptions>,g' "$vcxsrc"/YosysVS/YosysVS.vcxproj.new

I'm not sure exactly how you tell Visual Studio to use the pthread lib, but there is a field in there for preprocessor definitions:

<PreprocessorDefinitions>_YOSYS_;_CRT_SECURE_NO_WARNINGS;WIN32;_DEBUG;_CONSOLE;_LIB;%(PreprocessorDefinitions)</PreprocessorDefinitions>

It's probably fine to leave it as-is though.

Currently `assign_map` is rebuilt from the module from scratch every time we invoke ABC. That doesn't scale when we do thousands of ABC runs over large modules. Instead, create it once and then maintain it incrementally as we update the module.

Large circuits can run hundreds or thousands of ABCs in a single AbcPass. For some circuits, some of those ABC runs can run for hundreds of seconds. Running ABCs in parallel with each other and in parallel with main-thread processing (reading and writing BLIF files, copying ABC BLIF output into the design) can give large speedups.

Doing ABC runs in parallel can actually make things slower when every ABC run requires spawning an ABC subprocess --- especially when using popen(), which on glibc does not use vfork(). What seems to happen is that constant fork()ing keeps making the main process data pages copy-on-write, so the main process code that is setting up each ABC call takes a lot of minor page-faults, slowing it down. The solution is pretty straightforward although a little tricky to implement. We just reuse ABC subprocesses. Instead of passing the ABC script name on the command line, we spawn an ABC REPL and pipe a command into it to source the script. When that's done we echo an `ABC_DONE` token instead of exiting. Yosys then puts the ABC process onto a stack which we can pull from the next time we do an ABC run. For one of our large designs, this is an additional 5x speedup of the primary AbcPass. It does 5155 ABC runs, all very small; runtime of the AbcPass goes from 760s to 149s (not very scientific benchmarking but the effect size is large).

rocallahan · 2025-08-13T21:32:28Z

I've updated the PR. Mainly I've added another commit that uses a pool of ABC processes and reuses ABC processes instead of always spawning a new one for every ABC run. This avoids some situations where doing parallel ABC runs could actually be a regression.

phsauter · 2025-08-13T22:11:18Z

Now that there is a threadpool I think it makes sense to have a scratchpad (Yosys' internal config system) value to set a maximum number of threads.
It's not a must-have, but a nice-to-have IMO.
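For illustration, the cap could be applied along these lines. `choose_worker_count` is a hypothetical helper, and `configured_max` stands in for a value read from the scratchpad; no key name has been settled in this thread.

```cpp
#include <algorithm>
#include <thread>

// Picks a worker count bounded by the hardware, the configured cap, and the job count.
// A `configured_max` of zero or less is treated as "no explicit limit" (an assumption).
int choose_worker_count(int configured_max, int num_jobs) {
    unsigned hardware = std::thread::hardware_concurrency();  // may be 0 if unknown
    int limit = hardware > 0 ? static_cast<int>(hardware) : 1;
    if (configured_max > 0)
        limit = std::min(limit, configured_max);
    // Never return fewer than one worker, and never more workers than jobs.
    return std::max(1, std::min(limit, num_jobs));
}
```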

rocallahan · 2025-08-14T22:07:04Z

> Now that there is a threadpool I think it makes sense to have a scratchpad (Yosys' internal config system) value to set a maximum number of threads. It's not a must-have, but a nice-to-have IMO.

Do you want it in this PR or a separate PR?

ShinyKate requested a review from widlarizer August 4, 2025 18:49

rocallahan force-pushed the abc-parallel branch 3 times, most recently from 705cf74 to 7367ef3 Compare August 5, 2025 09:45

rocallahan marked this pull request as draft August 5, 2025 09:45

rocallahan force-pushed the abc-parallel branch from 7367ef3 to 52556cf Compare August 11, 2025 22:43

ShinyKate assigned widlarizer Aug 12, 2025

rocallahan force-pushed the abc-parallel branch 3 times, most recently from fff515e to d3d557d Compare August 13, 2025 04:53

rocallahan added 13 commits August 13, 2025 05:44

Move ABC pass state to a struct instead of storing it in global variables.

4ba42c4

Move the input parameters to abc_module that are identical across modules to an `AbcConfig` struct.

ceedcec

Move code in abc_module() that modifies the design into a new function `extract()`

53c72c0

Splits up the big `abc_module()` function and isolates the code that modifies the design after running ABC.

Make module a parameter of the function so we can change its constness in context

885bb74

Fix indentation

ccb23ff

Mark kept FF output wires as ports directly instead of via the 'keep' attribute

9f2d302

Compute is_port in AbcPass without iterating through all cells and wires in the module every time we run ABC.

99dca0a

This does not scale when we run ABC thousands of times in a single AbcPass.

Build FfInitVals for the entire module once and use it for every ABC run.

2336027

Split abc_module() into prepare_module() and run_abc()

b76419c

`prepare_module()` will have to run on the main thread.

Only write out stdcells/lutcosts once for all ABC runs

8b51b10

Stop using log_signal() in abc.cc because it's not thread-safe

5f22364

rocallahan force-pushed the abc-parallel branch from d3d557d to 4fd01fa Compare August 13, 2025 05:46

rocallahan force-pushed the abc-parallel branch from 4fd01fa to f71e9e3 Compare August 13, 2025 19:33

rocallahan marked this pull request as ready for review August 13, 2025 21:30
