[wip] feat(fuzz): `SharedCorpus` for multiple worker threads #11769
base: master
  pub(crate) struct CorpusEntry {
      // Unique corpus identifier.
-     uuid: Uuid,
+     pub(crate) uuid: Uuid,
      // Total mutations of corpus as primary source.
-     total_mutations: usize,
+     pub(crate) total_mutations: usize,
The `pub(crate)` visibility is a temporary measure. It will be removed once I remove the existing `CorpusManager`, which is currently being used by `InvariantExecutor`.
    in_memory_corpus: Arc<RwLock<Vec<CorpusEntry>>>,
    /// Number of failed replays from persisted corpus.
    failed_replays: Arc<AtomicUsize>,
    /// History of binned hitcount of edges seen during fuzzing
    history_map: Arc<RwLock<Vec<u8>>>,
    /// Corpus metrics.
    pub(crate) metrics: Arc<RwLock<CorpusMetrics>>,
I would suggest making these per-worker (avoiding locks) and designating one worker the master. Then at some interval, each worker would copy its corpus "changeset" to the master i.e. what's new since the last sync. The master node then would merge new inputs from disk with its own corpus and keep all entries that contribute unique coverage in-memory.
# corpus dir
master/
worker1/
worker2/
sync creates
master/worker1_new/
master/worker2_new/
AFL++ stores a bitmap of the history_map (1-byte hitcount -> bool indicating hit) for each input and then uses this bitmap to make sure it doesn't eject an entry that is unique (this would mean updating `is_favored` to mean it contributes a unique bit to the bitmap). However, I'm not exactly sure how worker corpus reloads should work...
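To make the bitmap idea concrete, here is a minimal sketch of the suggestion as I read it. It is not AFL++'s implementation and not code from this PR: `Entry`, `to_hit_bitmap`, `mark_favored`, and `evict_candidates` are hypothetical names, and "favored" is assumed to mean "covers at least one edge that no other entry covers".

```rust
/// Collapse a per-edge hit-count map (one byte per edge) into a boolean
/// map that only records whether each edge was hit at all.
fn to_hit_bitmap(hitcounts: &[u8]) -> Vec<bool> {
    hitcounts.iter().map(|&count| count > 0).collect()
}

/// Hypothetical corpus entry holding only the fields this sketch needs.
struct Entry {
    hit_bitmap: Vec<bool>,
    is_favored: bool,
}

/// Mark an entry as favored when it hits at least one edge that no other
/// entry in the corpus hits (assumed meaning of "unique bit").
fn mark_favored(entries: &mut [Entry]) {
    let edge_count = entries
        .iter()
        .map(|entry| entry.hit_bitmap.len())
        .max()
        .unwrap_or(0);

    // For every edge, count how many entries hit it.
    let mut coverers = vec![0usize; edge_count];
    for entry in entries.iter() {
        for (edge, &hit) in entry.hit_bitmap.iter().enumerate() {
            if hit {
                coverers[edge] += 1;
            }
        }
    }

    // An entry is favored if it is the sole coverer of some edge.
    for entry in entries.iter_mut() {
        entry.is_favored = entry
            .hit_bitmap
            .iter()
            .enumerate()
            .any(|(edge, &hit)| hit && coverers[edge] == 1);
    }
}

/// Indices of entries that could be evicted without losing unique coverage.
fn evict_candidates(entries: &[Entry]) -> Vec<usize> {
    entries
        .iter()
        .enumerate()
        .filter(|(_, entry)| !entry.is_favored)
        .map(|(index, _)| index)
        .collect()
}
```

With this definition, eviction only ever considers entries returned by `evict_candidates`, so an input that is the sole witness of some edge is never ejected.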
Regarding the sync interval: for invariant tests that are run for 256 runs during CI, I don't think it makes sense to parallelize or sync. For long-running campaigns, if there are more invariant contracts than threads available, you would have to run them round robin, and it'd make sense to sync when the task stops and minimize the corpus when it restarts. If there are fewer invariant contracts than threads, then you can just cleanly sync at a given interval, e.g. ten minutes (or maybe a fraction of the total run-time if it's less).
I am not attached to the use of directories for sync like AFL++ and it may be possible to use some concurrency primitive like bi-directional channels in Rust. Having a mutex on the history map is not scalable, however. Feedback to my feedback welcome :)
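A rough sketch of the channel-based variant, assuming each worker owns its corpus and only ships a changeset to a single master thread. Everything here (`Input`, `Master`, `run_sync`) is hypothetical and only covers the worker-to-master direction, so the bi-directional part and the sync interval are left out.

```rust
use std::collections::HashSet;
use std::sync::mpsc;
use std::thread;

/// Hypothetical corpus input: raw bytes plus the edges it covered.
#[derive(Clone, Debug)]
struct Input {
    data: Vec<u8>,
    edges: Vec<usize>,
}

/// Master-side state. A single thread owns it, so no lock is needed.
struct Master {
    corpus: Vec<Input>,
    seen_edges: HashSet<usize>,
}

impl Master {
    fn new() -> Self {
        Master { corpus: Vec::new(), seen_edges: HashSet::new() }
    }

    /// Merge a worker changeset, keeping only inputs that add new coverage.
    /// Returns the number of inputs kept.
    fn merge(&mut self, changeset: Vec<Input>) -> usize {
        let mut kept = 0;
        for input in changeset {
            let mut adds_coverage = false;
            for &edge in &input.edges {
                // `insert` returns true when the edge was not seen before.
                if self.seen_edges.insert(edge) {
                    adds_coverage = true;
                }
            }
            if adds_coverage {
                self.corpus.push(input);
                kept += 1;
            }
        }
        kept
    }
}

/// Spawn one thread per changeset, send each changeset to the master over
/// a channel, and merge them as they arrive.
fn run_sync(changesets: Vec<Vec<Input>>) -> Master {
    let (tx, rx) = mpsc::channel::<Vec<Input>>();
    let handles: Vec<_> = changesets
        .into_iter()
        .map(|changeset| {
            let tx = tx.clone();
            thread::spawn(move || {
                // A real worker would fuzz here and send at each sync point.
                tx.send(changeset).expect("master hung up");
            })
        })
        .collect();
    // Drop the original sender so the receive loop ends once workers finish.
    drop(tx);

    let mut master = Master::new();
    for changeset in rx {
        master.merge(changeset);
    }
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    master
}
```

Because only the master thread touches `seen_edges`, the history map needs no mutex in this layout. The cost is that workers see each other's discoveries only after a sync.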
Motivation
towards #8898
Solution
Split `CorpusManager` into `SharedCorpus` and `CorpusWorker`.
`SharedCorpus` holds the global corpus values for the fuzz test that will be accessed by multiple `CorpusWorker`s.
`CorpusWorker` handles fuzz input generation.
`CorpusWorker` writes to the global `SharedCorpus` to add new corpus entries, update metrics, and evict old entries.
Note: This PR does not address parallelizing the fuzz runs, only prepares for it. Opened for initial feedback on the approach.
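For readers skimming the description, the split might look roughly like the sketch below. This is not the PR's code. The field names `in_memory_corpus` and `failed_replays` come from the diff; the `u64` id (standing in for `Uuid`), `max_entries`, and the "evict the most-mutated entry" policy are assumptions made only for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, RwLock};

/// Stand-in for the PR's `CorpusEntry`; a `u64` replaces the real `Uuid`.
#[derive(Clone, Debug)]
struct CorpusEntry {
    id: u64,
    total_mutations: usize,
}

/// Global corpus state. Cloning is cheap because every field is an `Arc`,
/// so each worker can hold its own handle to the same data.
#[derive(Clone, Default)]
struct SharedCorpus {
    in_memory_corpus: Arc<RwLock<Vec<CorpusEntry>>>,
    failed_replays: Arc<AtomicUsize>,
}

/// Per-thread handle that writes back into the shared state.
struct CorpusWorker {
    shared: SharedCorpus,
    /// Hypothetical capacity limit used to trigger eviction.
    max_entries: usize,
}

impl CorpusWorker {
    /// Add an entry and, if the corpus is over capacity, evict the entry
    /// with the most mutations (an assumed policy, not the PR's).
    fn add_entry(&self, entry: CorpusEntry) {
        let mut corpus = self
            .shared
            .in_memory_corpus
            .write()
            .expect("corpus lock poisoned");
        corpus.push(entry);
        if corpus.len() > self.max_entries {
            let most_mutated = corpus
                .iter()
                .enumerate()
                .max_by_key(|(_, e)| e.total_mutations)
                .map(|(index, _)| index);
            if let Some(index) = most_mutated {
                corpus.remove(index);
            }
        }
    }

    /// Counters can use atomics instead of a lock.
    fn record_failed_replay(&self) {
        self.shared.failed_replays.fetch_add(1, Ordering::Relaxed);
    }
}
```

The write lock is held for the whole add-then-evict step, which keeps the corpus size invariant intact but also serializes workers at that point.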
PR Checklist