
(2.12) Filestore async flush #7018

Open: wants to merge 5 commits into main


@MauriceVanVeen (Member) commented Jun 30, 2025

The filestore's AsyncFlush setting can now be enabled for a JetStream stream using the AllowAsyncFlush field. Some initial benchmarks showed a 10-15% performance increase when using an R3 stream (3-node cluster with each node in a different availability zone).

Enabling the filestore's AsyncFlush setting on its own, without additional code, would be unsafe. That's why this PR introduces new mechanisms to track async writes and only allows data from the log to be compacted once it has hit JetStream's store. This makes the performance improvement essentially "free", with no negative consequences for data safety/consistency.

  • Raft now has ApplyWritePending to track any pending writes done by the filestore. ApplyWritePersisted is called once the write is actually persisted on disk. This allows data to be flushed asynchronously while still telling the Raft code which data it can safely compact. (This also requires passing the index of the append entry down into the filestore, so that it can call back up with that index once the relevant data is persisted.)
  • Because of the above, there is some inherent state desync: the stream's (applied) state and snapshot can be ahead of what has been persisted on disk. In that case the snapshot is marked "async", which tells the server that its state on recovery is incomplete and that the missing entries are safely stored in its Raft log and can be replayed.
  • The AllowAsyncFlush stream setting can be freely/safely enabled and disabled. It will only be effective when using file storage, and only when the stream is backed by a Raft log, i.e. it's replicated.
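
The three points above can be illustrated with a small, self-contained Go sketch. It is not the code in this PR: the `writeTracker` type, `SafeCompactIndex`, `SnapshotIsAsync`, `asyncFlushEffective`, and all signatures are hypothetical, and "replicated" is assumed here to mean more than one replica. Only the names ApplyWritePending, ApplyWritePersisted, and AllowAsyncFlush come from the description.

```go
package main

import (
	"fmt"
	"sync"
)

// writeTracker is a hypothetical stand-in for the Raft-side bookkeeping.
type writeTracker struct {
	mu      sync.Mutex
	pending map[uint64]struct{} // entry indexes applied but not yet on disk
	applied uint64              // highest index applied to the stream state
}

func newWriteTracker() *writeTracker {
	return &writeTracker{pending: make(map[uint64]struct{})}
}

// ApplyWritePending records that the entry at index has an in-flight write.
func (t *writeTracker) ApplyWritePending(index uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.pending[index] = struct{}{}
	if index > t.applied {
		t.applied = index
	}
}

// ApplyWritePersisted is the callback from the store once index is on disk.
func (t *writeTracker) ApplyWritePersisted(index uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.pending, index)
}

// SafeCompactIndex returns the highest index at or below which every
// applied entry has been persisted, so the log may be compacted up to it.
func (t *writeTracker) SafeCompactIndex() uint64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	safe := t.applied
	for idx := range t.pending {
		if idx == 0 {
			return 0
		}
		if idx-1 < safe {
			safe = idx - 1
		}
	}
	return safe
}

// SnapshotIsAsync reports whether a snapshot taken now would be ahead of
// what is on disk, in which case recovery must replay from the Raft log.
func (t *writeTracker) SnapshotIsAsync() bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.pending) > 0
}

// asyncFlushEffective models the third point: the setting only takes effect
// for file storage on a replicated stream (assumed: replicas > 1).
func asyncFlushEffective(allowAsyncFlush, fileStorage bool, replicas int) bool {
	return allowAsyncFlush && fileStorage && replicas > 1
}

func main() {
	tr := newWriteTracker()
	for i := uint64(1); i <= 3; i++ {
		tr.ApplyWritePending(i)
	}
	tr.ApplyWritePersisted(1)
	tr.ApplyWritePersisted(3) // completions may arrive out of order
	fmt.Println(tr.SafeCompactIndex(), tr.SnapshotIsAsync())
	tr.ApplyWritePersisted(2)
	fmt.Println(tr.SafeCompactIndex(), tr.SnapshotIsAsync())
	fmt.Println(asyncFlushEffective(true, true, 3), asyncFlushEffective(true, true, 1))
}
```

In this model, compaction is held back by the lowest index that is still pending, even when later entries have already been persisted.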

Relates to #6784

Signed-off-by: Maurice van Veen [email protected]

@MauriceVanVeen MauriceVanVeen requested a review from a team as a code owner June 30, 2025 09:41
@MauriceVanVeen MauriceVanVeen marked this pull request as draft June 30, 2025 10:31
@MauriceVanVeen
Member Author

Going to see whether this can be simplified a bit more; in draft for now.

@alexbozhenko
Contributor

How is this related to the sync_interval setting in nats.conf?

@MauriceVanVeen
Member Author

> How is this related to the sync_interval setting in nats.conf?

sync_interval controls when fsync is called on the file: either after every write (sync_interval: always) or at a specified interval. Currently, writes are always synchronous: the file write is performed right away and is then fsync-ed on the interval, or immediately after the write if sync_interval: always. This PR makes the writes themselves asynchronous when enabled. With sync_interval: always, the writes may still happen asynchronously, but once they are written, fsync is called right after as well.

@MauriceVanVeen MauriceVanVeen marked this pull request as ready for review June 30, 2025 20:17
Signed-off-by: Maurice van Veen <[email protected]>
@MauriceVanVeen MauriceVanVeen force-pushed the maurice/replicated-async-flush branch from 3096a0c to 87bd91e Compare July 3, 2025 10:59