Optimize WebFlux multipart upload performance #35366
base: main
Improve AbstractNestedMatcher by using a thread-local buffer and chunked scanning to reduce allocations and speed up multipart boundary detection.

Closes gh-34651

Signed-off-by: Nabil Fawwaz Elqayyim <[email protected]>
Thanks for raising this @xyraclius, but I'm not sure the approach is valid. For WebFlux applications, there is no assumption about which thread processes a given request: unlike Servlet applications, where a request is processed on a single thread, reactive apps can schedule work on many different threads. Isn't using a …
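A small, self-contained illustration of this concern, assuming the matcher needs state (for example a partially matched boundary) to survive from one chunk to the next. `ThreadLocalHopDemo`, `SCRATCH`, and the two single-thread executors are hypothetical stand-ins for two pipeline stages that happen to run on different threads; they are not Spring or Reactor APIs. The converse problem also exists: an event-loop thread serves many connections, so interleaved requests on that thread would share one thread-local buffer.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalHopDemo {

    // Hypothetical per-thread scratch state, standing in for a thread-local scan buffer.
    private static final ThreadLocal<StringBuilder> SCRATCH =
            ThreadLocal.withInitial(StringBuilder::new);

    /** Writes to the thread-local on one thread, then reads it back on another thread. */
    static String seenByNextStage() {
        ExecutorService stageA = Executors.newSingleThreadExecutor();
        ExecutorService stageB = Executors.newSingleThreadExecutor();
        try {
            return CompletableFuture
                    // First chunk of a request: remember a partial boundary on stageA's thread.
                    .runAsync(() -> SCRATCH.get().append("partial-boundary"), stageA)
                    // Next chunk of the same request: stageB's thread gets its own, empty copy.
                    .thenApplyAsync(ignored -> SCRATCH.get().toString(), stageB)
                    .join();
        } finally {
            stageA.shutdown();
            stageB.shutdown();
        }
    }

    public static void main(String[] args) {
        // Prints: state seen by next stage: ''
        System.out.println("state seen by next stage: '" + seenByNextStage() + "'");
    }
}
```

A thread-local used purely as scratch space inside one synchronous call does not have this problem; the hazard appears only when the buffer's contents must outlive the call.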
Hi @bclozel, I initially used ThreadLocal to reduce per-call allocations and improve CPU/memory usage. That said, I now understand that in WebFlux a single request can run across multiple threads, so using ThreadLocal could be unsafe. The safest approach would be to switch to a per-request local buffer, like:
This avoids any concurrency issues while keeping allocations reasonable. I can implement this change and test the performance to make sure we maintain the improvements.
Yeah, please refine accordingly and we will review it.
- Replace ThreadLocal buffer with a per-instance reusable buffer
- Improve memory locality and reduce ThreadLocal overhead
- Update Javadoc for clarity, performance notes, and subclassing guidance

Closes gh-34651

Signed-off-by: Nabil Fawwaz Elqayyim <[email protected]>
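A minimal sketch of the per-instance buffer idea, under stated assumptions: `java.nio.ByteBuffer` stands in for `DataBuffer`, and `ChunkedByteScanner` and `CHUNK_SIZE` are hypothetical names, not Spring API. It assumes each scanner instance is confined to one stream of buffers and never shared between concurrently processed requests, so the reusable `localBuffer` needs no synchronization and no thread-local lookup.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical scanner: one instance per stream, so its scratch buffer is never shared. */
public class ChunkedByteScanner {

    private static final int CHUNK_SIZE = 64; // assumed size; tune by measurement

    // Allocated once per instance and reused for every buffer this instance scans.
    private final byte[] localBuffer = new byte[CHUNK_SIZE];

    /** Returns the absolute index of the first {@code target} at or after the position, or -1. */
    public int indexOf(ByteBuffer source, byte target) {
        ByteBuffer view = source.duplicate(); // independent position; caller's buffer is untouched
        while (view.hasRemaining()) {
            int start = view.position();
            int length = Math.min(localBuffer.length, view.remaining());
            view.get(localBuffer, 0, length); // one bulk copy instead of one accessor call per byte
            for (int i = 0; i < length; i++) {
                if (localBuffer[i] == target) {
                    return start + i;
                }
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        ChunkedByteScanner scanner = new ChunkedByteScanner();
        ByteBuffer body = ByteBuffer.wrap("name=value\r\n--boundary".getBytes(StandardCharsets.US_ASCII));
        System.out.println(scanner.indexOf(body, (byte) '-')); // prints 12
    }
}
```

Whether the bulk copy pays off depends on how expensive the per-byte accessor of the real buffer type is; for a plain heap array it gains nothing, so it should be judged by benchmarks rather than assumed.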
🚀 Overview
This PR improves performance in WebFlux multipart upload processing by optimizing how `AbstractNestedMatcher` scans `DataBuffer` instances for delimiters.
🔥 Motivation
Multipart uploads in WebFlux are currently slower than in Spring MVC, especially for large files. A significant bottleneck is the delimiter matching logic, which processes buffers one byte at a time and adds unnecessary per-byte overhead.
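To make the delimiter-matching problem concrete, here is a self-contained sketch. It is not the PR's code: plain `byte[]` arrays stand in for consecutive `DataBuffer`s, `StreamingDelimiterMatcher`, `nextCandidate`, and `advance` are hypothetical names, and the Knuth-Morris-Pratt failure table is an assumed matching technique. It shows two points: the count of matched delimiter bytes must be carried across buffers, because a boundary can straddle two chunks; and when no partial match is in progress, the scan can use a tight search for the delimiter's first byte instead of running the full per-byte state machine.

```java
import java.nio.charset.StandardCharsets;

public final class StreamingDelimiterMatcher {

    private final byte[] delimiter;
    private final int[] failure; // KMP failure table
    private int matched;         // delimiter bytes matched so far; survives across chunks

    public StreamingDelimiterMatcher(byte[] delimiter) {
        if (delimiter.length == 0) throw new IllegalArgumentException("empty delimiter");
        this.delimiter = delimiter.clone();
        this.failure = buildFailureTable(this.delimiter);
    }

    /** Returns the index in chunk of the byte completing the delimiter, or -1 (partial match kept). */
    public int match(byte[] chunk, int from) {
        int i = from;
        while (i < chunk.length) {
            if (matched == 0) {
                // Fast path: nothing matched yet, so jump to the next possible delimiter start.
                i = nextCandidate(chunk, i);
                if (i < 0) return -1;
            }
            matched = advance(matched, chunk[i]);
            if (matched == delimiter.length) {
                matched = 0; // reset for the next, non-overlapping delimiter
                return i;
            }
            i++;
        }
        return -1;
    }

    // Index of the next occurrence of the delimiter's first byte at or after 'from', or -1.
    private int nextCandidate(byte[] chunk, int from) {
        byte first = delimiter[0];
        for (int i = from; i < chunk.length; i++) {
            if (chunk[i] == first) return i;
        }
        return -1;
    }

    // KMP transition: fall back through the failure table until 'b' extends a prefix.
    private int advance(int state, byte b) {
        while (state > 0 && delimiter[state] != b) {
            state = failure[state - 1];
        }
        return delimiter[state] == b ? state + 1 : 0;
    }

    private static int[] buildFailureTable(byte[] pattern) {
        int[] table = new int[pattern.length];
        int k = 0;
        for (int i = 1; i < pattern.length; i++) {
            while (k > 0 && pattern[i] != pattern[k]) k = table[k - 1];
            if (pattern[i] == pattern[k]) k++;
            table[i] = k;
        }
        return table;
    }

    public static void main(String[] args) {
        StreamingDelimiterMatcher matcher =
                new StreamingDelimiterMatcher("\r\n--abc".getBytes(StandardCharsets.US_ASCII));
        // The delimiter straddles two chunks: "\r\n-" ends the first, "-abc" starts the second.
        System.out.println(matcher.match("hello\r\n-".getBytes(StandardCharsets.US_ASCII), 0)); // -1
        System.out.println(matcher.match("-abc rest".getBytes(StandardCharsets.US_ASCII), 0));  // 3
    }
}
```

The `matched` field is per-stream state, which is why it belongs to the matcher instance rather than to whichever thread happens to process the next chunk.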
🔧 Changes
- `AbstractNestedMatcher` now uses:
  - A reusable buffer to avoid per-call allocations: originally a thread-local buffer (`LOCAL_BUFFER`), now replaced with an instance-local buffer (`localBuffer`) to simplify buffer management.
  - Helper methods (`processChunk`, `findNextCandidate`, `updateMatchIndex`) to reduce complexity and improve readability.

✅ Benefits
- … `DataBuffer` instances.
- … existing unit tests.
📈 Performance Impact
Upload times measured before this change:
- `MultipartFile` (Spring MVC): ~700 ms
- `FilePart` (WebFlux): ~4.1 s
- `PartEvent` (WebFlux): ~4.2 s

After this change, large multipart uploads in WebFlux no longer suffer from excessive overhead in delimiter scanning.
📊 Benchmark Results
Before Optimization
After Optimization
Example
Related Issue
Closes gh-34651
Checklist