Optimize WebFlux multipart upload performance #35366

Open · wants to merge 4 commits into base: main
Conversation

@xyraclius commented Aug 21, 2025

🚀 Overview

This PR improves performance in WebFlux multipart upload processing by optimizing how AbstractNestedMatcher scans DataBuffer instances for delimiters.


🔥 Motivation

Multipart uploads in WebFlux are currently significantly slower than in Spring MVC, especially for large files. A major bottleneck is the delimiter matching logic, which processes buffers one byte at a time and incurs unnecessary per-byte overhead.


🔧 Changes

  • Reworked AbstractNestedMatcher to use:
    • A reusable scratch buffer to avoid per-call allocations: initially a
      thread-local buffer (LOCAL_BUFFER), later replaced with an
      instance-local buffer (localBuffer) to simplify buffer management.
    • Chunk-based scanning instead of byte-by-byte iteration.
  • Extracted helper methods (processChunk, findNextCandidate,
    updateMatchIndex) to reduce complexity and improve readability.
  • Preserved existing API and behavior for backward compatibility.
  • Added clear Javadoc to explain usage and limitations.
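The chunk-based scanning idea can be sketched as follows. This is a simplified, hypothetical illustration (the class and method names below are made up for this sketch, not the actual AbstractNestedMatcher code): rather than feeding the matcher one byte at a time, the scan skips ahead to the next occurrence of the delimiter's first byte and only then compares the full delimiter against the chunk.

```java
import java.util.Arrays;

// Simplified sketch of chunk-based delimiter matching (hypothetical,
// not the actual Spring AbstractNestedMatcher implementation).
public class ChunkScanSketch {

    // Returns the index just past the delimiter within chunk[0..length),
    // or -1 if the delimiter does not occur.
    static int indexAfterDelimiter(byte[] chunk, int length, byte[] delimiter) {
        byte first = delimiter[0];
        for (int i = 0; i <= length - delimiter.length; i++) {
            // Skip cheaply to the next candidate position instead of
            // running the full state machine on every byte.
            if (chunk[i] != first) {
                continue;
            }
            if (Arrays.equals(chunk, i, i + delimiter.length,
                    delimiter, 0, delimiter.length)) {
                return i + delimiter.length;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] delimiter = "\r\n--boundary".getBytes();
        byte[] data = "part one\r\n--boundarypart two".getBytes();
        // Delimiter starts at index 8 and is 12 bytes long.
        System.out.println(indexAfterDelimiter(data, data.length, delimiter)); // 20
    }
}
```

A real implementation also has to handle a delimiter split across two DataBuffer boundaries, which is where the retained match index in the actual matcher comes in.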

✅ Benefits

  • Reduced allocation overhead during large multipart uploads.
  • Improved throughput when scanning large DataBuffer instances.
  • Better maintainability through clearer structure and documentation.
  • No breaking changes — existing API and behavior remain intact.
  • No new tests required — code is already fully covered (100%) by
    existing unit tests.

📈 Performance Impact

Before this change, uploading a 200MB file took:

  • MultipartFile (Spring MVC): ~700 ms
  • FilePart (WebFlux): ~4.1 s
  • PartEvent (WebFlux): ~4.2 s

After this change, large multipart uploads in WebFlux no longer suffer from excessive overhead in delimiter scanning.

📊 Benchmark Results

Before Optimization

curl -w "Elapsed: %{time_total}s\n" \
  -F file="@200mb.pdf" \
  http://localhost:8080/file-part \
  -o /dev/null -s
# Elapsed: 4.177509s

curl -w "Elapsed: %{time_total}s\n" \
  -F file="@200mb.pdf" \
  http://localhost:8080/part-event \
  -o /dev/null -s
# Elapsed: 4.224732s

After Optimization

curl -w "Elapsed: %{time_total}s\n" \
  -F file="@200mb.pdf" \
  http://localhost:8080/file-part \
  -o /dev/null -s
# Elapsed: 0.437359s

curl -w "Elapsed: %{time_total}s\n" \
  -F file="@200mb.pdf" \
  http://localhost:8080/part-event \
  -o /dev/null -s
# Elapsed: 0.317705s


# Test: 19.53GB file
curl -w "\nElapsed: %{time_total}s\n" \
  -F file="@game.rar" \
  http://localhost:8080/file-part \
  -o /dev/null -s
# Elapsed: 27.586772s

curl -w "Elapsed: %{time_total}s\n" \
     -F file="@game.rar" \
     http://localhost:8080/part-event \
     -o /dev/null -s
# Elapsed: 18.328275s


# Test: multiple files
curl -w "Elapsed: %{time_total}s\n" \
  -F file="@200mb.pdf" \
  -F file="@game.rar" \
  http://localhost:8080/file-part \
  -o /dev/null -s
# Elapsed: 23.678042s

curl -w "Elapsed: %{time_total}s\n" \
     -F file="@200mb.pdf" \
     -F file="@game.rar" \
     http://localhost:8080/part-event \
     -o /dev/null -s
# Elapsed: 17.001363s

Example

@PostMapping(path = "/file-part", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
public Mono<Void> uploadFilePart(@RequestPart("file") FilePart file) {
    log.info("Received file part request {}", file.filename());
    return Mono.empty();
}

@PostMapping(path = "/part-event", consumes = MediaType.MULTIPART_FORM_DATA_VALUE)
public Mono<Void> uploadPartEvent(@RequestBody Flux<PartEvent> allPartsEvents) {
    return allPartsEvents.windowUntil(PartEvent::isLast)
        .concatMap(p -> p.switchOnFirst((signal, partEvents) -> {
            if (signal.hasValue()) {
                PartEvent event = signal.get();
                if (event instanceof FormPartEvent formEvent) {
                    return Mono.error(new RuntimeException("Unexpected event: " + event));
                } else if (event instanceof FilePartEvent fileEvent) {
                    log.info("Received part event request {}", fileEvent.filename());
                    return partEvents
                            .ofType(FilePartEvent.class)
                            .doOnNext(file -> DataBufferUtils.release(file.content()))
                            .then(Mono.empty());
                } else {
                    return Mono.error(new RuntimeException("Unexpected event: " + event));
                }
            } else {
                return partEvents;
            }
        })).then();
}

Related Issue

Closes gh-34651


Checklist

Improve AbstractNestedMatcher by using a thread-local buffer and chunked
scanning to reduce allocations and speed up multipart boundary detection.

Closes gh-34651

Signed-off-by: Nabil Fawwaz Elqayyim <[email protected]>
@spring-projects-issues added the status: waiting-for-triage label on Aug 21, 2025
@bclozel (Member) commented Aug 21, 2025

Thanks for raising this @xyraclius, but I'm not sure the approach is valid.

For WebFlux applications, there is no assumption about the processing of a single request. Unlike Servlet applications where the processing of a request is happening on a single thread, reactive apps can schedule work on many different threads. Isn't using a ThreadLocal likely to create concurrency issues if many multipart requests happen in parallel?
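One facet of this concern can be shown with a minimal sketch, using plain java.util.concurrent rather than Reactor (the class and method names are illustrative): a buffer stored in a ThreadLocal on one thread is simply not the same buffer when the same logical request resumes on another thread.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Minimal sketch of why ThreadLocal state is risky when one logical
// request hops between threads, as reactive requests can.
public class ThreadLocalHazard {

    static final ThreadLocal<byte[]> LOCAL_BUFFER =
            ThreadLocal.withInitial(() -> new byte[8]);

    // Returns true only if a pool thread sees the caller's buffer.
    static boolean sameBufferAcrossThreads() throws Exception {
        byte[] onCallerThread = LOCAL_BUFFER.get();
        onCallerThread[0] = 42; // state written while "processing" on thread A

        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // The continuation of the same logical request runs on thread B
            // and gets a different, freshly initialized buffer.
            byte[] onPoolThread = pool.submit(LOCAL_BUFFER::get).get();
            return onCallerThread == onPoolThread;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sameBufferAcrossThreads()); // prints false
    }
}
```

The converse is the other half of the hazard: two unrelated requests scheduled onto the same event-loop thread would share one ThreadLocal buffer mid-scan.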

@bclozel added the status: waiting-for-feedback label on Aug 21, 2025
@xyraclius (Author)
Hi @bclozel,

I initially used ThreadLocal to reduce per-call allocations and improve CPU/memory usage. That said, I now understand that in WebFlux a single request can run across multiple threads, so using ThreadLocal could be unsafe.

The safest approach would be to switch to a per-request local buffer, like:

final byte[] chunk = new byte[8 * 1024];

This avoids any concurrency issues while keeping allocations reasonable. I can implement this change and test the performance to make sure we maintain the improvements.
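The per-instance approach might look roughly like this hypothetical sketch (DelimiterMatcher and countDelimiters are illustrative names, not the actual Spring code). Assuming, as the discussion suggests, one matcher instance per request, an instance field buffer is reused across scans without ever being shared between concurrent requests.

```java
// Hypothetical sketch of the per-instance buffer approach.
public class PerInstanceBufferSketch {

    // Illustrative matcher: one instance per request, so the reusable
    // buffer never escapes to another concurrent request.
    static class DelimiterMatcher {
        private final byte[] chunk = new byte[8 * 1024];
        private final byte[] delimiter;

        DelimiterMatcher(byte[] delimiter) {
            this.delimiter = delimiter;
        }

        // Copies a slice of the incoming buffer into the reusable chunk
        // and counts delimiter occurrences (a stand-in for real matching).
        int countDelimiters(byte[] incoming) {
            int hits = 0;
            int copied = Math.min(incoming.length, chunk.length);
            System.arraycopy(incoming, 0, chunk, 0, copied);
            for (int i = 0; i <= copied - delimiter.length; i++) {
                boolean match = true;
                for (int j = 0; j < delimiter.length; j++) {
                    if (chunk[i + j] != delimiter[j]) {
                        match = false;
                        break;
                    }
                }
                if (match) {
                    hits++;
                }
            }
            return hits;
        }
    }

    public static void main(String[] args) {
        DelimiterMatcher matcher = new DelimiterMatcher("--b".getBytes());
        System.out.println(matcher.countDelimiters("a--bc--bd".getBytes())); // 2
    }
}
```

Compared with allocating a fresh array per call, this keeps allocation constant per request at the cost of one array per matcher instance.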

@spring-projects-issues added the status: feedback-provided label and removed the status: waiting-for-feedback label on Aug 21, 2025
@bclozel added the status: waiting-for-feedback label and removed the status: feedback-provided label on Aug 21, 2025
@sdeleuze (Contributor)

Yeah please refine accordingly and we will review it.

@xyraclius (Author)

Hi @bclozel @sdeleuze,
Refined the change to use a per-instance buffer instead of ThreadLocal, as suggested.
✅ Verified correctness and perf still look good.

Please let me know if further adjustments are needed. Thanks for the guidance! 🙏

@spring-projects-issues added the status: feedback-provided label and removed the status: waiting-for-feedback label on Aug 22, 2025
- Replace ThreadLocal buffer with a per-instance reusable buffer
- Improves memory locality and reduces ThreadLocal overhead
- Update Javadoc for clarity, performance notes, and subclassing guidance

Closes gh-34651

Signed-off-by: Nabil Fawwaz Elqayyim <[email protected]>