
Conversation

@DariaMityagina (Contributor) commented Aug 19, 2025

Details:

  • The task is to explore the possibility of moving the reshaping process to the plugin. This would allow the compiler to receive either a network with batch size 1 if reshaping is successful, or a network with a non-1 batch size if reshaping fails. This approach aims to simplify the batch handling process and reduce dependencies on the compiler.

The concept behind this PR's approach

  1. Verify if the model is compatible with batching on the plugin side.
  2. Attempt to reshape the model using set_batch(1).
  3. If this is successful, we continue with the PLUGIN batch. The reshaped model is sent to the compiler.
  4. The dynamic batch determination function has been updated to rely on the actual tensor shape instead of shapeFromIRModel.

Basically, we want to consider any N=1 network as a potentially dynamically batched network.
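In pseudocode, the idea looks roughly like this (a sketch, not the actual diff; the helper name try_plugin_batching is illustrative, while ov::get_batch/ov::set_batch are the same calls used in the snippets below):

#include <openvino/openvino.hpp>
#include <memory>

// Sketch only: attempt plugin-side de-batching; fall back to compiler-side batching on failure.
bool try_plugin_batching(std::shared_ptr<ov::Model>& model) {
    try {
        if (ov::get_batch(model) != 1) {  // 1 corresponds to intel_npu::utils::DEFAULT_BATCH_SIZE
            ov::set_batch(model, 1);      // naive de-batching; a dedicated debatcher may be needed later
        }
        return true;   // PLUGIN batch mode: the (now N=1) model is sent to the compiler
    } catch (const std::exception&) {
        return false;  // COMPILER batch mode: the original model is sent to the compiler
    }
}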

Flow

  1. Plugin receives a batched model (whether its batch is static or dynamic)
  2. Plugin tries to reshape it to batch=1
  3. If success, plugin invokes the compiler to compile the model with batch=1
  4. Plugin dumps the blob in the FS
  5. A user calls benchmark_app -m model.blob -data_shape [4, 3, 224, 224]
  6. Plugin loads the model onto the device
  7. Plugin sees the batch -> it creates N (here, 4) infer requests (see the sketch below)
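Conceptually, steps 5-7 correspond to the following application-level code. This is an illustration only (it assumes the batch sits on dimension 0 and the blob was compiled for N=1 inputs of shape [1, 3, 224, 224]); the actual splitting happens inside the plugin:

#include <openvino/openvino.hpp>
#include <vector>

// Illustrative only: serve a batch-N input with N single-batch infer requests.
void infer_batched(ov::CompiledModel& compiled, const ov::Tensor& batched) {
    const size_t N = batched.get_shape()[0];  // e.g. 4 for -data_shape [4, 3, 224, 224]
    std::vector<ov::InferRequest> requests;
    for (size_t i = 0; i < N; ++i) {
        ov::InferRequest req = compiled.create_infer_request();
        // Zero-copy view over the i-th sample of the user's batched tensor.
        ov::Tensor sample(batched, {i, 0, 0, 0}, {i + 1, 3, 224, 224});
        req.set_input_tensor(sample);
        req.start_async();
        requests.push_back(std::move(req));
    }
    for (auto& req : requests) {
        req.wait();  // gather the N results
    }
}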

Tickets:

  • E-176749

@DariaMityagina self-assigned this Aug 19, 2025
@DariaMityagina added the category: NPU OpenVINO NPU plugin label Aug 19, 2025
@DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from 30529c1 to 29bbcdc on August 19, 2025 07:01
@DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch 3 times, most recently from 6f2a537 to b9cbb0f on September 9, 2025 23:00
…and Compiler - no metadata changes - fix static tests
…and Compiler - fix BA issues - treat every model with batch 1 as a potentially dynamically batched one
@DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from 105c432 to 3a4baa2 on September 10, 2025 11:51
if (autoOrPluginBatch && pluginBatchingIsSupported && batchedModel) {
_logger.info("Attempting to handle batching on the plugin side.");
ov::set_batch(modelForCompilation, 1);
// TODO: add debatcher for more complicated cases as set_batch is pretty naive.
@DariaMityagina (Author) Sep 17, 2025

Maybe it isn't needed. To be double-checked.

@DariaMityagina added this to the 2025.4 milestone Sep 22, 2025
@sivanov-work (Contributor) left a comment

IMHO, the batch/dynamic-batch handling logic here is too widely spread and unclear, which was actually imposed by the legacy design.

The things I'd be very eager to discuss are:

  1. I think we should remove any open presumption about the location of the N dimension (even more so, getting the batch from hardcoded BATCH_AXIS positions), because it will not work in 100% of cases and we will need to return to this problem again and again, doing more and more refactoring.
    Instead, I suggest we use layout information (or ask the user to supply it if it doesn't exist) to get the batch dimension, as this plugin.cpp part was designed to be generic rather than PLUGIN-batch-centric.

  2. Even if we get off BATCH_AXIS and retain these various "support-batch-case" checks: I see that these limitations are spread across TWO places: one part on the "plugin.cpp" side where we check PLUGIN mode, and the actual limitations contained in ZeroInferRequest where we check the data structures once again.
    IMHO, plugin.cpp should be clean of any assumptions about what the PLUGIN implementation can or cannot support and should not introduce such fine-grained condition testing. Otherwise we would be destined to always fix/add/invent symmetrical changes in both ZeroInferRequest and plugin.cpp.

  3. If we agree to consider each N=1 infer request as a possible batched/dynamic-batched infer request, then it's also possible to release the burden from this plugin.cpp part and always use set_batch(1) for any scenario where PLUGIN mode is involved, so that later we could reallocate inner data when set_tensor() comes with a new N.
    That would allow us to implement N=1 in ZeroInferRequest as the generic case and only specialize the set_tensor()/infer() implementation for the event when the user sets new tensors with a different batch value.

As I can see:

  1. We are forced to set_batch(1) when dealing with a batched network, since ZeroInferRequests in PLUGIN mode rely on that. It seems to me that we MUST set_batch(1) every time we deal with a batched network. We cannot invoke set_batch(1) for every network, as it is not reliable and may fail for a non-batched network; therefore we must determine that the network is a batched one. Even though this entire legacy logic is, IMHO, redundant and weak, I think we should stop here and not proceed further with separating the static-batch/dynamic-batch cases or bringing in more artificial limitations than we already have. So I hope that simply using set_batch(1) is enough here.

  2. ZeroInferRequest deals with the N=1 situation (or doesn't even care what it has got) and only adds a check in set_tensor() that the new tensor is N times larger or smaller than the existing one. If ALL tensors got changed in the same proportion N, then we are dealing with a batched case, whether static or dynamic.

  3. IMHO the ideal situation would be to remove all heuristics/limitations from plugin.cpp and encapsulate them all in ZeroInferRequest.

const bool batchedModel = ov::get_batch(modelForCompilation) != intel_npu::utils::DEFAULT_BATCH_SIZE;

if (autoOrPluginBatch && pluginBatchingIsSupported && batchedModel) {
_logger.info("Attempting to handle batching on the plugin side.");

Tip: I recommend extending the logger info by adding the current value of the batch, e.g. "... from {ov::get_batch(modelForCompilation)} to 1".
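For example, something like this (a sketch assuming the printf-style Logger::info already used elsewhere in this file):

#include <sstream>

std::stringstream batchStr;
batchStr << ov::get_batch(modelForCompilation);  // works for both static and dynamic batch dimensions
_logger.info("Attempting to handle batching on the plugin side: reshaping the batch from %s to 1.",
             batchStr.str().c_str());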

ov::set_batch(modelForCompilation, 1);
// TODO: add debatcher for more complicated cases as set_batch is pretty naive.
} else {
_logger.info("Unable to manage batching on the plugin side, so the compiler will take care of it.");

Tip: printing the value of the batch in this info message will also help with troubleshooting.

updateBatchMode(ov::intel_npu::BatchMode::COMPILER);
}

if (localConfig.isAvailable(ov::intel_npu::batch_mode.name())) {

Tip: I checked, and it seems that all of this logic requires the localConfig.isAvailable() condition to be met, so we could wrap all these lines under a single if:

if (localConfig.isAvailable(ov::intel_npu::batch_mode.name())) {
    // ... all new code
}

Perhaps that would improve readability.


updateBatchMode(ov::intel_npu::BatchMode::COMPILER);
} catch (const std::exception& ex) {
_logger.info("Couldn't validate and reshape the model. Batching will be handed by compiler.", ex.what());

AUTO & PLUGIN makes no difference then, as far I remember from the compiler code we roll back to the COMPILER mode only if we have got AUTO enabled so that we do not overwrite explicitly requested PLUGIN mode

ov::Layout layout = ov::layout::get_layout(input);

// Batching on plugin is working only when batching is found on 0th dimension
if ((shape.size() && shape[intel_npu::utils::BATCH_AXIS].get_max_length() != intel_npu::utils::DEFAULT_BATCH_SIZE) ||

I don't think it's a good idea to stick to the static BATCH_AXIS when determining the batch dimension; there are many cases where the 0th dimension is not a batch at all, as well as cases where the batch is not at the 0th position.

Since we have "layout" attribute affiliated to a model - we could adhere to it.
On the vpux-compiler side we have this logic in determining batches for the debatch method:

  1. check whether or not user specified "layouts" explicitly
  2. check ov::FindBatch outcome
  3. trying to set_batch(1) ...

What do you think about reusing similar logic? Or about not introducing any logic at the plugin.cpp level at all?
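A rough sketch of the layout-based step of that chain (the helper name and the -1 sentinel are illustrative only; ov::layout::get_layout, has_batch and batch_idx are the standard OpenVINO helpers):

#include <openvino/openvino.hpp>
#include <cstdint>

// Returns the batch axis taken from the user-supplied layout, or -1 if no layout is set,
// in which case the later steps (ov::pass::FindBatch, then a plain set_batch(1) attempt) apply.
int64_t batch_axis_from_layout(const ov::Output<const ov::Node>& input) {
    const ov::Layout layout = ov::layout::get_layout(input);
    if (!layout.empty() && ov::layout::has_batch(layout)) {
        return ov::layout::batch_idx(layout);  // position of 'N' in the layout
    }
    return -1;  // no explicit layout information available
}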


// Plugin batching is applied if:
// 1. Both inputs and outputs have batched dimensions
// 2. All batch sizes are consistent (should be only DEFAULT_BATCH_SIZE)

I doubt that in the dynamic-batch case we can perform this check here; we could only check that the Ns are dynamic.
What do you think about abolishing this check altogether (for the static case as well) and just calling set_batch(1) on the model unconditionally, provided that the mode is PLUGIN? All substantial checks on batch size would then be made at runtime, in set_tensor in ZeroInferRequest.

In other words, if we were going to fail (due to different N sizes), we will definitely fail later at the "runtime/set_tensor" phase. On the other hand, this disrupts the "fail fast" principle, since such errors could be detected at the compile phase; but we can always just execute set_tensor with the default/model-expected shapes right after model compilation.

Possible implementation:
We remove all limitation checks from plugin.cpp.
We remove ALL tensor and pipeline allocations from the ZeroInferRequest-related ctors & initialization.
We move all these allocations into a reallocate_pipeline method. We call reallocate_pipeline once all set_tensor calls have been made, i.e. right before infer().
So once a model has been compiled, we will call set_tensor for each input, which will trigger pipeline creation and device memory allocation.
Only after that does compile_model() appear to be done.
Later, in the inference phase, we reuse the reallocate_pipeline logic when new tensors with a new N are requested.

Pros:
we keep plugin.cpp generic;
we keep all limitation logic in ZeroInferRequest;
we keep ZeroInferRequest simple.

Cons:
we get the error caused by that limitation later, once ZeroInferRequest is created (?), and tests on compilation of batched models will perhaps require refactoring.
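A compact sketch of the proposal (class and member names are hypothetical, not existing ZeroInferRequest code; reallocate_pipeline is the method suggested above):

#include <openvino/openvino.hpp>
#include <map>
#include <string>

// Pseudocode sketch of deferred allocation: accept any batch N in set_tensor and
// postpone device memory / pipeline allocation until right before infer().
class DeferredBatchInferRequestSketch {
public:
    void set_tensor(const std::string& port_name, const ov::Tensor& tensor) {
        // Remember the tensor and mark the pipeline stale; no checks besides batch proportions.
        _tensors[port_name] = tensor;
        _pipeline_dirty = true;
    }

    void infer() {
        if (_pipeline_dirty) {
            reallocate_pipeline();  // all allocation logic lives here instead of the ctor
            _pipeline_dirty = false;
        }
        // ... run the N (possibly 1) underlying executions ...
    }

private:
    void reallocate_pipeline() {
        // Inspect _tensors, derive the common batch N, allocate per-batch pipelines.
    }

    std::map<std::string, ov::Tensor> _tensors;
    bool _pipeline_dirty = true;
};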

@DariaMityagina (Author):

Removed these changes and checks from dummy_model. Not applicable anymore. Thanks!


// Limitation: Plugin batching is not supported when there are dynamic
// dimensions other than the batch dimension.
if (checkModelDynamicDims(model)) {

As we discussed offline, the problem with the simultaneous existence of batch and dynamic dimensions is that we would have a "gap" in the plain tensor data after the end of one batch portion and before the start of the next.

IMHO that makes it harder to find where the next N-line for the next batch portion is located, and we cannot simply locate batch i in the whole tensor by computing i * get_size() / N.
Also, there is an open issue to overcome such a limitation.

What if we do not introduce that limitation at all? It manifests a lot of brittleness in the code and introduces very specific checks like hasOtherDynamicDims.
It's a rare situation to have N != 1 and a dynamic shape together; it only adds complexity for getting the next batch line in a result tensor, and it will be covered soon by a separate issue.

What do you think?

@DariaMityagina (Author):

> What if we do not introduce that limitation at all? It manifests a lot of brittleness in the code and introduces very specific checks like hasOtherDynamicDims.
> It's a rare situation to have N != 1 and a dynamic shape together; it only adds complexity for getting the next batch line in a result tensor, and it will be covered soon by a separate issue.

Sure, we can do it separately in E#179728. Thanks!

@DariaMityagina force-pushed the icv/dm/plugin_batch-ver-2 branch from e932c8d to ef91465 on September 22, 2025 22:04
@pereanub (Contributor) left a comment

I still don't think that considering all the models that have batch size set to 1 as dynamic models is the correct approach here.


OPENVINO_ASSERT(is_dynamic || port.get_shape() == tensor->get_shape(),
OPENVINO_ASSERT(is_dynamic || port.get_shape() == tensor->get_shape() ||
tensor->get_shape()[utils::BATCH_AXIS] % port.get_shape()[utils::BATCH_AXIS] == 0,

I'm trying to understand this extra check here, but it doesn't make sense to me. Could you please explain?

@DariaMityagina (Author):

Sure, let me provide additional context.

The check_tensor function is invoked within set_tensor, particularly when assigning a tensor to a designated port.
With this implementation, the port's shape is adjusted to have a modified batch size (set to 1), while the tensor retains its original requested dimensions.

So the condition is_dynamic || port.get_shape() == tensor->get_shape() would not pass on its own.

For instance, if the tensor has a batch size of 2:
2 % 1 = 0 -> de-batched case
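As an illustration of the relaxed condition (assuming the batch sits on dimension 0, as utils::BATCH_AXIS implies; this helper exists only for illustration and is not part of the PR):

#include <openvino/core/shape.hpp>

// Mirrors the relaxed OPENVINO_ASSERT condition quoted above for static shapes.
bool shape_accepted(const ov::Shape& port_shape, const ov::Shape& tensor_shape) {
    constexpr size_t BATCH_AXIS = 0;
    return port_shape == tensor_shape ||
           tensor_shape[BATCH_AXIS] % port_shape[BATCH_AXIS] == 0;
}
// shape_accepted({1, 3, 224, 224}, {2, 3, 224, 224}) -> true: 2 % 1 == 0, the de-batched case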

"doesn't match with total blobs count: ",
tensors_size);
OPENVINO_ASSERT(
batch.is_dynamic() || batch.get_length() == tensors_size || tensors_size % batch.get_length() == 0,

same comment here

@DariaMityagina (Author) commented Sep 23, 2025

> I still don't think that considering all the models that have batch size set to 1 as dynamic models is the correct approach here.

This approach was inspired by the requirements for this task:

1. Plugin detection: The plugin identifies a network with a dynamic batch size.
2. Batch conversion: It transforms the network into one with a batch size of 1, if possible.
3. Driver compilation: Only after this transformation does the network get passed to the driver for compilation.
4. Isolation from Compiler and Driver: Crucially, neither the compiler nor the driver is aware that the network originally had a dynamic/static batch size.

- Make dynamic batch compatible with older drivers
- Keep compatibility with old NPU plugins
- [Stretch goal] No need for metadata in compiled blobs; the NPU plugin handles batches != 1 automatically using concurrently created infer requests

To minimize the risks, I will try to limit the application of the current changes to the dynamic-batch case for now, after which we can work on a common solution.
