[Dynamic batch] Investigate refactoring opportunities for batch management in Plugin and Compiler - ver 2 #31784
base: master
Conversation
Force-pushed from 30529c1 to 29bbcdc
Force-pushed from 6f2a537 to b9cbb0f
…and Compiler - no metadata changes
…and Compiler - no metadata changes - fix static tests
…and Compiler - fix BA issues - treat every model with batch 1 as a potentially dynamically batched one
Force-pushed from 105c432 to 3a4baa2
…and Compiler - validateModelBatch conditions
…and Compiler - dynamic dims limitation
…and Compiler - additional checks
if (autoOrPluginBatch && pluginBatchingIsSupported && batchedModel) {
    _logger.info("Attempting to handle batching on the plugin side.");
    ov::set_batch(modelForCompilation, 1);
    // TODO: add debatcher for more complicated cases as set_batch is pretty naive.
Maybe it isn't needed. To be double-checked.
IMHO, the batch/dynamic-batch handling logic here is spread too widely and is unclear, which was really imposed by the legacy design.
The things I'd be very eager to discuss are:
- I think we should remove any open presumption about the N dimension location (even more so getting the batch from the hardcoded BATCH_AXIS position), because it will not work in 100% of cases and we will have to return to this problem again and again, doing more and more refactoring. Instead I suggest we use layout information (or ask the user to supply it if it doesn't exist) to get the batch dimension, as this plugin.cpp part was designed to be generic rather than PLUGIN-batch-specific.
- Even if we get off BATCH_AXIS and retain these various "support-batch-case" checks: I see that these limitations are spread across TWO places: one part on the plugin.cpp side where we check PLUGIN mode, and the actual limitations contained in ZeroInferRequest where we check the data structures once again. IMHO, plugin.cpp should be free of any assumptions about what the PLUGIN implementation can or cannot support and should not introduce such fine-grained condition testing. Otherwise we are destined to always fix/add/invent symmetrical changes in both ZeroInferRequest and plugin.cpp.
- If we agree to consider each N=1 infer request as a possible batched/dynamic-batched infer request, then it is also possible to release the burden from this plugin.cpp part and always use set_batch(1) for any scenario where PLUGIN mode is involved, so that later we can reallocate inner data when set_tensor() comes with a new N. That allows us to implement N=1 in ZeroInferRequest as the generic case and only special-case the set_tensor()/infer() implementation for the event when the user sets new tensors with a different batch value.
As I can see:
- We are forced to set_batch(1) when dealing with a batched network, since ZeroInferRequests in PLUGIN mode rely on that. It seems to me that we MUST set_batch(1) every time we deal with a batched network. We cannot invoke set_batch(1) for every network, as it is not reliable and may fail for a non-batched network, therefore we must determine that the network is a batched one. Even though this entire legacy logic is, IMHO, redundant and weak, I think we should stop here and not go further with separating the static-batch/dynamic-batch cases and bringing in more artificial limitations than we already have. So I hope that simply using set_batch(1) is enough here.
- ZeroInferRequest deals with the N=1 situation (or doesn't even care what it has got) and only adds a check in set_tensor() that the new tensor is N times larger or smaller than the existing one. If ALL tensors changed in the same proportion N, then we are dealing with a batched case, whatever static/dynamic it is (a rough sketch of such a check follows below).
- IMHO the ideal situation would be to get all heuristics/limitations out of plugin.cpp and encapsulate them all in ZeroInferRequest.
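To make the set_tensor() check from the second bullet concrete, here is a minimal sketch; the helper name and the batch-on-axis-0 assumption are illustrative, not the actual ZeroInferRequest code.

```cpp
#include <openvino/core/shape.hpp>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: returns the batch multiplier N if `tensorShape` is a clean
// N-times-batched version of `portShape` (batch assumed on axis 0), throws otherwise.
static size_t batch_multiplier(const ov::Shape& portShape, const ov::Shape& tensorShape) {
    if (portShape.empty() || portShape.size() != tensorShape.size()) {
        throw std::runtime_error("Rank mismatch between port and tensor shapes");
    }
    for (size_t i = 1; i < portShape.size(); ++i) {
        if (portShape[i] != tensorShape[i]) {
            throw std::runtime_error("Non-batch dimension mismatch");
        }
    }
    if (portShape[0] == 0 || tensorShape[0] % portShape[0] != 0) {
        throw std::runtime_error("Batch dimension is not an integer multiple of the port batch");
    }
    return tensorShape[0] / portShape[0];
}
```

set_tensor() could record this multiplier per input and treat the request as (dynamically) batched only if every input and output reports the same N.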
const bool batchedModel = ov::get_batch(modelForCompilation) != intel_npu::utils::DEFAULT_BATCH_SIZE;

if (autoOrPluginBatch && pluginBatchingIsSupported && batchedModel) {
    _logger.info("Attempting to handle batching on the plugin side.");
Tip:
I recommend extending the logger info with the existing/current value of the batch, e.g. "... from {ov::get_batch(modelForCompilation)} to 1".
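For illustration only, the extended message could look roughly like this; the printf-style Logger formatting and the is_static() guard are assumptions:

```cpp
// Sketch: log the original batch value before reshaping it to 1.
// Assumes the batch dimension is static at this point (get_length() would throw otherwise).
const ov::Dimension originalBatch = ov::get_batch(modelForCompilation);
if (originalBatch.is_static()) {
    _logger.info("Attempting to handle batching on the plugin side: reshaping batch from %ld to 1",
                 originalBatch.get_length());
}
ov::set_batch(modelForCompilation, 1);
```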
    ov::set_batch(modelForCompilation, 1);
    // TODO: add debatcher for more complicated cases as set_batch is pretty naive.
} else {
    _logger.info("Unable to manage batching on the plugin side, so the compiler will take care of it.");
Tip:
Printing the batch value in this info message will also help with troubleshooting.
    updateBatchMode(ov::intel_npu::BatchMode::COMPILER);
}

if (localConfig.isAvailable(ov::intel_npu::batch_mode.name())) {
Tip:
I checked, and it seems that all of this logic requires the condition localConfig.isAvailable to be met, so it seems we can wrap all these lines under a single if:
if (localConfig.isAvailable(ov::intel_npu::batch_mode.name())) {
    // ... all new code
}
Perhaps that improves readability.
    updateBatchMode(ov::intel_npu::BatchMode::COMPILER);
} catch (const std::exception& ex) {
    _logger.info("Couldn't validate and reshape the model. Batching will be handled by the compiler.", ex.what());
AUTO & PLUGIN make no difference then; as far as I remember from the compiler code, we roll back to COMPILER mode only if AUTO was enabled, so that we do not overwrite an explicitly requested PLUGIN mode.
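A tiny self-contained sketch of that behaviour; the enum and helper are hypothetical stand-ins rather than the plugin's actual types:

```cpp
#include <stdexcept>

enum class BatchMode { AUTO, COMPILER, PLUGIN };  // stand-in for ov::intel_npu::BatchMode

// Hypothetical helper: when validating/reshaping fails, only an AUTO request is
// silently downgraded to COMPILER batching; an explicit PLUGIN request is not overridden.
inline BatchMode resolveBatchModeOnFailure(BatchMode requested) {
    if (requested == BatchMode::AUTO) {
        return BatchMode::COMPILER;
    }
    throw std::runtime_error("Plugin batching was explicitly requested but cannot be applied");
}
```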
ov::Layout layout = ov::layout::get_layout(input);

// Batching on plugin is working only when batching is found on 0th dimension
if ((shape.size() && shape[intel_npu::utils::BATCH_AXIS].get_max_length() != intel_npu::utils::DEFAULT_BATCH_SIZE) ||
I don't think it's a good idea to stick to the static BATCH_AXIS when determining the batch dimension; there are many cases where the 0th dimension is not a batch at all, as well as cases where the batch is located at a position other than the 0th.
Since we have a "layout" attribute affiliated with the model, we could adhere to it.
On the vpux-compiler side we have this logic for determining batches for the debatch method:
- check whether or not the user specified "layouts" explicitly
- check the ov::FindBatch outcome
- try set_batch(1) ...
What do you think about reusing similar logic (a rough sketch follows below)? Or should we not introduce any logic at the plugin.cpp level at all?
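A rough sketch of the layout-first lookup, based on the public OpenVINO layout helpers; the helper name and the fallback strategy are assumptions:

```cpp
#include <openvino/core/layout.hpp>
#include <openvino/core/node_output.hpp>
#include <cstdint>
#include <optional>

// Hypothetical helper: prefer layout information over a hardcoded BATCH_AXIS when
// looking for the batch dimension of a model input.
inline std::optional<std::int64_t> find_batch_axis(const ov::Output<const ov::Node>& input) {
    const ov::Layout layout = ov::layout::get_layout(input);
    if (!layout.empty() && ov::layout::has_batch(layout)) {
        // Note: batch_idx() may be negative, meaning the index is counted from the end.
        return ov::layout::batch_idx(layout);
    }
    // No layout info: the caller could fall back to ov::get_batch()/ov::set_batch(1),
    // mirroring the vpux-compiler order mentioned above.
    return std::nullopt;
}
```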
// Plugin batching is applied if:
// 1. Both inputs and outputs have batched dimensions
// 2. All batch sizes are consistent (should be only DEFAULT_BATCH_SIZE)
I doubt that in the dynamic-batch case we can perform this check here; we could only check that the Ns are dynamic.
What do you think about abolishing this check altogether (for the static case as well) and just calling set_batch(1) on the model unconditionally, provided that the mode is PLUGIN? All substantial checks on batch size would then be made at runtime, in set_tensor in ZeroInferRequest.
In other words, if we are meant to fail (due to different N sizes), then we will definitely fail at the "runtime/set_tensor" phase later. On the other hand, this disrupts the "fail fast" principle, since such errors could be detected at the compile phase; but we can always just execute "set_tensor" with the default/model-expected shapes right after model compilation.
Possible implementation:
- We remove all limitation checks from plugin.cpp.
- We remove ALL tensor and pipeline allocation from the ZeroInferRequest-related ctors & initialization.
- We move all these allocations into a reallocate_pipeline method. We call reallocate_pipeline once all set_tensor calls have been made, i.e. right before infer().
So once a model has been compiled, we will call set_tensor for each input, which will trigger pipeline creation and device memory allocation. Only after that does compile_model() appear to be done.
Later, at the inference phase, we reuse the reallocate_pipeline logic when new tensors with a new N are requested (a rough sketch follows after the pros/cons below).
Pros:
- we keep plugin.cpp generic
- we keep all limitation logic in ZeroInferRequest
- we keep ZeroInferRequest simple
Cons:
- We get the error caused by that limitation later, once ZeroInferRequest is created (?), and tests for compilation of batched models will perhaps require refactoring.
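A very rough sketch of the deferred-allocation idea; the class, its members and reallocate_pipeline are hypothetical illustrations of the proposal, not the real ZeroInferRequest:

```cpp
#include <openvino/runtime/tensor.hpp>
#include <map>
#include <string>

// Hypothetical sketch: no device buffers are created in the constructor; they are
// (re)allocated lazily right before inference, once all user tensors are known.
class DeferredBatchInferRequest {
public:
    void set_tensor(const std::string& inputName, const ov::Tensor& tensor) {
        auto it = _tensors.find(inputName);
        if (it == _tensors.end() || it->second.get_shape() != tensor.get_shape()) {
            _pipelineDirty = true;  // shape (e.g. batch N) changed, pipeline must be rebuilt
        }
        _tensors[inputName] = tensor;
    }

    void infer() {
        if (_pipelineDirty) {
            reallocate_pipeline();
            _pipelineDirty = false;
        }
        // ... submit the job to the device ...
    }

private:
    void reallocate_pipeline() {
        // Allocate or resize device memory and command lists based on the shapes
        // currently held in _tensors.
    }

    std::map<std::string, ov::Tensor> _tensors;
    bool _pipelineDirty = true;  // nothing is allocated until the first infer()
};
```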
Removed these changes and checks from dummy_model. Not applicable anymore. Thanks!
// Limitation: Plugin batching is not supported when there are dynamic
// dimensions other than the batch dimension.
if (checkModelDynamicDims(model)) {
As we discussed offline, the problem with the simultaneous existence of batch and dynamic dimensions is that we would have a "gap" in the plain tensor data after the end position of the first batch and before the start position of the next batch.
IMHO that makes it harder to find where the next N-line for the next batch portion is located, and we cannot simply find the next batch i in the entire tensor by computing i * get_size() / N.
Also, there is an open issue to overcome such a limitation.
What if we do not introduce that limitation at all, since it manifests a lot of brittleness in the code and introduces very specific checks like hasOtherDynamicDims? It is a rare situation to have N != 1 and a dynamic shape together; it only adds complexity in locating the next batch line in the result tensor, and it will soon be covered by an additional issue.
What do you think?
> What if we do not introduce that limitation at all, since it manifests a lot of brittleness in the code and introduces very specific checks like hasOtherDynamicDims? It is a rare situation to have N != 1 and a dynamic shape together; it only adds complexity in locating the next batch line in the result tensor, and it will soon be covered by an additional issue.
Sure, we can do it separately in E#179728. Thanks!
Force-pushed from e932c8d to ef91465
…and Compiler - simplify
I still don't think that considering all the models that have batch size set to 1 as dynamic models is the correct approach here.
OPENVINO_ASSERT(is_dynamic || port.get_shape() == tensor->get_shape(),
OPENVINO_ASSERT(is_dynamic || port.get_shape() == tensor->get_shape() ||
                    tensor->get_shape()[utils::BATCH_AXIS] % port.get_shape()[utils::BATCH_AXIS] == 0,
I'm trying to understand this extra check here, but it doesn't make sense to me. Could you please explain?
Sure, let me provide additional context.
The check_tensor function is invoked within set_tensor, particularly when assigning a tensor to a designated port.
With this implementation, the port's shape is adjusted to have a modified batch size (set to 1), while the tensor retains its originally requested dimensions.
So we won't pass is_dynamic || port.get_shape() == tensor->get_shape().
For instance, if the tensor has a batch size of 2:
2 % 1 == 0 -> deBatched case
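A small self-contained illustration of that relaxed check; the shapes are arbitrary and BATCH_AXIS is mirrored as a local constant:

```cpp
#include <openvino/core/shape.hpp>
#include <cassert>
#include <cstddef>

int main() {
    const ov::Shape portShape{1, 3, 224, 224};    // port after the model was reshaped with set_batch(1)
    const ov::Shape tensorShape{2, 3, 224, 224};  // user tensor keeps its requested batch of 2

    constexpr std::size_t BATCH_AXIS = 0;  // local stand-in for intel_npu::utils::BATCH_AXIS
    // 2 % 1 == 0, so the relaxed assertion accepts this as the de-batched case.
    assert(tensorShape[BATCH_AXIS] % portShape[BATCH_AXIS] == 0);
    return 0;
}
```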
"doesn't match with total blobs count: ", | ||
tensors_size); | ||
OPENVINO_ASSERT( | ||
batch.is_dynamic() || batch.get_length() == tensors_size || tensors_size % batch.get_length() == 0, |
same comment here
This approach was inspired by the requirements for this task:
> To minimize the risks, I will try to limit the application of current changes to the dynamic batch for now, after which we can work on a common solution.
Details:
The concept of this PR approach:
- The model is reshaped with set_batch(1) when the PLUGIN batch mode applies. The reshaped model is sent to the compiler (shapeFromIRModel).
- Basically, we want to consider any N=1 network as a potential dynamic-batched network.
Flow:
- the model is compiled with batch=1
- inference then runs against the batch=1 compiled model
===end of context===
Example run parameters:
-m model.blob
-data_shape [4, 3, 224, 224]
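For reference, such parameters would typically be passed to a benchmarking run, e.g. benchmark_app -m model.blob -d NPU -data_shape [4,3,224,224]; the tool name and the -d NPU device choice are assumptions based on the fragment above.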
Tickets: