When INFERENCE_MODE is set to false, the model still runs in no_grad mode. Is that intentional? It prevents serving models that require gradients at inference time, such as differentiable rendering. Could the backend instead use the default grad mode when INFERENCE_MODE=false (as implemented in pull request triton-inference-server/pytorch_backend#146), or would it be preferable to add a separate parameter to enable the default mode?
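
To illustrate the distinction, here is a minimal sketch in plain eager PyTorch. It is not the backend's C++ code, and the helper names `input_gradient` and `gradients_available` are made up for this example. It computes a gradient with respect to the input, which is roughly what a differentiable-rendering step needs, and checks whether that works under each grad mode:

```python
import torch


def input_gradient(x: torch.Tensor) -> torch.Tensor:
    """Return d(sum(x**2))/dx; autograd must record the forward pass for this to work."""
    x = x.clone().requires_grad_(True)
    y = (x ** 2).sum()
    (grad,) = torch.autograd.grad(y, x)
    return grad


def gradients_available(x: torch.Tensor) -> bool:
    """Hypothetical helper: True if input_gradient succeeds under the active grad mode."""
    try:
        input_gradient(x)
        return True
    except RuntimeError:
        # Raised when the forward pass was not recorded by autograd.
        return False


x = torch.tensor([1.0, 2.0, 3.0])

default_ok = gradients_available(x)      # default grad mode
with torch.no_grad():
    no_grad_ok = gradients_available(x)  # what INFERENCE_MODE=false currently behaves like
with torch.inference_mode():
    inference_ok = gradients_available(x)

print(default_ok, no_grad_ok, inference_ok)
```

In this sketch the gradient call should only succeed in the default mode, which is why running under no_grad is as much of a blocker for this use case as inference mode.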
Thanks!