I'm seeing the following issue with this tutorial: https://docs.pytorch.org/tutorials/intermediate/realtime_rpi.html
I have a YOLO-like model that I trained with QAT, and the quantized model is very slow: about 4 times slower than the floating-point model, even though I fused the layers.
Any help?
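A minimal reproduction sketch in eager-mode PyTorch, for anyone looking into this. `TinyNet` is a hypothetical one-block stand-in for the YOLO-like model, and the random-input loop is a placeholder for real QAT training. The sketch times the float model, the prepared (fake-quant) model, and the converted model, and counts leftover `FakeQuantize` modules. Two things seem worth checking. First, the tutorial sets `torch.backends.quantized.engine = 'qnnpack'` for the Pi's ARM CPU. Second, `torch.ao.quantization.convert()` has to be called after QAT: a model that was only prepared still runs float kernels plus fake-quantization ops.

```python
import copy
import platform
import time

import torch
import torch.nn as nn
import torch.ao.quantization as tq


class TinyNet(nn.Module):
    """Hypothetical stand-in for one Conv-BN-ReLU block of a YOLO-like model."""

    def __init__(self):
        super().__init__()
        self.quant = tq.QuantStub()      # float -> quantized at the input
        self.conv = nn.Conv2d(3, 16, 3, padding=1)
        self.bn = nn.BatchNorm2d(16)
        self.relu = nn.ReLU()
        self.dequant = tq.DeQuantStub()  # quantized -> float at the output

    def forward(self, x):
        x = self.quant(x)
        x = self.relu(self.bn(self.conv(x)))
        return self.dequant(x)


def pick_engine():
    """Prefer qnnpack on ARM (e.g. Raspberry Pi), x86/fbgemm elsewhere."""
    supported = torch.backends.quantized.supported_engines
    is_arm = platform.machine().lower().startswith(("arm", "aarch64"))
    candidates = ["qnnpack"] if is_arm else ["x86", "fbgemm", "qnnpack"]
    for name in candidates:
        if name in supported:
            return name
    raise RuntimeError(f"no quantized engine available: {supported}")


def count_fake_quant(model):
    """Count FakeQuantize modules; any left over means convert() was skipped."""
    return sum(isinstance(m, tq.FakeQuantizeBase) for m in model.modules())


def benchmark(model, example, warmup=3, iters=20):
    """Average seconds per forward pass, gradients disabled."""
    with torch.no_grad():
        for _ in range(warmup):
            model(example)
        start = time.perf_counter()
        for _ in range(iters):
            model(example)
    return (time.perf_counter() - start) / iters


engine = pick_engine()
torch.backends.quantized.engine = engine  # quantized kernels dispatch on this

example = torch.randn(1, 3, 224, 224)
float_model = TinyNet().eval()

# Eager-mode QAT flow: fuse -> qconfig -> prepare_qat -> train -> eval -> convert
qat_model = tq.fuse_modules_qat(
    copy.deepcopy(float_model).train(), [["conv", "bn", "relu"]]
)
qat_model.qconfig = tq.get_default_qat_qconfig(engine)
tq.prepare_qat(qat_model, inplace=True)
with torch.no_grad():
    for _ in range(5):  # placeholder for real training; updates observer ranges
        qat_model(torch.randn(4, 3, 224, 224))
qat_model.eval()
quantized_model = tq.convert(qat_model)  # returns a converted copy

float_s = benchmark(float_model, example)
fake_s = benchmark(qat_model, example)
quant_s = benchmark(quantized_model, example)
print(f"engine={engine}")
print(f"float: {float_s * 1e3:.2f} ms, fake-quant: {fake_s * 1e3:.2f} ms, "
      f"converted: {quant_s * 1e3:.2f} ms")
print("fake-quant modules left after convert:", count_fake_quant(quantized_model))
```

On a non-ARM machine the engine falls back to `x86` or `fbgemm`, so the timings will not match what the Pi reports; running the same script on the Pi is the more meaningful comparison.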