Description
In transformers-neuronx, it is crucial that the inference pipeline, under its various optimizations, remains semantically equivalent to the original model. Earlier commits have fixed subtle layout transformation bugs of this kind (e.g. 69d039d). However, we found that these layout optimizations are still not consistently correct. Specifically, in the most recent version, 1ade6d7, we found a case similar to 69d039d when using the collectives_layout="BSH" feature.
We have opened a fix in #106, and we would like to confirm whether this is indeed a bug in the framework. The PR includes the steps to reproduce the bug and sample outputs.
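For readers unfamiliar with this class of bug, the sketch below shows the general shape of a layout equivalence check. It is not the reproduction from #106 and does not call transformers-neuronx. It is a pure-Python illustration in which every function name is hypothetical, and the HSB ordering is only an assumed alternative layout used for comparison. The same projection is computed once in a [batch][seq][hidden] layout and once in a [hidden][seq][batch] layout, and the results are compared after converting back.

```python
import math
import random

# All names below are hypothetical and exist only for this illustration.

def project_bsh(x, w):
    """Reference projection. x: [batch][seq][hidden], w: [hidden][out]."""
    hidden, out = len(w), len(w[0])
    return [[[sum(tok[h] * w[h][o] for h in range(hidden)) for o in range(out)]
             for tok in seq] for seq in x]

def bsh_to_hsb(x):
    """Permute [batch][seq][hidden] into [hidden][seq][batch]."""
    b_n, s_n, h_n = len(x), len(x[0]), len(x[0][0])
    return [[[x[b][s][h] for b in range(b_n)] for s in range(s_n)]
            for h in range(h_n)]

def project_hsb(x_hsb, w):
    """Same projection on [hidden][seq][batch]; returns [out][seq][batch]."""
    h_n, s_n, b_n = len(x_hsb), len(x_hsb[0]), len(x_hsb[0][0])
    out = len(w[0])
    return [[[sum(x_hsb[h][s][b] * w[h][o] for h in range(h_n))
              for b in range(b_n)] for s in range(s_n)] for o in range(out)]

def hsb_to_bsh(y):
    """Permute [feature][seq][batch] back into [batch][seq][feature]."""
    f_n, s_n, b_n = len(y), len(y[0]), len(y[0][0])
    return [[[y[f][s][b] for f in range(f_n)] for s in range(s_n)]
            for b in range(b_n)]

def layouts_agree(x, w, tol=1e-9):
    """True if both layout paths give the same values element by element."""
    ref = project_bsh(x, w)
    alt = hsb_to_bsh(project_hsb(bsh_to_hsb(x), w))
    return all(
        math.isclose(r, a, rel_tol=tol, abs_tol=tol)
        for ref_seq, alt_seq in zip(ref, alt)
        for ref_tok, alt_tok in zip(ref_seq, alt_seq)
        for r, a in zip(ref_tok, alt_tok)
    )

# Small random example: batch=2, seq=3, hidden=4, out=5.
rng = random.Random(0)
x = [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)] for _ in range(2)]
w = [[rng.uniform(-1, 1) for _ in range(5)] for _ in range(4)]
print(layouts_agree(x, w))
```

A layout bug of the kind described here would show up as this comparison failing for one layout setting while passing for another, even though both are meant to compute the same function.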
Your insights are very much appreciated. We will continue to follow up on this issue until it is resolved.
Credit to @wenboqian for providing the initial direction for detecting and fixing the bug.