In pseudoquant_linear_fns.py, inside the class PseudoQuant4x16NoMasterFn (lines 151-155), the forward pass contains the following code:
y = torch.nn.functional.linear(x_flat_dq, weight_dq, bias)
y = y.unflatten(dim=0, sizes=x.shape[:-1])
if bias is not None:
    y += bias
The manual addition (y += bias) adds the bias a second time, because torch.nn.functional.linear already applies it when bias is passed as its third argument. This produces incorrect inference output whenever the module has a bias term.
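
A minimal sketch of the effect, outside the real class. The tensors x, weight_dq and bias below are small made-up stand-ins for the dequantized values (not taken from the repository), and x_flat_dq is assumed to be x flattened over its leading dimensions. It compares the reported code path against a plain linear call and against a version without the extra addition:

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins for the dequantized tensors in the report.
x = torch.tensor([[[1.0, 2.0], [3.0, 4.0]]])                     # (batch=1, seq=2, in=2)
weight_dq = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # (out=3, in=2)
bias = torch.tensor([10.0, 20.0, 30.0])

# Assumption: the forward pass flattens all leading dims before the matmul.
x_flat_dq = x.flatten(end_dim=-2)                                # (2, 2)

# Reference result: F.linear computes x @ W.T + bias by itself.
expected = F.linear(x, weight_dq, bias)

# Reported code path: bias passed to F.linear, then added again.
y_buggy = F.linear(x_flat_dq, weight_dq, bias)
y_buggy = y_buggy.unflatten(dim=0, sizes=x.shape[:-1])
y_buggy += bias

# One possible fix: drop the manual addition.
y_fixed = F.linear(x_flat_dq, weight_dq, bias)
y_fixed = y_fixed.unflatten(dim=0, sizes=x.shape[:-1])

# The buggy output is off by exactly one extra copy of the bias.
extra = y_buggy - expected
print(torch.allclose(y_fixed, expected))
print(torch.allclose(extra, bias.expand_as(extra)))
```

An equivalent alternative would be to pass None as the bias to torch.nn.functional.linear and keep the manual addition; either way the bias should be applied only once.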