From the new tests I added in #159 , the GPU 64 bit precision computations seem to perform closer to 32 bit precision CPU computations. The GitHub CI for example did not pass on the new tests until I lowered the tolerance for the 64 bit precision GPU computations closer to the 32 bit precision tolerance on CPU.