Replies: 1 comment
-
I've discovered an additional issue, which is that even if you run with the `TORCH_BLAS_PREFER_HIPBLASLT=0` workaround, it will still fail with the same errors when using FP8 quantization. Here, 3/8 of the threads fail to load `TensileLibrary_lazy_gfx942.dat`. Presumably, the FP8 quantization kernels require hipBLASLt and so can't run with the current bug? BTW, surprisingly, when testing at
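For reference, here's a minimal sketch of the kind of FP8 run that hits this; the model name and exact arguments are placeholders rather than my actual configuration, and `quantization="fp8"` assumes vLLM's built-in FP8 path:

```python
import os

# Even with hipBLASLt disabled, FP8 runs still hit the same load errors,
# presumably because the FP8 quantization kernels use hipBLASLt regardless.
os.environ["TORCH_BLAS_PREFER_HIPBLASLT"] = "0"

from vllm import LLM, SamplingParams

# Placeholder model; an FP8-quantized run at -tp 8 is what fails for me.
llm = LLM(
    model="meta-llama/Llama-3.1-70B-Instruct",
    quantization="fp8",
    tensor_parallel_size=8,
)
outputs = llm.generate(["Hello, my name is"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```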
-
FYI, I've filed a curious bug I've encountered with PyTorch, but I at least want to mention it here as well (without creating a duplicate issue unless they triage it as not their problem): pytorch/pytorch#137695
Basically, running the latest vLLM (HEAD) and the PyTorch nightly it depends on, hipBLASLt is used by default and works for `-tp 1` to `-tp 4`, but at `-tp 8` it consistently starts to report errors loading `TensileLibrary_lazy_gfx942.dat`. The workaround is to use `TORCH_BLAS_PREFER_HIPBLASLT=0`, and at least for tp 1-4 this is slightly faster anyway in my vLLM `benchmark_throughput` testing. Leaving this here to potentially save some people some hair-pulling, as it took me a while to debug (since it works on lower tps).
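A minimal sketch of applying the workaround, assuming an offline vLLM run (the model name and sampling settings are just placeholders); the key point is that the env var has to be set before torch/vLLM initialize the BLAS backend:

```python
import os

# Prefer rocBLAS over hipBLASLt to avoid the tp=8
# TensileLibrary_lazy_gfx942.dat load failure; set before importing vLLM/torch.
os.environ["TORCH_BLAS_PREFER_HIPBLASLT"] = "0"

from vllm import LLM, SamplingParams

# Placeholder model; tensor_parallel_size=8 is where the default path breaks.
llm = LLM(model="meta-llama/Llama-3.1-70B-Instruct", tensor_parallel_size=8)
params = SamplingParams(temperature=0.8, max_tokens=128)
for out in llm.generate(["Hello, my name is"], params):
    print(out.outputs[0].text)
```

Equivalently, you can just export `TORCH_BLAS_PREFER_HIPBLASLT=0` in the shell before launching `benchmark_throughput` or the vLLM server.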