* clean cuda/rocm code in hpu backend, enable flat_hpu
Signed-off-by: Wang, Yi A <[email protected]>
* fix TP in pageattn
Signed-off-by: Wang, Yi A <[email protected]>
* adjust block table in hpu to improve performance
Signed-off-by: Wang, Yi A <[email protected]>
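Paged attention tracks which fixed-size KV-cache blocks each sequence occupies through a per-sequence block table; on HPU, rows padded to a fixed width keep the table's shape static. A minimal pure-Python sketch of the idea (the function name, the padding-block convention, and the parameters are illustrative, not the backend's actual API):

```python
def build_block_table(seq_lens, block_size, max_blocks_per_seq):
    """Map each sequence to the physical KV-cache blocks it occupies.

    Rows are padded to max_blocks_per_seq (padding block id 0) so every
    row has the same width, avoiding dynamic shapes on graph-compiled
    devices.
    """
    table = []
    next_free = 1  # block 0 is reserved as the padding block
    for seq_len in seq_lens:
        n_blocks = -(-seq_len // block_size)  # ceil division
        row = list(range(next_free, next_free + n_blocks))
        next_free += n_blocks
        row += [0] * (max_blocks_per_seq - n_blocks)  # pad to fixed width
        table.append(row)
    return table

# Two sequences of 5 and 9 tokens with block size 4 need 2 and 3 blocks.
print(build_block_table([5, 9], block_size=4, max_blocks_per_seq=4))
# [[1, 2, 0, 0], [3, 4, 5, 0]]
```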
* enable all the models, not tested yet
Signed-off-by: Wang, Yi A <[email protected]>
* use tensor cache in hpu graph to avoid replay issue
Signed-off-by: Wang, Yi A <[email protected]>
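A captured HPU graph replays fixed kernels against fixed memory addresses, so input tensors freshly allocated each step are invisible to the replay; the usual fix is to cache the input buffers once and copy new data into them before replaying. A pure-Python illustration of that static-buffer pattern (no real graph capture; the class and method names are made up):

```python
class GraphLikeCallable:
    """Mimics a captured graph: the computation reads buffers whose
    identity is fixed at capture time, so callers must copy new inputs
    into the cached buffers instead of rebinding them."""

    def __init__(self, size):
        self.input_buf = [0] * size   # cached input buffer (fixed identity)
        self.output_buf = [0] * size  # cached output buffer

    def replay(self):
        # The "captured" computation always reads self.input_buf in place.
        for i, x in enumerate(self.input_buf):
            self.output_buf[i] = x * 2

    def run(self, new_inputs):
        # Copy into the cached buffer rather than replacing it.
        self.input_buf[:] = new_inputs
        self.replay()
        return list(self.output_buf)

g = GraphLikeCallable(3)
print(g.run([1, 2, 3]))  # [2, 4, 6]
print(g.run([4, 5, 6]))  # [8, 10, 12]
```

If `run` instead did `self.input_buf = new_inputs`, the second call would rebind the name while a real captured graph kept reading the original address, which is exactly the class of replay bug the tensor cache avoids.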
* add moe support, fix qwen/mistral/mixtral crash
Signed-off-by: Wang, Yi A <[email protected]>
* fix phimoe issue
Signed-off-by: Wang, Yi A <[email protected]>
* gpt_bigcode can also use pageattn
Signed-off-by: Wang, Yi A <[email protected]>
* enable dbrx, remove some unused code
Signed-off-by: Wang, Yi A <[email protected]>
* multi-modality initial PR
Signed-off-by: Wang, Yi A <[email protected]>
* adjust warmup and enable vlm
Signed-off-by: Wang, Yi A <[email protected]>
* fix incorrect output in qwen2 idefics if hpu graph is used
Signed-off-by: Wang, Yi A <[email protected]>
* remove unused quantization code and enable awq/gptq int4
Signed-off-by: Wang, Yi A <[email protected]>
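AWQ and GPTQ store int4 weights packed two values per byte and dequantize on the fly. A stdlib-only sketch of the nibble packing (low nibble first here; real kernels pack into int32 words and pair the weights with scales and zero points):

```python
def pack_int4(vals):
    """Pack pairs of unsigned 4-bit values into bytes, low nibble first."""
    assert len(vals) % 2 == 0 and all(0 <= v < 16 for v in vals)
    return bytes(vals[i] | (vals[i + 1] << 4) for i in range(0, len(vals), 2))

def unpack_int4(packed):
    """Recover the 4-bit values, two per byte."""
    out = []
    for b in packed:
        out += [b & 0xF, b >> 4]
    return out

packed = pack_int4([3, 7, 15, 0])
print(len(packed))            # 2 bytes for four 4-bit weights
print(unpack_int4(packed))    # [3, 7, 15, 0]
```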
* fix gptq issue
Signed-off-by: Wang, Yi A <[email protected]>
* enable fp8
Signed-off-by: Wang, Yi A <[email protected]>
* warmup prefill
remove models where pageattn is not used; set the block table to None since it's not used
Signed-off-by: Wang, Yi A <[email protected]>
* add warmup_decode
Signed-off-by: Wang, Yi A <[email protected]>
* warmup decode
Signed-off-by: Wang, Yi A <[email protected]>
* remove block_tables and prefill_cache_indices which will lead to dynamic shape
Signed-off-by: Wang, Yi A <[email protected]>
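Graph-compiled backends recompile whenever an input shape changes, so per-step tensors whose length varies (such as the removed `block_tables` and `prefill_cache_indices`) are either dropped or padded into a small set of bucketed shapes. A hedged sketch of length bucketing (the bucket sizes and helper names are illustrative):

```python
def bucket_len(seq_len, buckets=(128, 256, 512, 1024, 2048)):
    """Round a sequence length up to the nearest precompiled bucket so
    the device only ever sees a handful of static shapes."""
    for b in buckets:
        if seq_len <= b:
            return b
    raise ValueError(f"sequence length {seq_len} exceeds largest bucket")

def pad_to_bucket(tokens, pad_id=0):
    """Pad a token list to its bucketed length."""
    target = bucket_len(len(tokens))
    return tokens + [pad_id] * (target - len(tokens))

print(bucket_len(200))                 # 256
print(len(pad_to_bucket([1] * 130)))   # 256
```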
* fix comment
Signed-off-by: Wang, Yi A <[email protected]>
* missing gptj change...
Signed-off-by: Wang, Yi A <[email protected]>
* fix some issues
Signed-off-by: Wang, Yi A <[email protected]>
* remove torch.where to fix incorrect output in hpu graph model
Signed-off-by: Wang, Yi A <[email protected]>
* LLM warmup logic
Signed-off-by: Wang, Yi A <[email protected]>
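Warmup walks the (batch size, sequence bucket) grid once before serving so every shape the server can encounter is compiled ahead of time rather than during a live request. A simplified sketch of enumerating that grid under a token budget (buckets, budget, and ordering heuristic are all illustrative):

```python
from itertools import product

def warmup_shapes(batch_buckets, seq_buckets, max_tokens):
    """Enumerate (batch, seq_len) shapes to pre-compile, skipping
    combinations whose total token count exceeds the memory budget."""
    shapes = [(b, s) for b, s in product(batch_buckets, seq_buckets)
              if b * s <= max_tokens]
    # Compile the largest shapes first so memory pressure surfaces early.
    shapes.sort(key=lambda bs: bs[0] * bs[1], reverse=True)
    return shapes

shapes = warmup_shapes([1, 4, 8], [128, 512, 2048], max_tokens=8192)
print(shapes[0])            # (4, 2048) -- the largest shape within budget
print((8, 2048) in shapes)  # False -- 16384 tokens exceeds the budget
```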
* multi-modality warmup
Signed-off-by: Wang, Yi A <[email protected]>
* optimize code
Signed-off-by: Wang, Yi A <[email protected]>
* refine logging and fix some issues
Signed-off-by: Wang, Yi A <[email protected]>
* fix warmup issue for mllama
Signed-off-by: Wang, Yi A <[email protected]>
* pingpong optimization
Signed-off-by: Wang, Yi A <[email protected]>
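"Pingpong" here presumably refers to double buffering: two preallocated buffers alternate so the host can fill one while the device still consumes the other. A generic pure-Python illustration of the alternation only (not the backend's actual scheduling code; all names are made up):

```python
class PingPong:
    """Alternate between two preallocated buffers so the producer writes
    one buffer while the consumer still reads the other."""

    def __init__(self, size):
        self.buffers = [[0] * size, [0] * size]
        self.active = 0  # index of the buffer the consumer reads next

    def produce(self, data):
        # Fill the currently inactive buffer, then flip roles.
        spare = 1 - self.active
        self.buffers[spare][:] = data
        self.active = spare

    def consume(self):
        return list(self.buffers[self.active])

pp = PingPong(2)
pp.produce([1, 2])
print(pp.consume())  # [1, 2]
pp.produce([3, 4])   # writes the other buffer; [1, 2] stays readable
print(pp.consume())  # [3, 4]
```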
* match the latest vllm_extension ops
Signed-off-by: Wang, Yi A <[email protected]>
* work with the latest vllm extension ops
Signed-off-by: Wang, Yi A <[email protected]>
* remove block_scales which is not needed anymore
Signed-off-by: Wang, Yi A <[email protected]>
* improve performance
Signed-off-by: Wang, Yi A <[email protected]>
* prefill bypass graph
Signed-off-by: Wang, Yi A <[email protected]>
* pingpong optimization issue fix
Signed-off-by: Wang, Yi A <[email protected]>
---------
Signed-off-by: Wang, Yi A <[email protected]>