diff --git a/docs/en/get_started/ascend/get_started.md b/docs/en/get_started/ascend/get_started.md index 262381cadd..376548b0dc 100644 --- a/docs/en/get_started/ascend/get_started.md +++ b/docs/en/get_started/ascend/get_started.md @@ -1,72 +1,35 @@ -# Get Started with Huawei Ascend (Atlas 800T A2 & Atlas 300I Duo) +# Get Started with Huawei Ascend +We currently support running lmdeploy on **Atlas 800T A3, Atlas 800T A2 and Atlas 300I Duo**. The usage of lmdeploy on a Huawei Ascend device is almost the same as its usage on CUDA with PytorchEngine in lmdeploy. Please read the original [Get Started](../get_started.md) guide before reading this tutorial. -Here is the [supported model list](../../supported_models/supported_models.md#PyTorchEngine-on-Huawei-Ascend-Platform). +Here is the [supported model list](../../supported_models/supported_models.md#PyTorchEngine-on-Other-Platforms). > \[!IMPORTANT\] -> We have uploaded a docker image with KUNPENG CPU to aliyun(from lmdeploy 0.7.1 + dlinfer 0.1.6). +> We have uploaded a docker image with KUNPENG CPU to aliyun. > Please try to pull the image by following command: +> +> Atlas 800T A3: +> +> `docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:a3-latest` +> +> (Atlas 800T A3 currently supports only the Qwen-series with eager mode.) +> > Atlas 800T A2: -> `docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:910b-latest` +> +> `docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:a2-latest` +> > 300I Duo: -> `docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:310p-latest` -> The dockerfile described below still works, you can try -> both pulling image and build your own image by dockerfile. - -## Installation - -We highly recommend that users build a Docker image for streamlined environment setup. - -Git clone the source code of lmdeploy and the Dockerfile locates in the `docker` directory: - -```shell -git clone https://github.com/InternLM/lmdeploy.git -cd lmdeploy -``` - -### Environment Preparation - -The Docker version is supposed to be no less than `18.09`. And `Ascend Docker Runtime` should be installed by following [the official guide](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/clusterschedulingig/.clusterschedulingig/dlug_installation_012.html). - -> \[!CAUTION\] -> If error message `libascend_hal.so: cannot open shared object file` shows, that means **Ascend Docker Runtime** is not installed correctly! - -#### Ascend Drivers, Firmware and CANN - -The target machine needs to install the Huawei driver and firmware version not lower than 23.0.3, refer to -[CANN Driver and Firmware Installation](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha001/softwareinst/instg/instg_0005.html) -and [download resources](https://www.hiascend.com/hardware/firmware-drivers/community?product=4&model=26&cann=8.0.RC2.beta1&driver=1.0.25.alpha). - -And the CANN (version 8.0.RC2.beta1) software packages should also be downloaded from [Ascend Resource Download Center](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.RC2.beta1&product=4&model=26) themselves. 
Make sure to place the `Ascend-cann-kernels-910b*.run`, `Ascend-cann-nnal_*.run` and `Ascend-cann-toolkit*-aarch64.run` under the root directory of lmdeploy source code - -#### Build Docker Image - -Run the following command in the root directory of lmdeploy to build the image: - -```bash -DOCKER_BUILDKIT=1 docker build -t lmdeploy-aarch64-ascend:latest \ -    -f docker/Dockerfile_aarch64_ascend . -``` - -The `Dockerfile_aarch64_ascend` is tested on Kunpeng CPU. For intel CPU, please try [this dockerfile](https://github.com/InternLM/lmdeploy/issues/2745#issuecomment-2473285703) (which is not fully tested) - -If the following command executes without any errors, it indicates that the environment setup is successful. - -```bash -docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64-ascend:latest lmdeploy check_env -``` - -For more information about running the Docker client on Ascend devices, please refer to the [guide](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/dockerruntimeug/dlruntime_ug_013.html) +> +> `docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:300i-duo-latest` +> +> (Atlas 300I Duo currently works only with graph mode.) +> +> To build the environment yourself, refer to the Dockerfiles [here](../../../../docker). ## Offline batch inference -> \[!TIP\] -> Graph mode has been supported on Atlas 800T A2. -> Users can set `eager_mode=False` to enable graph mode, or, set `eager_mode=True` to disable graph mode. -> (Please source `/usr/local/Ascend/nnal/atb/set_env.sh` before enabling graph mode) - ### LLM inference Set `device_type="ascend"` in the `PytorchEngineConfig`: @@ -74,12 +37,11 @@ Set `device_type="ascend"` in the `PytorchEngineConfig`: ```python from lmdeploy import pipeline from lmdeploy import PytorchEngineConfig -if __name__ == "__main__": -    pipe = pipeline("internlm/internlm2_5-7b-chat", -     backend_config=PytorchEngineConfig(tp=1, device_type="ascend", eager_mode=True)) -    question = ["Shanghai is", "Please introduce China", "How are you?"] -    response = pipe(question) -    print(response) +pipe = pipeline("internlm/internlm2_5-7b-chat", + backend_config=PytorchEngineConfig(tp=1, device_type="ascend")) +question = ["Shanghai is", "Please introduce China", "How are you?"] +response = pipe(question) +print(response) ``` ### VLM inference @@ -89,34 +51,28 @@ Set `device_type="ascend"` in the `PytorchEngineConfig`: ```python from lmdeploy import pipeline, PytorchEngineConfig from lmdeploy.vl import load_image -if __name__ == "__main__": - pipe = pipeline('OpenGVLab/InternVL2-2B', -     backend_config=PytorchEngineConfig(tp=1, device_type='ascend', eager_mode=True)) -    image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg') -    response = pipe(('describe this image', image)) -    print(response) +pipe = pipeline('OpenGVLab/InternVL2-2B', + backend_config=PytorchEngineConfig(tp=1, device_type='ascend')) +image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg') +response = pipe(('describe this image', image)) +print(response) ``` ## Online serving -> \[!TIP\] -> Graph mode has been supported on Atlas 800T A2. -> Graph mode is default enabled in online serving. Users can add `--eager-mode` to disable graph mode. -> (Please source `/usr/local/Ascend/nnal/atb/set_env.sh` before enabling graph mode) - ### Serve a LLM model Add `--device ascend` in the serve command. 
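+Once the server launched by either command below is up, it exposes an OpenAI-compatible HTTP API (port 23333 by default), so any OpenAI-style client can talk to it. A minimal sketch, assuming the default host/port and that the served model id matches the model path used below:
+
+```python
+# Hypothetical client-side check; adjust base_url/model if you changed --server-port or the model path.
+from openai import OpenAI
+
+client = OpenAI(base_url='http://127.0.0.1:23333/v1', api_key='dummy')
+resp = client.chat.completions.create(
+    model='internlm/internlm2_5-7b-chat',
+    messages=[{'role': 'user', 'content': 'Please introduce China'}],
+)
+print(resp.choices[0].message.content)
+```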
```bash -lmdeploy serve api_server --backend pytorch --device ascend --eager-mode internlm/internlm2_5-7b-chat +lmdeploy serve api_server --backend pytorch --device ascend internlm/internlm2_5-7b-chat ``` Run the following commands to launch docker container for lmdeploy LLM serving: ```bash -docker exec -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:latest \ -    bash -i -c "lmdeploy serve api_server --backend pytorch --device ascend --eager-mode internlm/internlm2_5-7b-chat" +docker run -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:a2-latest \ +    bash -i -c "lmdeploy serve api_server --backend pytorch --device ascend internlm/internlm2_5-7b-chat" ``` ### Serve a VLM model @@ -124,14 +80,14 @@ docker exec -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyunc Add `--device ascend` in the serve command ```bash -lmdeploy serve api_server --backend pytorch --device ascend --eager-mode OpenGVLab/InternVL2-2B +lmdeploy serve api_server --backend pytorch --device ascend OpenGVLab/InternVL2-2B ``` Run the following commands to launch docker container for lmdeploy VLM serving: ```bash -docker exec -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:latest \ -    bash -i -c "lmdeploy serve api_server --backend pytorch --device ascend --eager-mode OpenGVLab/InternVL2-2B" +docker run -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:a2-latest \ +    bash -i -c "lmdeploy serve api_server --backend pytorch --device ascend OpenGVLab/InternVL2-2B" ``` ## Inference with Command line Interface @@ -139,14 +95,14 @@ docker exec -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyunc Add `--device ascend` in the serve command. ```bash -lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device ascend --eager-mode +lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device ascend ``` Run the following commands to launch lmdeploy chatting after starting container: ```bash -docker exec -it crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:latest \ -    bash -i -c "lmdeploy chat --backend pytorch --device ascend --eager-mode internlm/internlm2_5-7b-chat" +docker run -it crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:a2-latest \ +    bash -i -c "lmdeploy chat --backend pytorch --device ascend internlm/internlm2_5-7b-chat" ``` ## Quantization diff --git a/docs/en/get_started/camb/get_started.md b/docs/en/get_started/camb/get_started.md new file mode 100644 index 0000000000..5b6e622667 --- /dev/null +++ b/docs/en/get_started/camb/get_started.md @@ -0,0 +1,99 @@ +# Cambricon + +The usage of lmdeploy on a Cambricon device is almost the same as its usage on CUDA with PytorchEngine in lmdeploy. +Please read the original [Get Started](../get_started.md) guide before reading this tutorial. + +Here is the [supported model list](../../supported_models/supported_models.md#PyTorchEngine-on-Other-Platforms). + +> \[!IMPORTANT\] +> We have uploaded a docker image to aliyun. +> Please try to pull the image by following command: +> +> `docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/camb:latest` + +> \[!IMPORTANT\] +> Currently, launching multi-device inference on Cambricon accelerators requires manually starting Ray. 
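+>
+> Once Ray is running (see the two-device example just below), multi-device inference only needs a larger `tp` in the engine config. A minimal sketch, assuming two MLUs are visible:
+>
+> ```python
+> # Assumes `export MLU_VISIBLE_DEVICES=0,1` and `ray start --head --resources='{"MLU": 2}'` have been run.
+> from lmdeploy import pipeline, PytorchEngineConfig
+> pipe = pipeline('internlm/internlm2_5-7b-chat',
+>                 backend_config=PytorchEngineConfig(tp=2, device_type='camb'))
+> print(pipe(['Shanghai is']))
+> ```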
+> +> Below is an example for a 2-devices setup: +> +> ```shell +> export MLU_VISIBLE_DEVICES=0,1 +> ray start --head --resources='{"MLU": 2}' +> ``` + +## Offline batch inference + +### LLM inference + +Set `device_type="camb"` in the `PytorchEngineConfig`: + +```python +from lmdeploy import pipeline +from lmdeploy import PytorchEngineConfig +pipe = pipeline("internlm/internlm2_5-7b-chat", + backend_config=PytorchEngineConfig(tp=1, device_type="camb")) +question = ["Shanghai is", "Please introduce China", "How are you?"] +response = pipe(question) +print(response) +``` + +### VLM inference + +Set `device_type="camb"` in the `PytorchEngineConfig`: + +```python +from lmdeploy import pipeline, PytorchEngineConfig +from lmdeploy.vl import load_image +pipe = pipeline('OpenGVLab/InternVL2-2B', + backend_config=PytorchEngineConfig(tp=1, device_type='camb')) +image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg') +response = pipe(('describe this image', image)) +print(response) +``` + +## Online serving + +### Serve a LLM model + +Add `--device camb` in the serve command. + +```bash +lmdeploy serve api_server --backend pytorch --device camb internlm/internlm2_5-7b-chat +``` + +Run the following commands to launch docker container for lmdeploy LLM serving: + +```bash +docker run -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/camb:latest \ +    bash -i -c "lmdeploy serve api_server --backend pytorch --device camb internlm/internlm2_5-7b-chat" +``` + +### Serve a VLM model + +Add `--device camb` in the serve command + +```bash +lmdeploy serve api_server --backend pytorch --device camb OpenGVLab/InternVL2-2B +``` + +Run the following commands to launch docker container for lmdeploy VLM serving: + +```bash +docker run -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/camb:latest \ +    bash -i -c "lmdeploy serve api_server --backend pytorch --device camb OpenGVLab/InternVL2-2B" +``` + +## Inference with Command line Interface + +Add `--device camb` in the serve command. + +```bash +lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device camb +``` + +Run the following commands to launch lmdeploy chatting after starting container: + +```bash +docker run -it crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/camb:latest \ +    bash -i -c "lmdeploy chat --backend pytorch --device camb internlm/internlm2_5-7b-chat" +``` diff --git a/docs/en/get_started/index.rst b/docs/en/get_started/index.rst index 4343ee9ab1..1cb3bb78e2 100644 --- a/docs/en/get_started/index.rst +++ b/docs/en/get_started/index.rst @@ -3,6 +3,8 @@ On Other Platforms .. toctree:: :maxdepth: 1 - :caption: NPU(Huawei) + :caption: OtherPF ascend/get_started.md + maca/get_started.md + camb/get_started.md diff --git a/docs/en/get_started/maca/get_started.md b/docs/en/get_started/maca/get_started.md new file mode 100644 index 0000000000..5c647a379e --- /dev/null +++ b/docs/en/get_started/maca/get_started.md @@ -0,0 +1,89 @@ +# MetaX-tech + +The usage of lmdeploy on a MetaX-tech device is almost the same as its usage on CUDA with PytorchEngine in lmdeploy. +Please read the original [Get Started](../get_started.md) guide before reading this tutorial. + +Here is the [supported model list](../../supported_models/supported_models.md#PyTorchEngine-on-Other-Platforms). + +> \[!IMPORTANT\] +> We have uploaded a docker image to aliyun. 
+> Please try to pull the image by following command: +> +> `docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest` + +## Offline batch inference + +### LLM inference + +Set `device_type="maca"` in the `PytorchEngineConfig`: + +```python +from lmdeploy import pipeline +from lmdeploy import PytorchEngineConfig +pipe = pipeline("internlm/internlm2_5-7b-chat", + backend_config=PytorchEngineConfig(tp=1, device_type="maca")) +question = ["Shanghai is", "Please introduce China", "How are you?"] +response = pipe(question) +print(response) +``` + +### VLM inference + +Set `device_type="maca"` in the `PytorchEngineConfig`: + +```python +from lmdeploy import pipeline, PytorchEngineConfig +from lmdeploy.vl import load_image +pipe = pipeline('OpenGVLab/InternVL2-2B', + backend_config=PytorchEngineConfig(tp=1, device_type='maca')) +image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg') +response = pipe(('describe this image', image)) +print(response) +``` + +## Online serving + +### Serve a LLM model + +Add `--device maca` in the serve command. + +```bash +lmdeploy serve api_server --backend pytorch --device maca internlm/internlm2_5-7b-chat +``` + +Run the following commands to launch docker container for lmdeploy LLM serving: + +```bash +docker run -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest \ +    bash -i -c "lmdeploy serve api_server --backend pytorch --device maca internlm/internlm2_5-7b-chat" +``` + +### Serve a VLM model + +Add `--device maca` in the serve command + +```bash +lmdeploy serve api_server --backend pytorch --device maca OpenGVLab/InternVL2-2B +``` + +Run the following commands to launch docker container for lmdeploy VLM serving: + +```bash +docker run -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest \ +    bash -i -c "lmdeploy serve api_server --backend pytorch --device maca OpenGVLab/InternVL2-2B" +``` + +## Inference with Command line Interface + +Add `--device maca` in the serve command. + +```bash +lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device maca +``` + +Run the following commands to launch lmdeploy chatting after starting container: + +```bash +docker run -it crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest \ +    bash -i -c "lmdeploy chat --backend pytorch --device maca internlm/internlm2_5-7b-chat" +``` diff --git a/docs/en/supported_models/supported_models.md b/docs/en/supported_models/supported_models.md index 2766fdb83d..38f02fb1fa 100644 --- a/docs/en/supported_models/supported_models.md +++ b/docs/en/supported_models/supported_models.md @@ -123,28 +123,28 @@ The following tables detail the models supported by LMDeploy's TurboMind engine * [2] PyTorch engine removes the support of original llava models after v0.6.4. 
Please use their corresponding transformers models instead, which can be found in https://huggingface.co/llava-hf ``` -## PyTorchEngine on Huawei Ascend Platform +## PyTorchEngine on Other Platforms -| | | | Atlas 800T A2 | Atlas 800T A2 | Atlas 800T A2 | Atlas 800T A2 | Atlas 300I Duo | -| :------------: | :-------: | :--: | :--------------: | :--------------: | :-----------: | :-----------: | :------------: | -| Model | Size | Type | FP16/BF16(eager) | FP16/BF16(graph) | W8A8(graph) | W4A16(eager) | FP16(graph) | -| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | - | -| Llama3 | 8B | LLM | Yes | Yes | Yes | Yes | Yes | -| Llama3.1 | 8B | LLM | Yes | Yes | Yes | Yes | Yes | -| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes | -| InternLM2.5 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes | -| InternLM3 | 8B | LLM | Yes | Yes | Yes | Yes | Yes | -| Mixtral | 8x7B | LLM | Yes | Yes | No | No | Yes | -| QWen1.5-MoE | A2.7B | LLM | Yes | - | No | No | - | -| QWen2(.5) | 7B | LLM | Yes | Yes | Yes | Yes | Yes | -| QWen2-VL | 2B, 7B | MLLM | Yes | Yes | - | - | - | -| QWen2.5-VL | 3B - 72B | MLLM | Yes | Yes | - | - | Yes | -| QWen2-MoE | A14.57B | LLM | Yes | - | No | No | - | -| QWen3 | 0.6B-235B | LLM | Yes | Yes | No | No | Yes | -| DeepSeek-V2 | 16B | LLM | No | Yes | No | No | - | -| InternVL(v1.5) | 2B-26B | MLLM | Yes | - | Yes | Yes | - | -| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes | Yes | Yes | -| InternVL2.5 | 1B-78B | MLLM | Yes | Yes | Yes | Yes | Yes | -| InternVL3 | 1B-78B | MLLM | Yes | Yes | Yes | Yes | Yes | -| CogVLM2-chat | 19B | MLLM | Yes | No | - | - | - | -| GLM4V | 9B | MLLM | Yes | No | - | - | - | +| | | | Atlas 800T A2 | Atlas 800T A2 | Atlas 800T A2 | Atlas 800T A2 | Atlas 300I Duo | Atlas 800T A3 | Maca C500 | Cambricon | +| :------------: | :-------: | :--: | :--------------: | :--------------: | :-----------: | :-----------: | :------------: | :--------------: | :-------: | :-------: | +| Model | Size | Type | FP16/BF16(eager) | FP16/BF16(graph) | W8A8(graph) | W4A16(eager) | FP16(graph) | FP16/BF16(eager) | BF/FP16 | BF/FP16 | +| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | - | Yes | Yes | Yes | +| Llama3 | 8B | LLM | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | +| Llama3.1 | 8B | LLM | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | +| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | +| InternLM2.5 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | +| InternLM3 | 8B | LLM | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | +| Mixtral | 8x7B | LLM | Yes | Yes | No | No | Yes | - | Yes | Yes | +| QWen1.5-MoE | A2.7B | LLM | Yes | - | No | No | - | - | Yes | - | +| QWen2(.5) | 7B | LLM | Yes | Yes | Yes | Yes | Yes | - | Yes | Yes | +| QWen2-VL | 2B, 7B | MLLM | Yes | Yes | - | - | - | - | Yes | No | +| QWen2.5-VL | 3B - 72B | MLLM | Yes | Yes | - | - | Yes | - | Yes | No | +| QWen2-MoE | A14.57B | LLM | Yes | - | No | No | - | - | Yes | - | +| QWen3 | 0.6B-235B | LLM | Yes | Yes | No | No | Yes | Yes | Yes | Yes | +| DeepSeek-V2 | 16B | LLM | No | Yes | No | No | - | - | - | - | +| InternVL(v1.5) | 2B-26B | MLLM | Yes | - | Yes | Yes | - | - | Yes | - | +| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes | Yes | Yes | - | Yes | Yes | +| InternVL2.5 | 1B-78B | MLLM | Yes | Yes | Yes | Yes | Yes | - | Yes | Yes | +| InternVL3 | 1B-78B | MLLM | Yes | Yes | Yes | Yes | Yes | - | Yes | Yes | +| CogVLM2-chat | 19B | MLLM | Yes | No | - | - | - | - | Yes | - | +| GLM4V | 9B | MLLM | Yes | No | - | - | - | - | - | 
- | diff --git a/docs/zh_cn/get_started/ascend/get_started.md b/docs/zh_cn/get_started/ascend/get_started.md index e076e09fe5..bae1503470 100644 --- a/docs/zh_cn/get_started/ascend/get_started.md +++ b/docs/zh_cn/get_started/ascend/get_started.md @@ -1,69 +1,28 @@ -# 华为昇腾(Atlas 800T A2 & Atlas 300I Duo) +# 华为昇腾 -我们基于 LMDeploy 的 PytorchEngine,增加了华为昇腾设备的支持。所以,在华为昇腾上使用 LDMeploy 的方法与在英伟达 GPU 上使用 PytorchEngine 后端的方法几乎相同。在阅读本教程之前,请先阅读原版的[快速开始](../get_started.md)。 +我们基于 LMDeploy 的 PytorchEngine,增加了华为昇腾设备的支持,目前支持的型号是**Atlas 800T A3,Atlas 800T A2和Atlas 300I Duo**。在华为昇腾上使用 LMDeploy 的方法与在英伟达 GPU 上使用 PytorchEngine 后端的方法几乎相同。在阅读本教程之前,请先阅读原版的[快速开始](../get_started.md)。 -支持的模型列表在[这里](../../supported_models/supported_models.md#PyTorchEngine-华为昇腾平台). +支持的模型列表在[这里](../../supported_models/supported_models.md#PyTorchEngine-其他平台). > \[!IMPORTANT\] -> 我们已经在阿里云上提供了构建完成的鲲鹏CPU版本的镜像(从lmdeploy 0.7.1 + dlinfer 0.1.6开始)。 +> 我们已经在阿里云上提供了构建完成的鲲鹏CPU版本的镜像。 > 请使用下面的命令来拉取镜像: +> +> Atlas 800T A3: +> `docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:a3-latest` +> (Atlas 800T A3目前只支持Qwen系列的算子模式下运行) +> > Atlas 800T A2: -> `docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:910b-latest` +> `docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:a2-latest` +> > Atlas 300I Duo: -> `docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:310p-latest` -> 下述的dockerfile依然是可以执行的,您可以直接拉取镜像,也可以使用dockerfile来自己构建。 - -## 安装 - -我们强烈建议用户构建一个 Docker 镜像以简化环境设置。 - -克隆 lmdeploy 的源代码,Dockerfile 位于 docker 目录中。 - -```shell -git clone https://github.com/InternLM/lmdeploy.git -cd lmdeploy -``` - -### 环境准备 - -Docker 版本应不低于 18.09。并且需按照[官方指南](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/clusterschedulingig/clusterschedulingig/dlug_installation_012.html)安装 Ascend Docker Runtime。 - -> \[!CAUTION\] -> 如果在后续容器内出现`libascend_hal.so: cannot open shared object file`错误,说明Ascend Docker Runtime没有被正确安装。 - -#### Drivers,Firmware 和 CANN - -目标机器需安装华为驱动程序和固件版本至少为 23.0.3,请参考 -[CANN 驱动程序和固件安装](https://www.hiascend.com/document/detail/zh/CANNCommunityEdition/800alpha001/softwareinst/instg/instg_0005.html) -和[下载资源](https://www.hiascend.com/hardware/firmware-drivers/community?product=4&model=26&cann=8.0.RC2.beta1&driver=1.0.25.alpha)。 - -另外,`docker/Dockerfile_aarch64_ascend`没有提供CANN 安装包,用户需要自己从[昇腾资源下载中心](https://www.hiascend.com/developer/download/community/result?module=cann&cann=8.0.RC2.beta1&product=4&model=26)下载CANN(version 8.0.RC2.beta1)软件包。 -并将`Ascend-cann-kernels-910b*.run`,`Ascend-cann-nnal_*.run`和`Ascend-cann-toolkit*.run` 放在 lmdeploy 源码根目录下。 - -#### 构建镜像 - -请在 lmdeploy源 代码根目录下执行以下镜像构建命令,CANN 相关的安装包也放在此目录下。 - -```bash -DOCKER_BUILDKIT=1 docker build -t lmdeploy-aarch64-ascend:latest \ -    -f docker/Dockerfile_aarch64_ascend . -``` - -上述`Dockerfile_aarch64_ascend`适用于鲲鹏CPU. 
如果是Intel CPU的机器,请尝试[这个dockerfile](https://github.com/InternLM/lmdeploy/issues/2745#issuecomment-2473285703) (未经过测试) - -如果以下命令执行没有任何错误,这表明环境设置成功。 - -```bash -docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64-ascend:latest lmdeploy check_env -``` - -关于在昇腾设备上运行`docker run`命令的详情,请参考这篇[文档](https://www.hiascend.com/document/detail/zh/mindx-dl/600/clusterscheduling/dockerruntimeug/dlruntime_ug_013.html)。 +> `docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:300i-duo-latest` +> (Atlas 300I Duo目前只支持非eager模式) +> +> 如果您希望自己构建环境,请参考[这里](../../../../docker)的dockerfile来自己构建。 ## 离线批处理 -> \[!TIP\] -> 图模式已经支持了Atlas 800T A2。用户可以设定`eager_mode=False`来开启图模式,或者设定`eager_mode=True`来关闭图模式。(启动图模式需要事先source `/usr/local/Ascend/nnal/atb/set_env.sh`) - ### LLM 推理 将`device_type="ascend"`加入`PytorchEngineConfig`的参数中。 @@ -71,12 +30,11 @@ docker run -e ASCEND_VISIBLE_DEVICES=0 --rm --name lmdeploy -t lmdeploy-aarch64- ```python from lmdeploy import pipeline from lmdeploy import PytorchEngineConfig -if __name__ == "__main__": -    pipe = pipeline("internlm/internlm2_5-7b-chat", -     backend_config=PytorchEngineConfig(tp=1, device_type="ascend", eager_mode=True)) -    question = ["Shanghai is", "Please introduce China", "How are you?"] -    response = pipe(question) -    print(response) +pipe = pipeline("internlm/internlm2_5-7b-chat", + backend_config=PytorchEngineConfig(tp=1, device_type="ascend")) +question = ["Shanghai is", "Please introduce China", "How are you?"] +response = pipe(question) +print(response) ``` ### VLM 推理 @@ -86,33 +44,28 @@ if __name__ == "__main__": ```python from lmdeploy import pipeline, PytorchEngineConfig from lmdeploy.vl import load_image -if __name__ == "__main__": -    pipe = pipeline('OpenGVLab/InternVL2-2B', - backend_config=PytorchEngineConfig(tp=1, device_type='ascend', eager_mode=True)) -    image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg') -    response = pipe(('describe this image', image)) -    print(response) +pipe = pipeline('OpenGVLab/InternVL2-2B', + backend_config=PytorchEngineConfig(tp=1, device_type='ascend')) +image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg') +response = pipe(('describe this image', image)) +print(response) ``` ## 在线服务 -> \[!TIP\] -> 图模式已经支持Atlas 800T A2。 -> 在线服务时,图模式默认开启,用户可以添加`--eager-mode`来关闭图模式。(启动图模式需要事先source `/usr/local/Ascend/nnal/atb/set_env.sh`) - ### LLM 模型服务 将`--device ascend`加入到服务启动命令中。 ```bash -lmdeploy serve api_server --backend pytorch --device ascend --eager-mode internlm/internlm2_5-7b-chat +lmdeploy serve api_server --backend pytorch --device ascend internlm/internlm2_5-7b-chat ``` 也可以运行以下命令启动容器运行LLM模型服务。 ```bash -docker exec -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:latest \ -    bash -i -c "lmdeploy serve api_server --backend pytorch --device ascend --eager-mode internlm/internlm2_5-7b-chat" +docker run -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:a2-latest \ +    bash -i -c "lmdeploy serve api_server --backend pytorch --device ascend internlm/internlm2_5-7b-chat" ``` ### VLM 模型服务 @@ -120,14 +73,14 @@ docker exec -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyunc 将`--device ascend`加入到服务启动命令中。 ```bash -lmdeploy serve api_server --backend pytorch --device ascend --eager-mode OpenGVLab/InternVL2-2B +lmdeploy serve api_server --backend pytorch 
--device ascend OpenGVLab/InternVL2-2B ``` 也可以运行以下命令启动容器运行VLM模型服务。 ```bash -docker exec -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:latest \ -    bash -i -c "lmdeploy serve api_server --backend pytorch --device ascend --eager-mode OpenGVLab/InternVL2-2B" +docker run -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:a2-latest \ +    bash -i -c "lmdeploy serve api_server --backend pytorch --device ascend OpenGVLab/InternVL2-2B" ``` ## 使用命令行与LLM模型对话 @@ -135,14 +88,14 @@ docker exec -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyunc 将`--device ascend`加入到服务启动命令中。 ```bash -lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device ascend --eager-mode +lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device ascend ``` 也可以运行以下命令使启动容器后开启lmdeploy聊天 ```bash -docker exec -it crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:latest \ -    bash -i -c "lmdeploy chat --backend pytorch --device ascend --eager-mode internlm/internlm2_5-7b-chat" +docker run -it crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/ascend:a2-latest \ +    bash -i -c "lmdeploy chat --backend pytorch --device ascend internlm/internlm2_5-7b-chat" ``` ## 量化 diff --git a/docs/zh_cn/get_started/camb/get_started.md b/docs/zh_cn/get_started/camb/get_started.md new file mode 100644 index 0000000000..4f3043ccce --- /dev/null +++ b/docs/zh_cn/get_started/camb/get_started.md @@ -0,0 +1,96 @@ +# 寒武纪云端加速卡 + +我们基于 LMDeploy 的 PytorchEngine,增加了寒武纪云端加速卡设备的支持。所以,在寒武纪云端加速卡上使用 LMDeploy 的方法与在英伟达 GPU 上使用 PytorchEngine 后端的方法几乎相同。在阅读本教程之前,请先阅读原版的[快速开始](../get_started.md)。 + +支持的模型列表在[这里](../../supported_models/supported_models.md#PyTorchEngine-其他平台). 
+ +> \[!IMPORTANT\] +> 我们已经在阿里云上提供了构建完成的寒武纪云端加速卡镜像。 +> 请使用下面的命令来拉取镜像: +> +> `docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/camb:latest` + +> \[!IMPORTANT\] +> 目前寒武纪加速卡上启动多卡推理需要手动启动ray。下面是一个2卡的例子: +> +> ```shell +> export MLU_VISIBLE_DEVICES=0,1 +> ray start --head --resources='{"MLU": 2}' +> ``` + +## 离线批处理 + +### LLM 推理 + +将`device_type="camb"`加入`PytorchEngineConfig`的参数中。 + +```python +from lmdeploy import pipeline +from lmdeploy import PytorchEngineConfig +pipe = pipeline("internlm/internlm2_5-7b-chat", + backend_config=PytorchEngineConfig(tp=1, device_type="camb")) +question = ["Shanghai is", "Please introduce China", "How are you?"] +response = pipe(question) +print(response) +``` + +### VLM 推理 + +将`device_type="camb"`加入`PytorchEngineConfig`的参数中。 + +```python +from lmdeploy import pipeline, PytorchEngineConfig +from lmdeploy.vl import load_image +pipe = pipeline('OpenGVLab/InternVL2-2B', + backend_config=PytorchEngineConfig(tp=1, device_type='camb')) +image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg') +response = pipe(('describe this image', image)) +print(response) +``` + +## 在线服务 + +### LLM 模型服务 + +将`--device camb`加入到服务启动命令中。 + +```bash +lmdeploy serve api_server --backend pytorch --device camb internlm/internlm2_5-7b-chat +``` + +也可以运行以下命令启动容器运行LLM模型服务。 + +```bash +docker run -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/camb:latest \ +    bash -i -c "lmdeploy serve api_server --backend pytorch --device camb internlm/internlm2_5-7b-chat" +``` + +### VLM 模型服务 + +将`--device camb`加入到服务启动命令中。 + +```bash +lmdeploy serve api_server --backend pytorch --device camb OpenGVLab/InternVL2-2B +``` + +也可以运行以下命令启动容器运行VLM模型服务。 + +```bash +docker run -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/camb:latest \ +    bash -i -c "lmdeploy serve api_server --backend pytorch --device camb OpenGVLab/InternVL2-2B" +``` + +## 使用命令行与LLM模型对话 + +将`--device camb`加入到服务启动命令中。 + +```bash +lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device camb +``` + +也可以运行以下命令使启动容器后开启lmdeploy聊天 + +```bash +docker run -it crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/camb:latest \ +    bash -i -c "lmdeploy chat --backend pytorch --device camb internlm/internlm2_5-7b-chat" +``` diff --git a/docs/zh_cn/get_started/index.rst b/docs/zh_cn/get_started/index.rst index 35affc13ce..2a84b00684 100644 --- a/docs/zh_cn/get_started/index.rst +++ b/docs/zh_cn/get_started/index.rst @@ -3,6 +3,8 @@ .. toctree:: :maxdepth: 1 - :caption: NPU(Huawei) + :caption: OtherPF ascend/get_started.md + maca/get_started.md + camb/get_started.md diff --git a/docs/zh_cn/get_started/maca/get_started.md b/docs/zh_cn/get_started/maca/get_started.md new file mode 100644 index 0000000000..bbe57caf7f --- /dev/null +++ b/docs/zh_cn/get_started/maca/get_started.md @@ -0,0 +1,87 @@ +# 沐曦C500 + +我们基于 LMDeploy 的 PytorchEngine,增加了沐曦C500设备的支持。所以,在沐曦上使用 LMDeploy 的方法与在英伟达 GPU 上使用 PytorchEngine 后端的方法几乎相同。在阅读本教程之前,请先阅读原版的[快速开始](../get_started.md)。 + +支持的模型列表在[这里](../../supported_models/supported_models.md#PyTorchEngine-其他平台). 
+ +> \[!IMPORTANT\] +> 我们已经在阿里云上提供了构建完成的沐曦的镜像。 +> 请使用下面的命令来拉取镜像: +> `docker pull crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest` + +## 离线批处理 + +### LLM 推理 + +将`device_type="maca"`加入`PytorchEngineConfig`的参数中。 + +```python +from lmdeploy import pipeline +from lmdeploy import PytorchEngineConfig +pipe = pipeline("internlm/internlm2_5-7b-chat", + backend_config=PytorchEngineConfig(tp=1, device_type="maca")) +question = ["Shanghai is", "Please introduce China", "How are you?"] +response = pipe(question) +print(response) +``` + +### VLM 推理 + +将`device_type="maca"`加入`PytorchEngineConfig`的参数中。 + +```python +from lmdeploy import pipeline, PytorchEngineConfig +from lmdeploy.vl import load_image +pipe = pipeline('OpenGVLab/InternVL2-2B', + backend_config=PytorchEngineConfig(tp=1, device_type='maca')) +image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg') +response = pipe(('describe this image', image)) +print(response) +``` + +## 在线服务 + +### LLM 模型服务 + +将`--device maca`加入到服务启动命令中。 + +```bash +lmdeploy serve api_server --backend pytorch --device maca internlm/internlm2_5-7b-chat +``` + +也可以运行以下命令启动容器运行LLM模型服务。 + +```bash +docker run -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest \ +    bash -i -c "lmdeploy serve api_server --backend pytorch --device maca internlm/internlm2_5-7b-chat" +``` + +### VLM 模型服务 + +将`--device maca`加入到服务启动命令中。 + +```bash +lmdeploy serve api_server --backend pytorch --device maca OpenGVLab/InternVL2-2B +``` + +也可以运行以下命令启动容器运行VLM模型服务。 + +```bash +docker run -it --net=host crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest \ +    bash -i -c "lmdeploy serve api_server --backend pytorch --device maca OpenGVLab/InternVL2-2B" +``` + +## 使用命令行与LLM模型对话 + +将`--device maca`加入到服务启动命令中。 + +```bash +lmdeploy chat internlm/internlm2_5-7b-chat --backend pytorch --device maca +``` + +也可以运行以下命令使启动容器后开启lmdeploy聊天 + +```bash +docker run -it crpi-4crprmm5baj1v8iv.cn-hangzhou.personal.cr.aliyuncs.com/lmdeploy_dlinfer/maca:latest \ +    bash -i -c "lmdeploy chat --backend pytorch --device maca internlm/internlm2_5-7b-chat" +``` diff --git a/docs/zh_cn/supported_models/supported_models.md b/docs/zh_cn/supported_models/supported_models.md index 4d9998f2df..aaebb1df1a 100644 --- a/docs/zh_cn/supported_models/supported_models.md +++ b/docs/zh_cn/supported_models/supported_models.md @@ -123,28 +123,28 @@ * [2] 自 0.6.4 之后,PyTorch 引擎移除了对 llava 模型原始格式的支持。我们建议使用它们对应的 transformers 格式的模型。这些模型可以在 https://huggingface.co/llava-hf 中找到 ``` -## PyTorchEngine 华为昇腾平台 +## PyTorchEngine 其他平台 -| | | | Atlas 800T A2 | Atlas 800T A2 | Atlas 800T A2 | Atlas 800T A2 | Atlas 300I Duo | -| :------------: | :-------: | :--: | :--------------: | :--------------: | :-----------: | :-----------: | :------------: | -| Model | Size | Type | FP16/BF16(eager) | FP16/BF16(graph) | W8A8(graph) | W4A16(eager) | FP16(graph) | -| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | - | -| Llama3 | 8B | LLM | Yes | Yes | Yes | Yes | Yes | -| Llama3.1 | 8B | LLM | Yes | Yes | Yes | Yes | Yes | -| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes | -| InternLM2.5 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes | -| InternLM3 | 8B | LLM | Yes | Yes | Yes | Yes | Yes | -| Mixtral | 8x7B | LLM | Yes | Yes | No | No | Yes | -| QWen1.5-MoE | A2.7B | LLM | Yes | - | No | No | - | -| QWen2(.5) | 7B | LLM | Yes | Yes | Yes | Yes | Yes | -| QWen2-VL | 2B, 7B | MLLM | 
Yes | Yes | - | - | - | -| QWen2.5-VL | 3B - 72B | MLLM | Yes | Yes | - | - | Yes | -| QWen2-MoE | A14.57B | LLM | Yes | - | No | No | - | -| QWen3 | 0.6B-235B | LLM | Yes | Yes | No | No | Yes | -| DeepSeek-V2 | 16B | LLM | No | Yes | No | No | - | -| InternVL(v1.5) | 2B-26B | MLLM | Yes | - | Yes | Yes | - | -| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes | Yes | Yes | -| InternVL2.5 | 1B-78B | MLLM | Yes | Yes | Yes | Yes | Yes | -| InternVL3 | 1B-78B | MLLM | Yes | Yes | Yes | Yes | Yes | -| CogVLM2-chat | 19B | MLLM | Yes | No | - | - | - | -| GLM4V | 9B | MLLM | Yes | No | - | - | - | +| | | | Atlas 800T A2 | Atlas 800T A2 | Atlas 800T A2 | Atlas 800T A2 | Atlas 300I Duo | Atlas 800T A3 | Maca C500 | Cambricon | +| :------------: | :-------: | :--: | :--------------: | :--------------: | :-----------: | :-----------: | :------------: | :--------------: | :-------: | :-------: | +| Model | Size | Type | FP16/BF16(eager) | FP16/BF16(graph) | W8A8(graph) | W4A16(eager) | FP16(graph) | FP16/BF16(eager) | BF/FP16 | BF/FP16 | +| Llama2 | 7B - 70B | LLM | Yes | Yes | Yes | Yes | - | Yes | Yes | Yes | +| Llama3 | 8B | LLM | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | +| Llama3.1 | 8B | LLM | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | +| InternLM2 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | +| InternLM2.5 | 7B - 20B | LLM | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | +| InternLM3 | 8B | LLM | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | +| Mixtral | 8x7B | LLM | Yes | Yes | No | No | Yes | - | Yes | Yes | +| QWen1.5-MoE | A2.7B | LLM | Yes | - | No | No | - | - | Yes | - | +| QWen2(.5) | 7B | LLM | Yes | Yes | Yes | Yes | Yes | - | Yes | Yes | +| QWen2-VL | 2B, 7B | MLLM | Yes | Yes | - | - | - | - | Yes | No | +| QWen2.5-VL | 3B - 72B | MLLM | Yes | Yes | - | - | Yes | - | Yes | No | +| QWen2-MoE | A14.57B | LLM | Yes | - | No | No | - | - | Yes | - | +| QWen3 | 0.6B-235B | LLM | Yes | Yes | No | No | Yes | Yes | Yes | Yes | +| DeepSeek-V2 | 16B | LLM | No | Yes | No | No | - | - | - | - | +| InternVL(v1.5) | 2B-26B | MLLM | Yes | - | Yes | Yes | - | - | Yes | - | +| InternVL2 | 1B-40B | MLLM | Yes | Yes | Yes | Yes | Yes | - | Yes | Yes | +| InternVL2.5 | 1B-78B | MLLM | Yes | Yes | Yes | Yes | Yes | - | Yes | Yes | +| InternVL3 | 1B-78B | MLLM | Yes | Yes | Yes | Yes | Yes | - | Yes | Yes | +| CogVLM2-chat | 19B | MLLM | Yes | No | - | - | - | - | Yes | - | +| GLM4V | 9B | MLLM | Yes | No | - | - | - | - | - | - |