
Conversation

yao-fengchen (Collaborator)

No description provided.

@windreamer (Collaborator)

Maybe we can use a multi-stage build with docker buildx to simplify this Dockerfile. Here is an example to illustrate:

ARG ASCEND_DEVICE=a3
ARG ASCEND_HUB=swr.cn-south-1.myhuaweicloud.com/ascendhub

# Device-specific CANN base stages (stage names must not start with a digit)
FROM ${ASCEND_HUB}/cann:8.1.rc1-a3-openeuler22.03-py3.10 AS ascend_a3_base

FROM ${ASCEND_HUB}/cann:8.1.rc1-910b-ubuntu22.04-py3.10 AS ascend_a2_base

FROM ${ASCEND_HUB}/cann:8.1.rc1-310p-ubuntu22.04-py3.10 AS ascend_300i_base

# Build the lmdeploy wheel on the selected base
FROM ascend_${ASCEND_DEVICE}_base AS builder
ENV LMDEPLOY_TARGET_DEVICE=ascend
WORKDIR /opt/lmdeploy
COPY . .
RUN --mount=type=cache,target=/root/.cache \
    pip config set global.index-url https://mirrors.aliyun.com/pypi/simple && \
    pip config set global.trusted-host mirrors.aliyun.com && \
    pip install --no-cache-dir -U pip build && \
    python -m build -w -o /wheels -v .

# Final runtime image on the same selected base
FROM ascend_${ASCEND_DEVICE}_base AS final
RUN --mount=type=cache,target=/root/.cache \
    pip config set global.index-url https://mirrors.aliyun.com/pypi/simple && \
    pip config set global.trusted-host mirrors.aliyun.com && \
    pip install --no-cache-dir torch==2.3.1 torch-npu==2.3.1 torchvision==0.18.1

RUN --mount=type=bind,from=builder,source=/wheels,target=/wheels \
    pip install --no-cache-dir /wheels/*.whl
ENTRYPOINT []
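
For reference, the target device could then be selected at build time with a single build argument. A minimal command sketch, assuming the example above is saved as docker/Dockerfile_ascend (the file path and image tag are illustrative):

# build an A2 image by overriding the default ASCEND_DEVICE (sketch; path and tag are assumptions)
docker buildx build \
    --build-arg ASCEND_DEVICE=a2 \
    -f docker/Dockerfile_ascend \
    -t lmdeploy-ascend:a2 \
    --load .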

@windreamer (Collaborator)

Also, we could consider using https://hub.docker.com/r/ascendai/pytorch as the base image to minimize our effort?
I am not quite sure whether the Ascend device type matters.

@yao-fengchen (Collaborator, Author)

Maybe we can use a multi-stage build with docker buildx to simplify this Dockerfile. Here is an example to illustrate:

ARG ASCEND_DEVICE=a3
ARG ASCEND_HUB=swr.cn-south-1.myhuaweicloud.com/ascendhub

# Device-specific CANN base stages (stage names must not start with a digit)
FROM ${ASCEND_HUB}/cann:8.1.rc1-a3-openeuler22.03-py3.10 AS ascend_a3_base

FROM ${ASCEND_HUB}/cann:8.1.rc1-910b-ubuntu22.04-py3.10 AS ascend_a2_base

FROM ${ASCEND_HUB}/cann:8.1.rc1-310p-ubuntu22.04-py3.10 AS ascend_300i_base

# Build the lmdeploy wheel on the selected base
FROM ascend_${ASCEND_DEVICE}_base AS builder
ENV LMDEPLOY_TARGET_DEVICE=ascend
WORKDIR /opt/lmdeploy
COPY . .
RUN --mount=type=cache,target=/root/.cache \
    pip config set global.index-url https://mirrors.aliyun.com/pypi/simple && \
    pip config set global.trusted-host mirrors.aliyun.com && \
    pip install --no-cache-dir -U pip build && \
    python -m build -w -o /wheels -v .

# Final runtime image on the same selected base
FROM ascend_${ASCEND_DEVICE}_base AS final
RUN --mount=type=cache,target=/root/.cache \
    pip config set global.index-url https://mirrors.aliyun.com/pypi/simple && \
    pip config set global.trusted-host mirrors.aliyun.com && \
    pip install --no-cache-dir torch==2.3.1 torch-npu==2.3.1 torchvision==0.18.1

RUN --mount=type=bind,from=builder,source=/wheels,target=/wheels \
    pip install --no-cache-dir /wheels/*.whl
ENTRYPOINT []

OK, I will simplify this Dockerfile in the last commit.

@jinminxi104 (Collaborator)

jinminxi104 commented Sep 4, 2025

Also, we could consider using https://hub.docker.com/r/ascendai/pytorch as the base image to minimize our effort? I am not quite sure whether the Ascend device type matters.

I think Huawei's official Ascend Hub offers a more network-friendly experience for all users.

@jinminxi104 jinminxi104 marked this pull request as ready for review September 4, 2025 15:37
Review comment on these Dockerfile lines:

pip install /wheels/*.whl
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple && \
pip config set global.trusted-host mirrors.aliyun.com && \
pip install --no-cache-dir torch==2.3.1 torch-npu==2.3.1 torchvision==0.18.1
Collaborator

Can we install requirements_ascend.txt instead?
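
(For reference, a minimal sketch of what that suggestion could look like in the final stage; the destination path is an assumption:)

COPY requirements_ascend.txt /opt/lmdeploy/requirements_ascend.txt
RUN pip install --no-cache-dir -r /opt/lmdeploy/requirements_ascend.txt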

Collaborator Author

Our current stable torch-npu versions on A2 and A3 are different: A2 uses torch-npu==2.3.1, while A3 uses torch-npu==2.7.1rc1. This Dockerfile is for A2 and 300i, so we temporarily pin torch-npu==2.3.1 here. We will ensure that A2 runs stably on newer torch-npu versions in the future.

Collaborator

You can define a different environment variable in each stage accordingly, for example:

FROM ${ASCEND_HUB}/cann:8.1.rc1-910b-ubuntu22.04-py3.10 AS ascend_a2_base
ENV TORCH_VERSION=2.3.1

FROM ${ASCEND_HUB}/cann:8.1.rc1-XXX-ubuntu22.04-py3.10 AS ascend_XXX_base
ENV TORCH_VERSION=SOME_VERSION_ELSE

......

RUN pip install --no-cache-dir torch==${TORCH_VERSION} torch-npu==${TORCH_VERSION}
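
To tie this to the earlier multi-stage sketch: an ENV set in the selected base stage is inherited by any stage built FROM it, so the final stage can reuse it without repeating the version. A minimal sketch, carrying over the ASCEND_DEVICE argument and stage naming assumed in the example above:

# ASCEND_DEVICE and the ascend_*_base stage names are from the earlier sketch
FROM ascend_${ASCEND_DEVICE}_base AS final
# TORCH_VERSION is inherited from the ENV declared in the selected base stage
RUN pip install --no-cache-dir torch==${TORCH_VERSION} torch-npu==${TORCH_VERSION}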

Collaborator Author

You can define a different environment variable in each stage accordingly, for example:

FROM ${ASCEND_HUB}/cann:8.1.rc1-910b-ubuntu22.04-py3.10 AS ascend_a2_base
ENV TORCH_VERSION=2.3.1

FROM ${ASCEND_HUB}/cann:8.1.rc1-XXX-ubuntu22.04-py3.10 AS ascend_XXX_base
ENV TORCH_VERSION=SOME_VERSION_ELSE

......

RUN pip install --no-cache-dir torch==${TORCH_VERSION} torch-npu==${TORCH_VERSION}

This Dockerfile is for A2 and 300i; A2 and 300i are the same except for the base image. Due to some internal problems, we only provide downloadable images for A3 for the time being.
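
Under that scheme, a 300i image could be produced from the same Dockerfile by overriding the device argument (an illustrative command; the file path and tag are assumptions, as in the earlier sketch):

docker buildx build --build-arg ASCEND_DEVICE=300i -f docker/Dockerfile_ascend -t lmdeploy-ascend:300i --load .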

@lvhan028 lvhan028 requested review from windreamer and removed request for grimoire September 5, 2025 09:46
@lvhan028 lvhan028 merged commit c62a442 into InternLM:main Sep 8, 2025
19 of 20 checks passed
littlegy pushed a commit to littlegy/lmdeploy that referenced this pull request Sep 11, 2025
* refactor ascend Dockerfile

* simplify Dockerfile

* update code

* update code

* update code

* update code