
Conversation

@ascend-direct-dev
Contributor

Description

Type of Change

  • Types
    • Bug fix
    • New feature
      • Transfer Engine
      • Mooncake Store
      • Mooncake EP
      • Integration
      • P2P Store
      • Python Wheel
    • Breaking change
    • CI/CD
    • Documentation update
    • Other

How Has This Been Tested?

Checklist

  • I have performed a self-review of my own code.
  • I have updated the documentation.
  • I have added tests to prove my changes are effective.

@ascend-direct-dev marked this pull request as draft on November 25, 2025, 09:16
@gemini-code-assist
Contributor

Summary of Changes

Hello @ascend-direct-dev, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds a new example demonstrating the capabilities of "mooncake.engine.TransferEngine" for distributed memory management on Ascend NPUs. It includes a Python script ("te.py") that sets up a two-process distributed environment, registers NPU memory, and performs a synchronous batch transfer of tensor data. A companion shell script ("run.sh") is also added to streamline environment configuration and execution of this example.

Highlights

  • New TransferEngine Example: Introduces "te.py", a new Python script demonstrating the "mooncake.engine.TransferEngine" for peer-to-peer memory transfer on Ascend NPUs.
  • NPU Environment Setup: Adds "run.sh" to configure necessary Ascend NPU environment variables and execute the "te.py" example script.
  • Distributed Memory Operations: The example showcases memory registration, P2P handshake, and synchronous batch transfer of tensor data between two processes using "torch_npu" (see the sketch after this list).
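
To help reviewers picture how these pieces fit together, here is a minimal initiator-side sketch. It is not the contents of "te.py": the method names (initialize, register_memory, transfer_sync_write) are taken from the upstream Mooncake Python binding documentation and may differ from what this PR actually uses, and the protocol string, addresses, and peer buffer address are placeholders.

    # Sketch only: method names and argument values are assumptions, not code from te.py.
    import torch
    import torch_npu  # noqa: F401  (registers the "npu" device with PyTorch)
    from mooncake.engine import TransferEngine

    LOCAL_SERVER = "127.0.0.1:12346"   # this process
    PEER_SEGMENT = "127.0.0.1:15454"   # segment id printed by the peer process

    engine = TransferEngine()
    # "P2PHANDSHAKE" matches the perf-tool commands later in this thread;
    # the protocol and device-name strings are placeholders.
    engine.initialize(LOCAL_SERVER, "P2PHANDSHAKE", "ascend", "")

    # Allocate a tensor on the NPU and register its memory with the engine.
    tensor = torch.ones(1024, dtype=torch.float32, device="npu:0")
    addr = tensor.data_ptr()
    size = tensor.numel() * tensor.element_size()
    engine.register_memory(addr, size)

    # Synchronously write the local buffer into the peer's registered buffer.
    # The peer buffer address would be exchanged out of band (e.g. via torch.distributed).
    peer_addr = 0  # placeholder for the address obtained from the peer
    engine.transfer_sync_write(PEER_SEGMENT, addr, peer_addr, size)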


@gemini-code-assist (bot) left a comment


Code Review

This pull request adds an example shell script and a Python script to demonstrate the usage of the transfer engine. The changes are a good starting point, but there are several areas for improvement in terms of robustness, efficiency, and adherence to best practices.

In run.sh, I've pointed out a dangerous use of a hardcoded absolute path and suggested adding standard shell script boilerplate for safety.

In te.py, I've identified several issues, including redundant code, magic numbers, an inefficient busy-wait loop, and the lack of standard Python practices such as argument validation and a main guard. My comments include suggestions to make the script more robust, efficient, and maintainable.
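
The structural points above (main guard, argument validation, bounded waiting instead of busy-waiting) are generic Python patterns. Below is a short illustrative sketch, not taken from te.py; the wait_for_port helper and the --role/--peer-port arguments are hypothetical names.

    import argparse
    import socket
    import time


    def wait_for_port(host: str, port: int, timeout_s: float = 30.0) -> None:
        # Poll with a deadline and a short sleep instead of spinning in a busy loop.
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            try:
                with socket.create_connection((host, port), timeout=1.0):
                    return
            except OSError:
                time.sleep(0.5)
        raise TimeoutError(f"{host}:{port} not reachable within {timeout_s}s")


    def parse_args() -> argparse.Namespace:
        parser = argparse.ArgumentParser(description="two-process transfer example (sketch)")
        parser.add_argument("--role", choices=["initiator", "target"], required=True)
        parser.add_argument("--peer-host", default="127.0.0.1")
        parser.add_argument("--peer-port", type=int, default=12345)
        return parser.parse_args()


    def main() -> None:
        args = parse_args()
        if args.role == "initiator":
            wait_for_port(args.peer_host, args.peer_port)
        # ... engine setup and the actual transfer would go here ...


    if __name__ == "__main__":  # main guard: importing the module has no side effects
        main()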

Inline comment from @gemini-code-assist on te.py (severity: medium), diff context:

    break
    print(f"remote port:{remote_port}")

    import torch.distributed as dist

Imports should be at the top of the file, as per the PEP 8 style guide. import torch.distributed as dist (and import time on line 48, which is also a duplicate) should be moved to the top with the other imports. This improves code readability and organization.
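
The import fix is mechanical. Assuming the scattered imports mentioned above, a consolidated header for te.py would look roughly like this (the exact module list is illustrative):

    # Top of te.py: a single import block, no duplicates (PEP 8).
    import time

    import torch
    import torch.distributed as dist
    import torch_npu  # noqa: F401
    from mooncake.engine import TransferEngine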

@ascend-direct-dev
Contributor Author

Start the target process first:

./mooncake-transfer-engine/example/transfer_engine_ascend_direct_perf --metadata_server=P2PHANDSHAKE --local_server_name=127.0.0.1:12345 --operation=write --device_logicid=0 --batch_size=200 --block_iteration=6 --mode=target

After the first process starts, the terminal prints something like "listening on 127.0.0.1:15454". Mooncake picks the port automatically, so 15454 will differ between runs; use the port printed by the first process as the second process's segment_id.

Then start the initiator:

./mooncake-transfer-engine/example/transfer_engine_ascend_direct_perf --metadata_server=P2PHANDSHAKE --local_server_name=127.0.0.1:12346 --operation=write --device_logicid=1 --batch_size=200 --block_iteration=6 --mode=initiator --segment_id=127.0.0.1:15454
