[TransferEngine]feat: add tensor transfer Read/Write API for transfer-engine #703
base: main
Conversation
Hi @Risc-lt
@staryxchen Thanks for the response. We are currently working on experiments with an object-level abstraction for subtle migration. This is a draft to be improved.
Hi, if you're interested, feel free to join us. @staryxchen
Okay. I will keep a close eye on this PR.
Pull Request Overview
This PR adds tensor-specific transfer functionality to the transfer engine, providing high-level APIs for PyTorch tensor serialization and deserialization over the network. The implementation wraps lower-level transfer operations with automatic metadata handling and tensor reconstruction.
Key changes include:
- New tensor transfer APIs (`transfer_tensor_sync_write` and `transfer_tensor_sync_read`) for PyTorch tensors; see the usage sketch below
- Automatic tensor metadata serialization, including dtype, dimension count, and shape information
- Support for multiple tensor data types (float32, int32, bool, etc.) with up to 4 dimensions
- Comprehensive test suite covering various tensor types and scenarios
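For reviewers, a minimal usage sketch of how the new API is intended to be called from Python. The import path, engine setup, and argument details below are assumptions for illustration; only the method names `transfer_tensor_sync_write` and `transfer_tensor_sync_read` come from this PR.

```python
import torch
from mooncake.engine import TransferEngine  # hypothetical import path

engine = TransferEngine()  # session/target setup arguments omitted (assumed)

src = torch.ones((2, 3), dtype=torch.float32)

# Writer side: serialize metadata (dtype, ndim, shape) plus the data buffer
# and push it to the peer's registered memory.
engine.transfer_tensor_sync_write("target_hostname:port", src)

# Reader side: fetch the bytes and rebuild the tensor from the metadata.
dst = engine.transfer_tensor_sync_read("target_hostname:port")
assert dst.dtype == src.dtype and dst.shape == src.shape
```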
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
File | Description
---|---
test_transfer_tensor.py | Comprehensive test suite for tensor transfer functionality
transfer_engine_py.h | Header definitions for tensor transfer APIs and metadata structures
transfer_engine_py.cpp | Implementation of tensor serialization/deserialization logic
#include <pybind11/stl.h>
...
auto torch = py::module_::import("torch");
The global torch module import at file scope could cause issues if torch is not available when the module is loaded. Consider importing torch lazily within functions that need it, with proper error handling for cases where torch is not installed.
Suggested change:
- auto torch = py::module_::import("torch");
+ // Lazy import torch within functions that need it, with error handling.
+ static py::object import_torch() {
+     try {
+         return py::module_::import("torch");
+     } catch (const py::error_already_set &e) {
+         throw std::runtime_error("Failed to import torch Python module. Is torch installed? Error: " + std::string(e.what()));
+     }
+ }
        return TransferEnginePy{}.create_typed_array<uint64_t>(data, offset, total_length);
    }, // UINT64 = 9
    [](char* data, size_t offset, size_t total_length) {
        return TransferEnginePy{}.create_typed_array<bool>(data, offset, total_length);
Each lambda creates a temporary TransferEnginePy object to call create_typed_array. This is inefficient and unnecessary since create_typed_array could be made static or the lambdas could directly implement the array creation logic.
Suggested change:
- return TransferEnginePy{}.create_typed_array<bool>(data, offset, total_length);
+ return TransferEnginePy::create_typed_array<float>(data, offset, total_length);
+ }, // FLOAT32 = 0
+ [](char* data, size_t offset, size_t total_length) {
+     return TransferEnginePy::create_typed_array<double>(data, offset, total_length);
+ }, // FLOAT64 = 1
+ [](char* data, size_t offset, size_t total_length) {
+     return TransferEnginePy::create_typed_array<int8_t>(data, offset, total_length);
+ }, // INT8 = 2
+ [](char* data, size_t offset, size_t total_length) {
+     return TransferEnginePy::create_typed_array<uint8_t>(data, offset, total_length);
+ }, // UINT8 = 3
+ [](char* data, size_t offset, size_t total_length) {
+     return TransferEnginePy::create_typed_array<int16_t>(data, offset, total_length);
+ }, // INT16 = 4
+ [](char* data, size_t offset, size_t total_length) {
+     return TransferEnginePy::create_typed_array<uint16_t>(data, offset, total_length);
+ }, // UINT16 = 5
+ [](char* data, size_t offset, size_t total_length) {
+     return TransferEnginePy::create_typed_array<int32_t>(data, offset, total_length);
+ }, // INT32 = 6
+ [](char* data, size_t offset, size_t total_length) {
+     return TransferEnginePy::create_typed_array<uint32_t>(data, offset, total_length);
+ }, // UINT32 = 7
+ [](char* data, size_t offset, size_t total_length) {
+     return TransferEnginePy::create_typed_array<int64_t>(data, offset, total_length);
+ }, // INT64 = 8
+ [](char* data, size_t offset, size_t total_length) {
+     return TransferEnginePy::create_typed_array<uint64_t>(data, offset, total_length);
+ }, // UINT64 = 9
+ [](char* data, size_t offset, size_t total_length) {
+     return TransferEnginePy::create_typed_array<bool>(data, offset, total_length);
    int32_t dtype;
    int32_t ndim;
    int32_t shape[4];
} __attribute__((packed));
The fixed-size shape array limits tensors to 4 dimensions. Consider using a more flexible approach or document this limitation clearly, as PyTorch tensors can have more than 4 dimensions in practice.
Suggested change:
- } __attribute__((packed));
+     std::vector<int32_t> shape;
+ };
tensor = torch.tensor([1.0, 2.0, 3.0, 4.0], dtype=torch.float32)
...
# Calculate total size (metadata + tensor data)
metadata_size = 24  # sizeof(TensorMetadata) = 6 * 4 bytes
The hardcoded metadata size of 24 bytes is fragile and could break if the TensorMetadata structure changes. Consider calculating this dynamically or defining it as a constant that can be imported from the C++ side.
Suggested change:
- metadata_size = 24  # sizeof(TensorMetadata) = 6 * 4 bytes
+ metadata_size = ctypes.sizeof(TensorMetadata)  # Dynamically calculated
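For the suggested `ctypes.sizeof` call to work, the test would also need a Python-side mirror of the C++ struct. A minimal sketch, assuming the packed layout shown in this diff (`dtype`, `ndim`, `shape[4]`, all `int32_t`):

```python
import ctypes

class TensorMetadata(ctypes.Structure):
    _pack_ = 1  # mirrors __attribute__((packed))
    _fields_ = [
        ("dtype", ctypes.c_int32),
        ("ndim", ctypes.c_int32),
        ("shape", ctypes.c_int32 * 4),
    ]

metadata_size = ctypes.sizeof(TensorMetadata)  # 24 with the current layout
```

Alternatively, exporting the size as a module constant from the pybind11 side (e.g. `m.attr("TENSOR_METADATA_SIZE") = sizeof(TensorMetadata);`) would avoid keeping two struct definitions in sync.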
if (i < ndim) {
    metadata.shape[i] = shape_tuple[i].cast<int32_t>();
} else {
    metadata.shape[i] = -1;
[nitpick] Using -1 as a sentinel value for unused shape dimensions is unclear. Consider using 0 or defining a named constant to make the intent more explicit.
Suggested change:
-     metadata.shape[i] = -1;
+     metadata.shape[i] = UNUSED_DIMENSION;
for (int i = 0; i < metadata.ndim; i++) {
    if (metadata.shape[i] > 0) {  // Only add valid dimensions
        shape_vec.push_back(metadata.shape[i]);
    }
The condition `metadata.shape[i] > 0` excludes dimensions with size 0, but zero-sized dimensions are valid in PyTorch tensors. This could cause incorrect tensor reconstruction for tensors with empty dimensions.
Suggested change:
-     }
+     shape_vec.push_back(metadata.shape[i]);
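If the condition is relaxed as suggested, a regression test for zero-sized dimensions would be worth adding to test_transfer_tensor.py. A rough sketch; the `engine` fixture and the exact signatures of the transfer methods are assumptions, not taken from this diff:

```python
import torch

def test_zero_sized_dimension_roundtrip(engine, target):
    # (0, 3) is a valid PyTorch shape with an empty payload; the metadata
    # path must keep the 0 instead of dropping the dimension.
    src = torch.empty((0, 3), dtype=torch.float32)
    engine.transfer_tensor_sync_write(target, src)
    dst = engine.transfer_tensor_sync_read(target)
    assert dst.shape == src.shape
    assert dst.dtype == src.dtype
```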
Add a tensor-specific transfer handler that wraps the lower-level operations of serialization and deserialization for PyTorch tensors, memory registration, and subtle migration.
Please review. Thanks. @stmatengss