@Alchuang22-dev
Contributor

Description

This PR refactors the AINode client infrastructure to support direct communication between DataNode and AINode, removing the dependency on ConfigNode for AI-related operations such as model loading and inference. It should be reviewed by @CRZbulabula.

Contents

AINodeClient

  • Added a new executeRemoteCallWithRetry() method for automatic retry and reconnection on Thrift transport failures, following the same design pattern as ConfigNodeClient.

  • Updated the loadModel(TLoadModelReq req) API to use this retry wrapper for improved resilience.

  • Simplified connection lifecycle management (init(), close()) to ensure stable client reuse via AINodeClientManager.
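The retry-and-reconnect idea described above can be sketched roughly as follows. This is not the code from this PR: `RetrySketch`, `RemoteCall`, `Reconnector`, and `MAX_RETRY` are hypothetical names, and `java.io.IOException` stands in for a Thrift transport failure so that the example compiles without any IoTDB or Thrift dependency.

```java
import java.io.IOException;

/** Illustrative sketch only; every name here is hypothetical, not the IoTDB API. */
class RetrySketch {

  /** A remote call that may fail with a transport-level error. */
  interface RemoteCall<T> {
    T call() throws IOException;
  }

  /** Stand-in for the reconnect logic a real client would own. */
  interface Reconnector {
    void reconnect() throws IOException;
  }

  /** Assumed retry budget; the real value is not stated in the PR description. */
  static final int MAX_RETRY = 3;

  /**
   * Runs the call, and on a transport failure reconnects and tries again,
   * up to MAX_RETRY attempts in total. The last failure is kept as the cause.
   */
  static <T> T executeWithRetry(RemoteCall<T> call, Reconnector reconnector)
      throws IOException {
    IOException last = null;
    for (int attempt = 1; attempt <= MAX_RETRY; attempt++) {
      try {
        return call.call();
      } catch (IOException e) {
        last = e;
        if (attempt < MAX_RETRY) {
          try {
            reconnector.reconnect(); // re-open the connection before the next attempt
          } catch (IOException reconnectFailure) {
            last = reconnectFailure; // remember it and let the next attempt decide
          }
        }
      }
    }
    throw new IOException("remote call failed after " + MAX_RETRY + " attempts", last);
  }
}
```

Under this sketch, an API such as loadModel would pass its raw RPC as the `RemoteCall` lambda, so the retry policy lives in one place instead of being repeated in each method.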

ClusterConfigTaskExecutor

  • Replaced indirect ConfigNode RPCs with direct calls to AINodeClientManager.borrowClient(TEndPoint) for model operations (currently loadModel as an example).

  • Ensured the DataNode→AINode invocation flow mirrors the ConfigNode client style while maintaining compatibility with existing client pooling.

  • Updated Thrift imports to use org.apache.iotdb.ainode.rpc.thrift.* instead of org.apache.iotdb.confignode.rpc.thrift.*.
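As a rough illustration of the direct invocation flow described above, the sketch below borrows a pooled client for an endpoint, issues the call, and returns the client to the pool when the try-with-resources block exits. It is a simplified model under stated assumptions, not the IoTDB implementation: `BorrowSketch`, `Client`, `Pool`, and the string-typed endpoint and model ID are all hypothetical stand-ins for AINodeClient, AINodeClientManager, TEndPoint, and TLoadModelReq.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

/** Illustrative sketch only; these classes are hypothetical stand-ins. */
class BorrowSketch {

  /** Hypothetical client whose close() hands it back to the pool for reuse. */
  static final class Client implements AutoCloseable {
    final String endpoint;
    private final Pool owner;

    Client(String endpoint, Pool owner) {
      this.endpoint = endpoint;
      this.owner = owner;
    }

    /** Placeholder for the real RPC; it only echoes its arguments. */
    String loadModel(String modelId) {
      return "loaded " + modelId + " on " + endpoint;
    }

    @Override
    public void close() {
      owner.giveBack(this);
    }
  }

  /** Hypothetical per-endpoint pool that reuses idle clients. */
  static final class Pool {
    private final Map<String, Deque<Client>> idle = new HashMap<>();
    int created = 0; // number of clients constructed so far

    synchronized Client borrowClient(String endpoint) {
      Deque<Client> queue = idle.computeIfAbsent(endpoint, k -> new ArrayDeque<>());
      Client client = queue.poll();
      if (client == null) {
        created++;
        client = new Client(endpoint, this);
      }
      return client;
    }

    synchronized void giveBack(Client client) {
      idle.get(client.endpoint).push(client);
    }
  }

  /** Caller-side flow: borrow, call the AI service directly, return to the pool. */
  static String loadModelDirect(Pool pool, String endpoint, String modelId) {
    try (Client client = pool.borrowClient(endpoint)) {
      return client.loadModel(modelId);
    }
  }
}
```

Calling `loadModelDirect` twice against the same endpoint constructs only one client in this sketch, which is the pooling behavior the bullet about compatibility with existing client pooling refers to.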

AINodeClientManager

  • No functional changes; the existing pool management for TEndPoint-based clients is reused for consistency with ConfigNodeClientManager.

Impact

DataNode can now send AI-related requests directly to AINode without routing through ConfigNode. In this PR, loadModel is the example operation migrated to the direct path; the remaining AI APIs are listed under Next Steps.

Next Steps

Extend the same direct invocation pattern (AINodeClientManager.borrowClient()) to the other AI APIs: unloadModel, showModel, showLoadedModel, showAIDevices, createTraining, and getModelInfo.


This PR has:

  • been self-reviewed.
    • concurrent read
    • concurrent write
    • concurrent read and write
  • added documentation for new or modified features or behaviors.
  • added Javadocs for most classes and all non-trivial methods.
  • added or updated version, license, or notice information
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious
    to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold
    for code coverage.
  • added integration tests.
  • been tested in a test IoTDB cluster.

Key changed/added classes (or packages if there are too many classes) in this PR

Same as listed under Contents above.

Contributor

@CRZbulabula CRZbulabula left a comment

PTAL.

@RkGrit
Contributor

RkGrit commented Nov 17, 2025

LGTM~

Contributor

@CRZbulabula CRZbulabula left a comment

LGTM!!!

@CRZbulabula CRZbulabula merged commit d49d7dd into apache:master Nov 17, 2025
28 checks passed