Skip to content

Conversation

aoxy
Copy link

@aoxy aoxy commented Sep 19, 2025

What?

This PR fixes a bug in the plugin directory resolution logic in getPluginDir(). Previously, the code attempted to locate plugins in a non-existent core/plugins directory. With certain build setups, this resulted in plugin loading failures, as reported when running ./examples/cpp/nixl_example:

$ NIXL_LOG_LEVEL=DEBUG ./examples/cpp/nixl_example
I0919 19:06:48.573057   96808 nixl_agent.cpp:122] NIXL ETCD is excluded
I0919 19:06:48.573148   96808 nixl_agent.cpp:122] NIXL ETCD is excluded
I0919 19:06:48.573169   96808 nixl_plugin_manager.cpp:197] Loading plugins from file: /home/aoxy/projs/nixl/my_build/pluginlist
I0919 19:06:48.628053   96808 nixl_plugin_manager.cpp:206] Loading plugins from: /home/aoxy/projs/nixl/my_build/examples/cpp/../../src/core/plugins
E0919 19:06:48.628149   96808 nixl_plugin_manager.cpp:297] Error accessing directory("/home/aoxy/projs/nixl/my_build/examples/cpp/../../src/core/plugins"): No such file or directory
...

Why?

The root cause of the bug is the logic for deriving the plugin directory path using std::filesystem::path::parent_path. The code snippet is:

static std::string
getPluginDir() {
    // Environment variable takes precedence
    const char *plugin_dir = getenv("NIXL_PLUGIN_DIR");
    if (plugin_dir) {
        return plugin_dir;
    }
    // By default, use the plugin directory relative to the binary
    Dl_info info;
    int ok = dladdr(reinterpret_cast<void *>(&getPluginDir), &info);
    if (!ok) {
        NIXL_ERROR << "Failed to get plugin directory from dladdr";
        return "";
    }
    return (std::filesystem::path(info.dli_fname).parent_path() / "plugins").string();
}

If info.dli_fname is /home/aoxy/projs/nixl/my_build/examples/cpp/../../src/core/libnixl.so, then std::filesystem::path(info.dli_fname).parent_path() results in /home/aoxy/projs/nixl/my_build/examples/cpp/../../src/core. This means the code searches for plugins under core/plugins, which does not exist.

Reference on how parent_path works:
https://en.cppreference.com/w/cpp/filesystem/path/parent_path

#include <filesystem>
#include <iostream>
namespace fs = std::filesystem;
 
int main()
{
    for (fs::path p : {"/var/tmp/example.txt", "/", "/var/tmp/."})
        std::cout << "The parent path of " << p
                  << " is " << p.parent_path() << '\n';
}
// Output:
// The parent path of "/var/tmp/example.txt" is "/var/tmp"
// The parent path of "/" is "/"
// The parent path of "/var/tmp/." is "/var/tmp"

List of the actual directory structure:

$ ls /home/jerryao/projs/nixl/my_build/examples/cpp/../../src/
api  bindings  core  infra  plugins  utils

The correct plugin directory should be /home/aoxy/projs/nixl/my_build/examples/cpp/../../src/plugins.

How?

The fix is to use parent_path() twice on info.dli_fname before appending "plugins", so that the plugin directory is set to the correct location above core (i.e., directly under src). Specifically, change

return (std::filesystem::path(info.dli_fname).parent_path() / "plugins").string();

to

return (std::filesystem::path(info.dli_fname).parent_path().parent_path() / "plugins").string();

This ensures the resolved path will be /home/aoxy/projs/nixl/my_build/examples/cpp/../../src/plugins, matching the actual project layout and fixing the loading error.

@aoxy aoxy requested a review from a team as a code owner September 19, 2025 11:59
Copy link

copy-pr-bot bot commented Sep 19, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Copy link

👋 Hi aoxy! Thank you for contributing to ai-dynamo/nixl.

Your PR reviewers will review your contribution then trigger the CI to test your changes.

🚀

@aranadive
Copy link
Contributor

/ok to test 76cd8b0

@aranadive
Copy link
Contributor

/build

aranadive
aranadive previously approved these changes Sep 19, 2025
Copy link
Contributor

@ovidiusm ovidiusm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will work for build directory, but it will break loading from installation directory

@aranadive aranadive dismissed their stale review September 20, 2025 04:50

Will break installs

@aranadive
Copy link
Contributor

This will work for build directory, but it will break loading from installation directory

You're right. Blossom CI passed. So we seem to have a testing gap with wheels in CI.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants