Skip to content

Conversation

barsolo2000
Copy link

@barsolo2000 barsolo2000 commented Oct 9, 2025

After merging #48 I notice that my test is failing in the upcoming PR: #46.

The issue is that having a map of gpu actions with stop_id as a value will create a situation where unique GPUactions that have the same stop_id and name are getting ignored.
Having a unique identifier for every gpu action will prevent this situation from happening while still ignoring same gpu action to be executed twice.

@barsolo2000 barsolo2000 force-pushed the IdentifierGpuActions branch from ab27dfb to 8982d1f Compare October 9, 2025 22:05
@barsolo2000 barsolo2000 changed the title Added GPU action identifier Added identifier for GPU action Oct 9, 2025
@barsolo2000 barsolo2000 marked this pull request as ready for review October 9, 2025 22:11
/// Check if we are already connected.
bool IsConnected() const { return m_is_connected; }

uint32_t GetNextGPUActionIdentifier() { return ++m_gpuaction_identifier; }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't need this if we can manage it automatically as suggested above.

GPUActions() : identifier(GetNextID()) {}
GPUActions(llvm::StringRef _plugin_name)
: plugin_name(_plugin_name), identifier(GetNextID()) {}
GPUActions(llvm::StringRef _plugin_name, uint32_t _stop_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about if we use delegating constructors here. Something like

struct GPUActions {
  GPUActions() : GPUActions("", {}) {}
  GPUActions(llvm::StringRef _plugin_name)
      : GPUActions(_plugin_name, {}) {};
  
  GPUActions(llvm::StringRef _plugin_name, std::optional<uint32_t> _stop_id)
      : plugin_name(_plugin_name), identifier(GetNextID()), stop_id(_stop_id) {}
}

/// Unique identifier for every GPU action.
uint32_t identifier = 0;

private:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets move this to the bottom so we don't have to switch between private and publick

struct GPUActions {
...
private:
  static uint32_t GetNextID() {
    static std::atomic<uint32_t> id = 0;
    return ++id;
  }
};

@dmpots dmpots changed the title Added identifier for GPU action [GPU] Added identifier for GPU action Oct 10, 2025
Copy link
Collaborator

@dmpots dmpots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Comment on lines 181 to 182
GPUActions(llvm::StringRef _plugin_name) : GPUActions(_plugin_name, {}) {}
GPUActions(llvm::StringRef _plugin_name, std::optional<uint32_t> _stop_id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't use _ as prefix. LLDB doesn't have that style

Comment on lines 214 to 217
static uint32_t GetNextID() {
static std::atomic<uint32_t> id = 0;
return ++id;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't put this code in the header. In the case of shared libraries, this will possibly create multiple copies of this function and multiple copies of this variable. It's better to keep global variables in cpp files to avoid that kind of issues.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Beyond that, atomics are not really free. I'd advocate for letting each plugin handle the id on the way they want. E.g. except for the default constructor, make any constructor forcefully take an identifier, and then each plugin can decide whether to use atomics or just a simple counter for that.
Your map already stores things using the plugin name as first discriminant, so you don't really to make these id globally unique.
Another advantage of not having a globally unique id and not creating it by default in the default constructor, is that a plugin may be able to create a sequence of GPU actions like 1,2,3,4,..., without holes and see if they are all handled correctly in the order they expect. If the default constructor creates id, then your actual sequence might look like 1,3,5,10,11, etc. Which makes things a little bit more difficult when debugging.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the main reason I didn't want to include a GPUActions::GetNextID() in the GPUActions class is we create instances of this class on both sides (LLDB client and lldb-server). We encode the GPUActions class on the lldb-server side and then decode them on the LLDB client side. By not putting the GPU action identifier in this class it allows us to not have code on the LLDB client side ever construct a GPUAction and increment an identifier that won't ever get used.

I suggest we have the LLDBServerPlugin class contain a member variable that tracks the current GPU Action ID and increments it. That way if we have a lldb-server binary with multiple GPU plug-ins, each one can have a unique set of GPUAction identifiers that all start with 1, 2, 3, 4, etc like Walter mentioned above. Or we can have the LLDBServerPlugin hand the next GPUAction with a function like:

class LLDBServerPlugin {
...
  uint32_t m_gpu_action_identifier = 0; /// Each plug-in instance has its own unique GPUAction identifiers
...
  GPUAction LLDBServerPlugin::GetNewGPUAction() {
    return GPUAction(++m_gpu_action_identifier, GetPluginName())
  }
}

Then classes would always ask the LLDBServerPlugin to create the instances, and then fill them in. The current code in the MockGPU plug-in looks like:

  if (native_process->GetStopID() == 3) {
    GPUActions actions(GetPluginName());
    GPUBreakpointInfo bp;
    bp.identifier = BreakpointIDThirdStop;
    bp.name_info = {"a.out", "gpu_third_stop"};

But could be:

  if (native_process->GetStopID() == 3) {
    // Auto fill in the plug-in name and GPU action identifier from the LLDBServerPlugin class
    GPUActions actions = GetNewGPUAction();
    GPUBreakpointInfo bp;
    bp.identifier = BreakpointIDThirdStop;
    bp.name_info = {"a.out", "gpu_third_stop"};

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@clayborg @walter-erquinigo

The one thing that worries me about this approach is that we can easily construct "invalid" GPUAction instances on the server side that do not contain a unique identifier. I think we still need the default constructor in order to support the json deserialization, but that means a user could easily write code that looks fine but is missing the identifier and so it could lead to subtle bugs (e.g. the actions would all map to the same id in the map and could get ignored)

    // This code looks reasonable, but we are missing the identifier
    GPUActions actions;
    actions.plugin_name = GetPluginName();
    GPUBreakpointInfo bp;
    bp.identifier = BreakpointIDThirdStop;
    bp.name_info = {"a.out", "gpu_third_stop"};

Is there a way we can prevent getting into this invalid state?

Copy link
Owner

@clayborg clayborg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should let the LLDBServerPlugin instances have their own GPUAction identifiers. See inlined comments for full details.

Comment on lines 214 to 217
static uint32_t GetNextID() {
static std::atomic<uint32_t> id = 0;
return ++id;
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the main reason I didn't want to include a GPUActions::GetNextID() in the GPUActions class is we create instances of this class on both sides (LLDB client and lldb-server). We encode the GPUActions class on the lldb-server side and then decode them on the LLDB client side. By not putting the GPU action identifier in this class it allows us to not have code on the LLDB client side ever construct a GPUAction and increment an identifier that won't ever get used.

I suggest we have the LLDBServerPlugin class contain a member variable that tracks the current GPU Action ID and increments it. That way if we have a lldb-server binary with multiple GPU plug-ins, each one can have a unique set of GPUAction identifiers that all start with 1, 2, 3, 4, etc like Walter mentioned above. Or we can have the LLDBServerPlugin hand the next GPUAction with a function like:

class LLDBServerPlugin {
...
  uint32_t m_gpu_action_identifier = 0; /// Each plug-in instance has its own unique GPUAction identifiers
...
  GPUAction LLDBServerPlugin::GetNewGPUAction() {
    return GPUAction(++m_gpu_action_identifier, GetPluginName())
  }
}

Then classes would always ask the LLDBServerPlugin to create the instances, and then fill them in. The current code in the MockGPU plug-in looks like:

  if (native_process->GetStopID() == 3) {
    GPUActions actions(GetPluginName());
    GPUBreakpointInfo bp;
    bp.identifier = BreakpointIDThirdStop;
    bp.name_info = {"a.out", "gpu_third_stop"};

But could be:

  if (native_process->GetStopID() == 3) {
    // Auto fill in the plug-in name and GPU action identifier from the LLDBServerPlugin class
    GPUActions actions = GetNewGPUAction();
    GPUBreakpointInfo bp;
    bp.identifier = BreakpointIDThirdStop;
    bp.name_info = {"a.out", "gpu_third_stop"};

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants