[GPU] Added identifier for GPU action #60

barsolo2000 · 2025-10-09T22:04:34Z

After merging #48 I notice that my test is failing in the upcoming PR: #46.

The issue is that having a map of gpu actions with stop_id as a value will create a situation where unique GPUactions that have the same stop_id and name are getting ignored.
Having a unique identifier for every gpu action will prevent this situation from happening while still ignoring same gpu action to be executed twice.

lldb/include/lldb/Utility/GPUGDBRemotePackets.h

dmpots · 2025-10-09T22:30:26Z

lldb/source/Plugins/Process/gdb-remote/LLDBServerPlugin.h

  /// Check if we are already connected.
  bool IsConnected() const { return m_is_connected; }

+  uint32_t GetNextGPUActionIdentifier() { return ++m_gpuaction_identifier; }


We shouldn't need this if we can manage it automatically as suggested above.

lldb/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp

dmpots · 2025-10-10T19:56:04Z

lldb/include/lldb/Utility/GPUGDBRemotePackets.h

+  GPUActions() : identifier(GetNextID()) {}
+  GPUActions(llvm::StringRef _plugin_name)
+      : plugin_name(_plugin_name), identifier(GetNextID()) {}
+  GPUActions(llvm::StringRef _plugin_name, uint32_t _stop_id)


How about if we use delegating constructors here. Something like

struct GPUActions { GPUActions() : GPUActions("", {}) {} GPUActions(llvm::StringRef _plugin_name) : GPUActions(_plugin_name, {}) {}; GPUActions(llvm::StringRef _plugin_name, std::optional<uint32_t> _stop_id) : plugin_name(_plugin_name), identifier(GetNextID()), stop_id(_stop_id) {} }

dmpots · 2025-10-10T19:56:58Z

lldb/include/lldb/Utility/GPUGDBRemotePackets.h

+  /// Unique identifier for every GPU action.
+  uint32_t identifier = 0;
+
+private:


Lets move this to the bottom so we don't have to switch between private and publick

struct GPUActions { ... private: static uint32_t GetNextID() { static std::atomic<uint32_t> id = 0; return ++id; } };

dmpots

LGTM

walter-erquinigo · 2025-10-10T20:12:31Z

lldb/include/lldb/Utility/GPUGDBRemotePackets.h

+  GPUActions(llvm::StringRef _plugin_name) : GPUActions(_plugin_name, {}) {}
+  GPUActions(llvm::StringRef _plugin_name, std::optional<uint32_t> _stop_id)


don't use _ as prefix. LLDB doesn't have that style

walter-erquinigo · 2025-10-10T20:13:48Z

lldb/include/lldb/Utility/GPUGDBRemotePackets.h

+  static uint32_t GetNextID() {
+    static std::atomic<uint32_t> id = 0;
+    return ++id;
+  }


Don't put this code in the header. In the case of shared libraries, this will possibly create multiple copies of this function and multiple copies of this variable. It's better to keep global variables in cpp files to avoid that kind of issues.

Beyond that, atomics are not really free. I'd advocate for letting each plugin handle the id on the way they want. E.g. except for the default constructor, make any constructor forcefully take an identifier, and then each plugin can decide whether to use atomics or just a simple counter for that.
Your map already stores things using the plugin name as first discriminant, so you don't really to make these id globally unique.
Another advantage of not having a globally unique id and not creating it by default in the default constructor, is that a plugin may be able to create a sequence of GPU actions like 1,2,3,4,..., without holes and see if they are all handled correctly in the order they expect. If the default constructor creates id, then your actual sequence might look like 1,3,5,10,11, etc. Which makes things a little bit more difficult when debugging.

Yeah, the main reason I didn't want to include a GPUActions::GetNextID() in the GPUActions class is we create instances of this class on both sides (LLDB client and lldb-server). We encode the GPUActions class on the lldb-server side and then decode them on the LLDB client side. By not putting the GPU action identifier in this class it allows us to not have code on the LLDB client side ever construct a GPUAction and increment an identifier that won't ever get used.

I suggest we have the LLDBServerPlugin class contain a member variable that tracks the current GPU Action ID and increments it. That way if we have a lldb-server binary with multiple GPU plug-ins, each one can have a unique set of GPUAction identifiers that all start with 1, 2, 3, 4, etc like Walter mentioned above. Or we can have the LLDBServerPlugin hand the next GPUAction with a function like:

class LLDBServerPlugin { ... uint32_t m_gpu_action_identifier = 0; /// Each plug-in instance has its own unique GPUAction identifiers ... GPUAction LLDBServerPlugin::GetNewGPUAction() { return GPUAction(++m_gpu_action_identifier, GetPluginName()) } }

Then classes would always ask the LLDBServerPlugin to create the instances, and then fill them in. The current code in the MockGPU plug-in looks like:

if (native_process->GetStopID() == 3) { GPUActions actions(GetPluginName()); GPUBreakpointInfo bp; bp.identifier = BreakpointIDThirdStop; bp.name_info = {"a.out", "gpu_third_stop"};

But could be:

if (native_process->GetStopID() == 3) { // Auto fill in the plug-in name and GPU action identifier from the LLDBServerPlugin class GPUActions actions = GetNewGPUAction(); GPUBreakpointInfo bp; bp.identifier = BreakpointIDThirdStop; bp.name_info = {"a.out", "gpu_third_stop"};

@clayborg @walter-erquinigo

The one thing that worries me about this approach is that we can easily construct "invalid" GPUAction instances on the server side that do not contain a unique identifier. I think we still need the default constructor in order to support the json deserialization, but that means a user could easily write code that looks fine but is missing the identifier and so it could lead to subtle bugs (e.g. the actions would all map to the same id in the map and could get ignored)

// This code looks reasonable, but we are missing the identifier GPUActions actions; actions.plugin_name = GetPluginName(); GPUBreakpointInfo bp; bp.identifier = BreakpointIDThirdStop; bp.name_info = {"a.out", "gpu_third_stop"};

Is there a way we can prevent getting into this invalid state?

clayborg

I think we should let the LLDBServerPlugin instances have their own GPUAction identifiers. See inlined comments for full details.

clayborg · 2025-10-10T21:55:03Z

lldb/include/lldb/Utility/GPUGDBRemotePackets.h

+  static uint32_t GetNextID() {
+    static std::atomic<uint32_t> id = 0;
+    return ++id;
+  }


Yeah, the main reason I didn't want to include a GPUActions::GetNextID() in the GPUActions class is we create instances of this class on both sides (LLDB client and lldb-server). We encode the GPUActions class on the lldb-server side and then decode them on the LLDB client side. By not putting the GPU action identifier in this class it allows us to not have code on the LLDB client side ever construct a GPUAction and increment an identifier that won't ever get used.

I suggest we have the LLDBServerPlugin class contain a member variable that tracks the current GPU Action ID and increments it. That way if we have a lldb-server binary with multiple GPU plug-ins, each one can have a unique set of GPUAction identifiers that all start with 1, 2, 3, 4, etc like Walter mentioned above. Or we can have the LLDBServerPlugin hand the next GPUAction with a function like:

class LLDBServerPlugin { ... uint32_t m_gpu_action_identifier = 0; /// Each plug-in instance has its own unique GPUAction identifiers ... GPUAction LLDBServerPlugin::GetNewGPUAction() { return GPUAction(++m_gpu_action_identifier, GetPluginName()) } }

Then classes would always ask the LLDBServerPlugin to create the instances, and then fill them in. The current code in the MockGPU plug-in looks like:

if (native_process->GetStopID() == 3) { GPUActions actions(GetPluginName()); GPUBreakpointInfo bp; bp.identifier = BreakpointIDThirdStop; bp.name_info = {"a.out", "gpu_third_stop"};

But could be:

if (native_process->GetStopID() == 3) { // Auto fill in the plug-in name and GPU action identifier from the LLDBServerPlugin class GPUActions actions = GetNewGPUAction(); GPUBreakpointInfo bp; bp.identifier = BreakpointIDThirdStop; bp.name_info = {"a.out", "gpu_third_stop"};

Added GPU action identifier

8982d1f

barsolo2000 force-pushed the IdentifierGpuActions branch from ab27dfb to 8982d1f Compare October 9, 2025 22:05

barsolo2000 changed the title ~~Added GPU action identifier~~ Added identifier for GPU action Oct 9, 2025

barsolo2000 marked this pull request as ready for review October 9, 2025 22:11

dmpots requested changes Oct 9, 2025

View reviewed changes

dmpots requested review from clayborg, jeffreytan81 and walter-erquinigo October 9, 2025 22:36

Addressed comments

21208a1

dmpots reviewed Oct 10, 2025

View reviewed changes

fixed comments

f00fd25

dmpots changed the title ~~Added identifier for GPU action~~ [GPU] Added identifier for GPU action Oct 10, 2025

dmpots approved these changes Oct 10, 2025

View reviewed changes

walter-erquinigo reviewed Oct 10, 2025

View reviewed changes

clayborg requested changes Oct 10, 2025

View reviewed changes

Bar Soloveychik added 2 commits October 10, 2025 16:06

fix

47667d7

fixed AMD GPUAction calls

089a40f

		GPUActions(llvm::StringRef _plugin_name) : GPUActions(_plugin_name, {}) {}
		GPUActions(llvm::StringRef _plugin_name, std::optional<uint32_t> _stop_id)

[GPU] Added identifier for GPU action #60

Are you sure you want to change the base?

[GPU] Added identifier for GPU action #60

Uh oh!

Conversation

barsolo2000 commented Oct 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

dmpots left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

clayborg left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

barsolo2000 commented Oct 9, 2025 •

edited

Loading