-
Notifications
You must be signed in to change notification settings - Fork 110
Error check to avoid null config dereference #4319
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds error checking to prevent a null dereference when accessing a kernel configuration, replacing a confusing std::bad_alloc
abort with a clearer MIGraphX exception. The change provides better error diagnostics when no configuration is found for a kernel operation.
- Adds null check for
config
before dereferencing - Throws descriptive MIGraphX exception instead of allowing null dereference abort
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
src/targets/gpu/compile_ops.cpp
Outdated
if(results.size() == 1) | ||
{ | ||
if(not config) | ||
MIGRAPHX_THROW("Kernel without config for " + preop.name()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A config is not required when there is only one kernel.
Do you have a backtrace for the debug build as this does not seem to be the right place to check for the null config? |
Here's a backtrace @pfultz2
Edit:
|
There seem to be more
Commenting out the blanket try/catches yields:
The error happens inside hiprtc_src_file::hiprtc_src_file(const src_file& s). It seems src_file::content, which is std::string_view, is invalid. For So lets change it:
And it all works now! I can compile stuff. So the LD based embedding fails somehow for my custom TheRock build or linux. So LD embedding works with this:
But why is the old |
If "config is not required when there is only one kernel" then this PR is not good. Would you like me to modify this PR to fix print_modules() instead and also add a fix for Embed.cmake? @pfultz2 |
Ah yes. The blanket hiding of exceptions. That's quite something. Maybe make it always log out the exception so people can actually see that there are serious problems? |
People complain about the unnecessary noise, because applicability checks do not always have 100% coverage in mlir and are completely missing from CK. So for normal cases it should be ignored.
If all kernels do fail, it should output the exception along with the modules it was compiling, but this might need some more work. Either way, you can set |
Yes please do. |
…nt executable (PIE) is enabled
I have updated the PR now with check at print_modules and a patch for Embed.cmake. Please review @pfultz2 I also had a cursory look at other ROCm modules in regards to embedding files and I see MIOpen does it properly (https://github.com/ROCm/rocm-libraries/blob/develop/projects/miopen/cmake/embed.cmake#L68-L83), but there is the same problem in CK. Someone should probably fix it: https://github.com/ROCm/composable_kernel/blob/develop/cmake/Embed.cmake#L99-L105 |
We used to do it that way but it was changed in #1999. |
Motivation
Give a more proper error instead of abort on
std::bad_alloc
.Technical Details
I'm trying to build migraphx on top of TheRock. It does not work and aborts with std::bad_alloc. I traced this to a null
std::optional
dereference (after recompiling with-D_GLIBCXX_ASSERTIONS
):The fix is to check for null config. This now results in a migraphx error:
I have no idea what the error means though. Migraphx works just fine when I compile the same way against a stock ROCm install. So I suppose TheRock must be doing something strange.
Changelog Category