-
-
Notifications
You must be signed in to change notification settings - Fork 11.7k
Add logging for cudagraph related info #29825
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces logging for CUDA graph related information, which is valuable for performance tuning. The changes are well-structured, adding a new configuration flag and plumbing the necessary statistics through the model execution path to the logger. I've identified a critical bug in the logging output format that needs to be addressed. Otherwise, the implementation looks good.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces logging for CUDA graph-related information, which is a valuable addition for performance tuning and debugging. The implementation is well-structured, adding a new configuration flag and plumbing the statistics through the model execution pipeline. I've found one minor issue in the formatting of the log output, where two columns are swapped, which could be misleading. The fix is straightforward. Overall, this is a good contribution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a valuable logging feature for CUDA graph-related information, which will be helpful for performance tuning. The changes are well-structured, adding a new configuration flag and plumbing the necessary statistics through the system. I've identified a minor bug in the new logging class where the columns for padded and unpadded tokens were swapped in the output. I've provided a fix for this. Overall, this is a good addition to the project.
| class CUDAGraphStats: | ||
| num_unpadded_tokens: int | ||
| num_padded_tokens: int | ||
| runtime_mode: str |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use CUDAGraphMode enum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i didnt use an enum because CUDAGraphStats needs to be serializable, and CUDAGraphMode is not serializable (without more intrusive changes / writing a custom serializer for msgpack) since some of its values are tuples of other enum members (FULL_DECODE_ONLY and FULL_AND_PIECEWISE). The error is Enums must contain either all str or all int values - type <enum 'CUDAGraphMode'> is not supported
vllm/compilation/cuda_graph.py
Outdated
| row_counts = Counter(self.rows) | ||
|
|
||
| # Create header | ||
| header = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we generate a markdown table instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i updated the PR to print a markdown table, but it needs extra logic to ensure columns are aligned.
Signed-off-by: Yong Hoon Shin <[email protected]>
Signed-off-by: Yong Hoon Shin <[email protected]>
Signed-off-by: Yong Hoon Shin <[email protected]>
Signed-off-by: Yong Hoon Shin <[email protected]>
Purpose
Add logging related to cudagraph dispatch, namely the distribution of padded/unpadded tokens and the runtime modes over a period of time. This info can be useful for cudagraph capture size tuning.
Adds user-facing
--cudagraph-metricsflag to enable this.Test Plan
Test Result
Logs:
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.