xprof user guide #2300
base: main
LGTM
Can we check with the profiling team on how this doc should be framed? There is already detailed documentation that exists for the profiler tools. Should we just link to that instead?
Thanks Qinwen!
* **Sampling Mode:** This mode allows for continuous profiling by sampling data during model execution.
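A minimal sketch of sampling mode in a JAX program, assuming the program starts a profiler server with `jax.profiler.start_server` and a capture is then requested from outside the process (for example, from the TensorBoard profile plugin's "Capture Profile" dialog pointed at `localhost:9999`). The port, the toy `train_step`, and the array shapes are hypothetical placeholders, not MaxText code.

```python
import jax
import jax.numpy as jnp

PROFILER_PORT = 9999  # hypothetical port; any free port works


@jax.jit
def train_step(w, x):
    # Toy update: one gradient-descent step on a mean-squared loss.
    loss_fn = lambda w: jnp.mean((x @ w) ** 2)
    return w - 0.1 * jax.grad(loss_fn)(w)


def train(num_steps, port=PROFILER_PORT):
    # Start the profiler server once, before training. Nothing is recorded
    # until an external client requests a capture on this port.
    server = jax.profiler.start_server(port)
    w = jnp.ones((16, 16))
    x = jnp.ones((32, 16))
    for _ in range(num_steps):
        w = train_step(w, x)
    w.block_until_ready()  # wait for asynchronously dispatched device work
    return server, w
```

Only one profiler server can be active per process, so the server is started once and left running; in a real job the training loop runs long enough for an on-demand capture window to land inside it.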
Out of curiosity, how do we do sampling?
## Introduction to Xprof
Xprof is a tool for profiling and analyzing the training performance of AI models. For MaxText developers, understanding and using Xprof can help significantly with optimizing model performance, identifying bottlenecks, and improving training efficiency.
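A minimal sketch of programmatic profile capture in JAX, assuming the profile is written with `jax.profiler.trace` and later opened in Xprof or TensorBoard by pointing it at the same log directory. The toy `train_step`, the shapes, and `/tmp/xprof-demo` are hypothetical; this is not MaxText's own profiling code.

```python
import jax
import jax.numpy as jnp


@jax.jit
def train_step(w, x):
    # Toy "training step": one gradient-descent update on a quadratic loss.
    loss_fn = lambda w: jnp.sum((x @ w) ** 2)
    grads = jax.grad(loss_fn)(w)
    return w - 0.01 * grads


def profile_steps(log_dir, num_steps=5):
    w = jnp.ones((8, 8))
    x = jnp.ones((4, 8))
    # Warm-up call: triggers XLA compilation outside the captured window.
    w = train_step(w, x)
    w.block_until_ready()
    # Everything executed inside this block is recorded to log_dir.
    with jax.profiler.trace(log_dir):
        for _ in range(num_steps):
            w = train_step(w, x)
        # Ensure device work finishes before the trace is closed.
        w.block_until_ready()
    return w


if __name__ == "__main__":
    profile_steps("/tmp/xprof-demo")  # hypothetical log directory
```

Keeping compilation out of the trace makes the captured steps representative of steady-state training rather than start-up cost.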
I think Xprof is not open-sourced? We should probably recommend TensorBoard or other OSS tools instead, such as the Cloud version?
* Trace Viewer
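The Trace Viewer is easier to read when regions of the program carry names. A minimal sketch, assuming a JAX training loop: `jax.profiler.StepTraceAnnotation` labels each iteration as a training step, and `jax.profiler.TraceAnnotation` adds a named host-side span; both show up as labeled events in the captured trace. The toy `train_step`, the span name `"sync"`, and the log directory are hypothetical.

```python
import jax
import jax.numpy as jnp


@jax.jit
def train_step(w, x):
    # Toy update: one gradient-descent step on a mean-squared loss.
    return w - 0.1 * jax.grad(lambda w: jnp.mean((x @ w) ** 2))(w)


def traced_training(log_dir, num_steps=3):
    w = jnp.ones((16, 16))
    x = jnp.ones((32, 16))
    with jax.profiler.trace(log_dir):
        for step in range(num_steps):
            # Marks this iteration as step `step` of the "train" loop.
            with jax.profiler.StepTraceAnnotation("train", step_num=step):
                w = train_step(w, x)
                # Named host-side span covering the wait for device results.
                with jax.profiler.TraceAnnotation("sync"):
                    w.block_until_ready()
    return w
```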
It would be great to include some screenshots for customers, especially those who are not familiar with the tool (for example, the Trace Viewer). Not mandatory, though, if you think this is clear. :)
Similar comments apply to the other sections.
None of this is MaxText-specific. Is there any Xprof documentation we can point to instead?
I would find some Xprof folks to review and comment on this as well. If we can't find any, rjesha@ probably knows who to reach out to.
Description
Start with a short description of what the PR does and how this is a change from the past.
The rest of the description includes relevant details and context, examples:
If the change fixes a bug or a GitHub issue, please include a link, e.g.:
FIXES: b/123456
FIXES: #123456
Notice 1: Once all tests pass, the "pull ready" label will automatically be assigned.
This label is used for administrative purposes. Please do not add it manually.
Notice 2: For external contributions, our settings currently require an approval from a MaxText maintainer to trigger CI tests.
Tests
Please describe how you tested this change, and include any instructions and/or commands to reproduce.
Checklist
Before submitting this PR, please make sure (put X in square brackets):