LLM pipeline implementation #1040
base: master
…re pipeline cannot handle an input size larger than the max prefill size
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
…lemented performance benchmark for LLM pipeline
…y input and issue_query only handles output tokens
namespace mobile {

// A method to be called by the backend as soon as the first token is generated (only for token based benchmarks)
static void FirstTokenCallback(void* context) {
What's the use of `context`?
The context holds the arguments that get passed to LoadGen. They are created by the driver and sent to the backend. The backend only needs to pass them to the callback, without reading or modifying them.
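To illustrate the opaque-context pattern described above, here is a minimal sketch. Only the `FirstTokenCallback(void* context)` signature comes from the diff; `DriverContext`, `FakeBackend`, `IssueQuery`, and `RunDemo` are hypothetical names made up for this example and are not the PR's actual types. The point is that the backend forwards the pointer untouched, and only driver-side code casts it back.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the arguments the driver prepares for LoadGen.
// The real structure in the PR may look quite different.
struct DriverContext {
  int64_t query_id;
  int first_token_reports;  // counts how often the callback fired
};

// Same shape as the callback in the diff: it receives only an opaque pointer.
using FirstTokenFn = void (*)(void* context);

// Driver side: the only place that knows the concrete type behind the pointer.
static void FirstTokenCallback(void* context) {
  auto* ctx = static_cast<DriverContext*>(context);
  assert(ctx != nullptr);
  ++ctx->first_token_reports;  // a real driver would notify LoadGen here
}

// Hypothetical backend: it never reads or modifies the context,
// it only hands the pointer back through the callback.
struct FakeBackend {
  void IssueQuery(FirstTokenFn on_first_token, void* context) {
    // ... prefill would run here and produce the first token ...
    on_first_token(context);
    // ... remaining tokens would be decoded here ...
  }
};

// Wires the pieces together and returns how many times the callback fired.
int RunDemo() {
  DriverContext ctx{42, 0};
  FakeBackend backend;
  backend.IssueQuery(&FirstTokenCallback, &ctx);
  return ctx.first_token_reports;
}
```

Keeping the parameter as `void*` means the backend does not need to include any driver or LoadGen headers. The trade-off is that the cast in the callback is unchecked, so the driver has to guarantee the pointer stays valid until the callback has run.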
@freedomtan to check it.
@freedomtan I need access to Google Cloud to see what's wrong with the Windows build. Could you please provide access?
@freedomtan @anhappdev The iOS test seems to hit a time limit while installing dependencies. Is this normal, or is it related to this PR?
@anhappdev could you help check the Windows and iOS cases?
The iOS build has a timeout of 180 minutes, which is why it stopped prematurely (cancelled). Here's the log for the Windows build:
The tflite backend doesn't run on either Pixel 9 or Pixel 10.
This seems to be the culprit:
It seems the pipeline isn't getting a proper path to the model.
The duplicated code is the dataset interface (override function declarations).
something like
runs on Pixel devices.
This PR should resolve the issue with the iOS build: #1064
Thanks a ton! I'll look into the Windows issue.
…s calculated per instruction not per sample
No description provided.