[New blog] Inside vLLM: Anatomy of a High-Throughput LLM Inference System #80
I doubt the attention metadata, input ids, etc. shown for decode are right :( You should be able to just print them after preparing the input.
Same comment as above: I made a few simplifications in that drawing/explanation, as I believe that's OK at that level of abstraction.
For the decode step, `input_ids` only contains the new tokens; the same goes for `positions` and `slot_mapping`. We use `block_table` to keep track of the existing KV cache.
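To make the point above concrete, here is a minimal sketch of what the decode-step inputs could look like. It is a toy illustration, not vLLM's actual code: `Seq`, `prepare_decode_inputs`, and `BLOCK_SIZE` are hypothetical names, and the slot formula (`block_id * BLOCK_SIZE + offset`) is an assumption about how a paged KV cache is typically addressed.

```python
from dataclasses import dataclass

BLOCK_SIZE = 4  # hypothetical: number of tokens stored per KV-cache block


@dataclass
class Seq:
    token_ids: list[int]    # every token so far (prompt + generated)
    block_table: list[int]  # physical block ids holding this sequence's KV cache


def prepare_decode_inputs(seqs):
    """Build per-step inputs for decode: one new token per sequence."""
    input_ids, positions, slot_mapping = [], [], []
    for seq in seqs:
        pos = len(seq.token_ids) - 1                   # position of the newest token
        block_id = seq.block_table[pos // BLOCK_SIZE]  # block that holds that position
        input_ids.append(seq.token_ids[-1])            # only the new token is fed in
        positions.append(pos)
        # Slot where the new token's K/V will be written (assumed layout).
        slot_mapping.append(block_id * BLOCK_SIZE + pos % BLOCK_SIZE)
    # The existing KV cache is not re-sent; it is located through the block tables.
    block_tables = [seq.block_table for seq in seqs]
    return input_ids, positions, slot_mapping, block_tables


seqs = [Seq([10, 11, 12, 13, 14], [7, 2]), Seq([20, 21], [5])]
ids, pos, slots, tables = prepare_decode_inputs(seqs)
print(ids, pos, slots, tables)  # [14, 21] [4, 1] [8, 21] [[7, 2], [5]]
```

Each sequence contributes exactly one entry to `input_ids`, `positions`, and `slot_mapping`, while the earlier tokens are reachable only via `block_tables`.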
yeah, i'm aware of that, i made some simplifications so that i have to explain less hah
i'd cover those details if i covered fwd pass kernel!
hope we can have some disclaimer for this.