Add vLLM Semantic Router Blog #77
base: main
Conversation
Deploying vllm-blog-source with Cloudflare Pages

Latest commit: 87c5b92
Status: ✅ Deploy successful!
Preview URL: https://9c8df877.vllm-blog-source.pages.dev
Branch Preview URL: https://add-vsr-blog.vllm-blog-source.pages.dev
@youkaichao thank you for reviewing. Is it ready to go for publishing today? Thank you!
Removing AI-generated tags and style can make the content more inviting for humans to read.
Force-pushed from 3306f01 to 7198318
Force-pushed from 7198318 to 6b879b5
I reviewed the blog. Two comments:
Cool, thanks for the review @simon-mo! For the first one, I agree we need to emphasize the technique part. For the second one, you raised a key point that is on our roadmap: a pluggable embedding model architecture. ModernBERT is lightweight and embedded inside the router, while other embedding models can be deployed by the vLLM engine and integrated with vsr via an external call.
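For illustration, here is a minimal sketch of what such a pluggable backend could look like. All class and method names below are hypothetical, not the router's actual API; the one real interface assumed is vLLM's OpenAI-compatible `/v1/embeddings` endpoint, and the ModernBERT checkpoint name is an assumption.

```python
# Hypothetical sketch of a pluggable embedding backend for the semantic router.
# Class/method names are illustrative, not the router's real API.
from abc import ABC, abstractmethod

import requests


class EmbeddingBackend(ABC):
    """Common interface: the router would only depend on embed()."""

    @abstractmethod
    def embed(self, text: str) -> list[float]:
        ...


class LocalModernBertBackend(EmbeddingBackend):
    """Lightweight model embedded inside the router process.

    Assumes a ModernBERT-based checkpoint loadable via sentence-transformers.
    """

    def __init__(self, model_name: str = "nomic-ai/modernbert-embed-base"):
        from sentence_transformers import SentenceTransformer

        self._model = SentenceTransformer(model_name)

    def embed(self, text: str) -> list[float]:
        return self._model.encode(text).tolist()


class VllmEmbeddingBackend(EmbeddingBackend):
    """External embedding model served by a vLLM engine.

    Calls vLLM's OpenAI-compatible /v1/embeddings endpoint.
    """

    def __init__(self, base_url: str, model: str):
        self._url = f"{base_url}/v1/embeddings"
        self._model = model

    def embed(self, text: str) -> list[float]:
        resp = requests.post(self._url, json={"model": self._model, "input": text})
        resp.raise_for_status()
        return resp.json()["data"][0]["embedding"]
```

The router could then be configured with either backend without changing its routing logic, which is the point of making the architecture pluggable.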
Thank you @simon-mo for the review!
At the moment, the semantic router uses ModernBERT for internal classification. However, we will explore more ways to get text embeddings for the semantic cache. Many of these models can be hosted by vLLM, and I believe this will make the design more extensible. We'll detail these directions and use cases in the upcoming revision!
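To make the semantic-cache direction concrete, here is a rough sketch of a similarity lookup that could sit on top of any such embedding backend. The cosine threshold and the cache shape are illustrative assumptions, not the router's implementation.

```python
# Illustrative semantic-cache lookup over an EmbeddingBackend from the
# sketch above; the 0.92 threshold is an arbitrary assumption.
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def cached_answer(prompt, cache, backend, threshold=0.92):
    """Return a cached response if a semantically similar prompt was seen.

    cache: list of (embedding, response) pairs from earlier requests.
    """
    query = backend.embed(prompt)
    best = max(cache, key=lambda entry: cosine(query, entry[0]), default=None)
    if best is not None and cosine(query, best[0]) >= threshold:
        return best[1]  # cache hit: skip the LLM call
    return None  # cache miss: route to the model, then store the pair
```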
Force-pushed from 44b3c2e to 87c5b92
move #76 here to enable previews