Document the /v1/models endpoint #42588
@@ -413,6 +413,17 @@ For example, to enable 4-bit quantization with bitsandbytes, you need to pass ad

```shell
transformers serve --quantization bnb-4bit
```

### Available models

The `/v1/models` endpoint scans your local cache for generative models. It checks for LLMs as well as VLMs.
The easiest way to download models to your cache is with the `hf download` command:

```shell
hf download <model_id>
```

> **Member (Author):** Actually this will download all the files (even "original" files), so there is probably a better way.
>
> **Member (Author):** Maybe it would make sense to expose a utility that would only download the transformers-specific files.
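As the review comments note, a plain `hf download` fetches every file in the repository, including the `original/` checkpoint files some repos ship. Until a transformers-specific utility exists, one workaround is the CLI's glob filters; a sketch, assuming the `--exclude` pattern option of `hf download` / `huggingface-cli download`, with an illustrative repo id:

```shell
# Skip the "original/*" consolidated checkpoints that some repos ship;
# the remaining files (config, tokenizer, safetensors shards) are still fetched.
hf download meta-llama/Llama-3.2-1B-Instruct --exclude "original/*"
```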
As long as it's a transformers-compatible LLM/VLM, it will now show up under `/v1/models`.
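The response follows the OpenAI-compatible model-list shape, so it can be consumed by any OpenAI-style client. A minimal sketch of parsing such a response; the payload below is an illustrative example, not real server output, and the field names assume the OpenAI `GET /v1/models` schema:

```python
import json

# Illustrative payload in the OpenAI-compatible "list" shape returned by
# /v1/models; the model ids here are made-up examples.
payload = """
{
  "object": "list",
  "data": [
    {"id": "meta-llama/Llama-3.2-1B-Instruct", "object": "model"},
    {"id": "Qwen/Qwen2.5-VL-3B-Instruct", "object": "model"}
  ]
}
"""

models = json.loads(payload)
ids = [entry["id"] for entry in models["data"]]
print(ids)  # each cached transformers-compatible LLM/VLM appears as one entry
```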
|
### Performance tips

- Use an efficient attention backend when available:
