11 changes: 11 additions & 0 deletions docs/source/en/serving.md
@@ -413,6 +413,17 @@ For example, to enable 4-bit quantization with bitsandbytes, you need to pass ad
transformers serve --quantization bnb-4bit
```

### Available models

The `/v1/models` endpoint scans your local cache for generative models. It checks for LLMs as well as VLMs.
The easiest way to download models to your cache is with the `hf download` command:
> **Contributor** suggested change:
>
> - The easiest way to download models to your cache is with the `hf download` command:
> + The easiest way to download models to your cache is with the `transformers download` command:


```shell
hf download <model_id>
```

> **Member Author:** Actually this will download all the files (even "original" files), so there is probably a better way.

> **Member Author:** Maybe it would make sense to expose a utility that would only download the transformers-specific files.

> **Contributor:** Do you mean something like `transformers download`? If only it existed... 😛 (screenshot attached)

> **Contributor** suggested change:
>
> - hf download <model_id>
> + transformers download <model_id>
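For reference, the cache scan backing `/v1/models` can be approximated with `huggingface_hub`'s cache utilities. This is a rough sketch only: it lists every cached model repo and does not reproduce the server's actual LLM/VLM filtering logic.

```python
from huggingface_hub import scan_cache_dir

# Enumerate repos in the local Hugging Face cache, keeping only models
# (the cache also stores datasets and spaces). This approximates what a
# cache-scanning endpoint sees; the server's filtering is more selective.
try:
    cache_info = scan_cache_dir()
    model_ids = sorted(
        repo.repo_id for repo in cache_info.repos if repo.repo_type == "model"
    )
except Exception:  # the cache directory may not exist on a fresh machine
    model_ids = []

print(model_ids)
```

Any model downloaded with the command above appears in this listing on the next scan.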

As long as it's a transformers-compatible LLM/VLM, it will now show up under `/v1/models`
> **Contributor** suggested change:
>
> - As long as it's a transformers-compatible LLM/VLM, it will now show up under `/v1/models`
> + Once downloaded, the model will show up under `/v1/models`.

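You can verify this by querying the endpoint from a client. A minimal sketch, assuming a server started with `transformers serve` is reachable at `http://localhost:8000` (the host and port are assumptions; adjust them to your deployment). The response follows the OpenAI-compatible `GET /v1/models` list format.

```python
import json
import urllib.request


def list_served_models(base_url: str = "http://localhost:8000") -> list[str]:
    """Return model ids advertised by a running server at `base_url`.

    Returns an empty list when no server is reachable, rather than raising.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
            payload = json.load(resp)
    except OSError:  # connection refused, timeout, DNS failure, ...
        return []
    # OpenAI-style list responses keep the entries under the "data" key.
    return [model["id"] for model in payload.get("data", [])]


print(list_served_models())
```

Each entry's `id` is the model repo id you would pass as `model` in a chat completion request against the same server.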
### Performance tips

- Use an efficient attention backend when available: