11 changes: 11 additions & 0 deletions docs/source/en/serving.md
@@ -413,6 +413,17 @@ For example, to enable 4-bit quantization with bitsandbytes, you need to pass ad
transformers serve --quantization bnb-4bit
```

### Available models

The `/v1/models` endpoint scans your local cache for generative models. It checks for LLMs as well as VLMs.
The easiest way to download models to your cache is with the `hf download` command:
> **Contributor** suggested change:
>
> - The easiest way to download models to your cache is with the `hf download` command:
> + The easiest way to download models to your cache is with the `transformers download` command:


```shell
hf download <model_id>
```

> **Member Author:** Actually this will download all the files (even "original" files), so there is probably a better way.

> **Member Author:** Maybe it would make sense to expose a utility that would only download the transformers-specific files.

> **Contributor:** Do you mean something like `transformers download`? If only it existed... 😛 (screenshot attached)

> **Contributor** suggested change:
>
> - hf download <model_id>
> + transformers download <model_id>
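For reference, the cache scan backing `/v1/models` can be approximated with `huggingface_hub`'s cache utilities. This is a rough sketch only: it lists every cached model repo and does not reproduce the server's actual LLM/VLM filtering logic.

```python
from huggingface_hub import scan_cache_dir

# Enumerate repos in the local Hugging Face cache, keeping only models
# (the cache also stores datasets and spaces). This approximates what a
# cache-scanning endpoint sees; the server's filtering is more selective.
try:
    cache_info = scan_cache_dir()
    model_ids = sorted(
        repo.repo_id for repo in cache_info.repos if repo.repo_type == "model"
    )
except Exception:  # the cache directory may not exist on a fresh machine
    model_ids = []

print(model_ids)
```

Any model downloaded with the command above appears in this listing on the next scan.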

As long as it's a transformers-compatible LLM/VLM, it will now show up under `/v1/models`
> **Contributor** suggested change:
>
> - As long as it's a transformers-compatible LLM/VLM, it will now show up under `/v1/models`
> + Once downloaded, the model will show up under `/v1/models`.

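You can verify this by querying the endpoint from a client. A minimal sketch, assuming a server started with `transformers serve` is reachable at `http://localhost:8000` (the host and port are assumptions; adjust them to your deployment). The response follows the OpenAI-compatible `GET /v1/models` list format.

```python
import json
import urllib.request


def list_served_models(base_url: str = "http://localhost:8000") -> list[str]:
    """Return model ids advertised by a running server at `base_url`.

    Returns an empty list when no server is reachable, rather than raising.
    """
    try:
        with urllib.request.urlopen(f"{base_url}/v1/models", timeout=5) as resp:
            payload = json.load(resp)
    except OSError:  # connection refused, timeout, DNS failure, ...
        return []
    # OpenAI-style list responses keep the entries under the "data" key.
    return [model["id"] for model in payload.get("data", [])]


print(list_served_models())
```

Each entry's `id` is the model repo id you would pass as `model` in a chat completion request against the same server.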
### Performance tips

- Use an efficient attention backend when available: