Hi there!
I am really excited about using this project. The only problem is that I am running on a GPU machine where I would like to use only 4 of the 8 GPUs on the cluster. However, the vLLM server does not seem to respect the CUDA_VISIBLE_DEVICES environment variable: it tries to start on GPU 0 instead of GPU 4, which is what I had specified. Is there a better way for me to pass the available GPU devices to the environment that runs the vLLM server?
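For reference, here is a minimal sketch of how I understand the variable is supposed to reach the server process. It assumes the server is started as a child process. The GPU indices 4-7 and the helper names `build_env` and `visible_devices_in_child` are placeholders of mine, and the child command is just a Python one-liner standing in for the real server launch, so this only checks that the variable is inherited, not what the server does with it.

```python
import os
import subprocess
import sys


def build_env(gpu_ids):
    """Return a copy of the current environment restricted to the given GPU indices."""
    env = os.environ.copy()
    # CUDA reads this variable when it initializes, so it has to be set
    # before the process that uses the GPUs starts.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return env


def visible_devices_in_child(env):
    """Launch a child process with `env` and report the value it sees."""
    # Placeholder command: swap in the actual server launch command here.
    child_code = "import os; print(os.environ.get('CUDA_VISIBLE_DEVICES', ''))"
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


# Hypothetical choice of GPUs: the last four of eight.
env = build_env([4, 5, 6, 7])
seen = visible_devices_in_child(env)
print("child process sees CUDA_VISIBLE_DEVICES =", seen)
```

If the child reports the expected value but the server still lands on GPU 0, then presumably the variable is being dropped or overridden somewhere between the launcher and the server process, which is the part I cannot figure out.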
Thanks!