
Quantization Memory Requirements #1228

@sneha5gsm

Description


Hello!

I was trying out the various quantization recipes for quantizing a 70B Llama-3-based model to FP8, INT8, and INT4 (A16) precisions, as described in the vLLM quantization docs.

  1. Could you help me understand the memory requirements for the quantization recipes, i.e. SmoothQuant (SmoothQuantModifier), GPTQ (GPTQModifier), and RTN (QuantizationModifier)? A calculation/formula would help, for example like the one we have for calculating the KV cache:
memory in bytes for KV cache = 80 (layers) * 8 (kv heads) * 128 (head_dim) * 8192 (seq length) * 2 (K and V) * 2 (bytes per fp16 value)

I understand that calculate_offload_device_map creates a custom device map that reserves memory for GPTQ (reserve_for_hessians), but I would still like to understand the memory requirements so that I can use GPU memory efficiently, know where all the GPU memory is consumed, and make sure there are no bugs. A rough sketch of the kind of estimate I mean follows below.
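
For concreteness, here is the kind of back-of-envelope estimate I have in mind, as a small Python sketch. The KV cache function just reproduces the formula above; the Hessian function assumes (my assumption, not something from the docs) that GPTQ keeps one fp32 Hessian of shape (in_features, in_features) per Linear layer being quantized, so the actual reservation made by reserve_for_hessians may differ.

```python
# Back-of-envelope estimates only; the Hessian sizing is an assumption about
# how GPTQ's reserve_for_hessians works, not a confirmed implementation detail.

def kv_cache_bytes(layers=80, kv_heads=8, head_dim=128, seq_len=8192, bytes_per_elem=2):
    # layers * kv_heads * head_dim * seq_len * 2 (K and V) * dtype size (fp16 = 2 bytes)
    return layers * kv_heads * head_dim * seq_len * 2 * bytes_per_elem

def gptq_hessian_bytes(in_features, bytes_per_elem=4):
    # Assumed: one fp32 Hessian of shape (in_features, in_features) per quantized Linear.
    return in_features * in_features * bytes_per_elem

gib = 1024 ** 3
print(f"KV cache (example above): {kv_cache_bytes() / gib:.2f} GiB")  # ~2.5 GiB
# 28672 is the MLP intermediate size of Llama-3-70B, i.e. the widest Linear input dim.
print(f"Hessian for down_proj:    {gptq_hessian_bytes(28672) / gib:.2f} GiB per layer")  # ~3.1 GiB
```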

  2. Also, I understand that, for quantization of big models, the model is currently split in a pipeline-parallel way across the GPUs available on the instance (roughly as in the sketch after this list).
  • Since the GPU in use at any given time is the one holding the layer currently being quantized, would the time taken to quantize the model on multiple GPUs be similar to quantizing it on a single GPU?
  • Is it possible to split the model in a tensor-parallel way instead?
  • I understand that 'non-sequential GPTQ' is deprecated, but how much memory does non-sequential GPTQ require? I think the memory calculation above would help here as well. Also, how much speed-up would we see with the non-sequential approach compared to the sequential one?
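
For reference, this is roughly how I am building the pipeline-style split today. It is only a sketch: it assumes calculate_offload_device_map lives in llmcompressor.transformers.compression.helpers and takes reserve_for_hessians/num_gpus/torch_dtype as in the llm-compressor big-model examples, which may differ across versions; MODEL_ID and num_gpus=4 are placeholders.

```python
# Sketch only: module path and signature of calculate_offload_device_map are
# assumed from the llm-compressor big-model examples and may vary by version.
import torch
from transformers import AutoModelForCausalLM
from llmcompressor.transformers.compression.helpers import calculate_offload_device_map

MODEL_ID = "meta-llama/Meta-Llama-3-70B-Instruct"  # placeholder model stub

# Builds a layer-by-layer (pipeline-style) device map across num_gpus devices,
# reserving extra room on each GPU for the GPTQ Hessians.
device_map = calculate_offload_device_map(
    MODEL_ID,
    reserve_for_hessians=True,
    num_gpus=4,
    torch_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map=device_map,
    torch_dtype=torch.bfloat16,
)
```

Each GPU then holds a contiguous slice of decoder layers, which is the pipeline-style placement my questions above refer to.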

Thank you!

Metadata

Labels: question (Further information is requested)
