
kylesayrs
Contributor

No description provided.

Collaborator

@dsikka dsikka left a comment


I don’t think we want to remove these until the new methods can compress / decompress when starting with a checkpoint that isn’t already in memory.

Signed-off-by: Kyle Sayers <[email protected]>
@kylesayrs
Contributor Author

@dsikka To be clear, are you referring to compressing/decompressing from disk? Is there a remaining use case for this?

Signed-off-by: Kyle Sayers <[email protected]>
@dsikka
Collaborator

dsikka commented Sep 11, 2025

> @dsikka To be clear, are you referring to compressing/decompressing from disk? Is there a remaining use case for this?

Yeah, for anyone who wants to use compressed-tensors independently of the transformers pathway or is using ct as a standalone library.

I certainly think we can improve these functions, but loading from disk is something we should have some way to support.

@kylesayrs
Contributor Author

kylesayrs commented Sep 17, 2025

> Yeah, for anyone who wants to use compressed-tensors independently of the transformers pathway or is using ct as a standalone library.

@dsikka There has never been a "from disk" compression pathway. In order to load any compressed model, you must use transformers from_pretrained.

As for decompression, there has also never been a "from disk" pathway that doesn't rely on transformers from_pretrained. The old disk pathway would take an already compressed model loaded via from_pretrained, then load the weights again from disk and replace them.

@dsikka
Collaborator

dsikka commented Sep 18, 2025

> > Yeah, for anyone who wants to use compressed-tensors independently of the transformers pathway or is using ct as a standalone library.
>
> @dsikka There has never been a "from disk" compression pathway. In order to load any compressed model, you must use transformers from_pretrained.
>
> As for decompression, there has also never been a "from disk" pathway that doesn't rely on transformers from_pretrained. The old disk pathway would take an already compressed model loaded via from_pretrained, then load the weights again from disk and replace them.

This is not true:

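from compressed_tensors.compressors import ModelCompressor
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor.utils import dispatch_for_generation

# Compressed checkpoint whose weights will be decompressed into the model
MODEL_ID = "nm-testing/TinyLlama-1.1B-Chat-v1.0-W8A8_tensor_weight_static_per_tensor_act-e2e"

# Load the uncompressed base model as the skeleton; the compressed weights
# are not read through this from_pretrained call
model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Read the compressed checkpoint and decompress its weights into the model
compressor = ModelCompressor.from_pretrained(MODEL_ID)
compressor.decompress(MODEL_ID, model)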

print("\n\n")
print("========== SAMPLE GENERATION ==============")

dispatch_for_generation(model)
input_ids = tokenizer("Hello my name is", return_tensors="pt").input_ids.to(
    model.device
)
output = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0]))
print("==========================================\n\n")

The compressed weights are decompressed after being read from disk. from_pretrained loads the skeleton of the model, but the compressed weights are never read through the from_pretrained pathway. This enables decompression without relying on our transformers integration.

This also gives us independent compression functionality, such as:

- allowing a pathway to checkpoint while maintaining the original state dict in memory for further use

While it makes sense for the default pathway to be in-memory compression/decompression, these are useful tools that we should still maintain.
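To make the checkpointing use case concrete, here is a minimal toy sketch, not the compressed-tensors API. It assumes a state dict of plain Python lists and a made-up int8 quantizer (`quantize_int8`, `checkpoint_compressed`, and `load_decompressed` are hypothetical names for illustration only). The point is just that a compressed copy goes to disk while the in-memory state dict is left untouched:

```python
import json
import os
import tempfile


def quantize_int8(values):
    """Toy symmetric int8 quantization of a list of floats (illustrative only)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return {"scale": scale, "q": [round(v / scale) for v in values]}


def dequantize_int8(entry):
    """Invert quantize_int8 up to rounding error."""
    return [q * entry["scale"] for q in entry["q"]]


def checkpoint_compressed(state_dict, path):
    """Write a compressed copy to disk; the in-memory state dict is not modified."""
    compressed = {name: quantize_int8(w) for name, w in state_dict.items()}
    with open(path, "w") as f:
        json.dump(compressed, f)


def load_decompressed(path):
    """Read a compressed checkpoint from disk and decompress it."""
    with open(path) as f:
        compressed = json.load(f)
    return {name: dequantize_int8(e) for name, e in compressed.items()}


# Hypothetical weights standing in for a real model's state dict
state_dict = {"layer.weight": [0.5, -1.0, 0.25], "layer.bias": [0.0, 0.1]}
original = {name: list(w) for name, w in state_dict.items()}

with tempfile.TemporaryDirectory() as tmp:
    ckpt = os.path.join(tmp, "ckpt.json")
    checkpoint_compressed(state_dict, ckpt)
    restored = load_decompressed(ckpt)

# The uncompressed weights are still available in memory for further use
assert state_dict == original
```

In a real setup the quantizer and file format would come from the library; the sketch only shows the shape of the workflow.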

@brian-dellabetta
Contributor

brian-dellabetta commented Sep 18, 2025

I think part of the motivation for this is that decompress/compress look like the two main public methods that users should use, which led me and this user astray. If we don't want to deprecate these, could we at least rename them (maybe to decompress_from_disk/compress_from_disk?) so users don't think they are the preferred methods?
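A rough sketch of what such a rename could look like, assuming the old name is kept as a thin alias that emits a warning. `ToyCompressor` is a hypothetical stand-in rather than the real ModelCompressor, and the method body is a placeholder:

```python
import warnings


class ToyCompressor:
    """Hypothetical stand-in; not the real ModelCompressor."""

    def decompress_from_disk(self, model_path, model):
        # Placeholder body: record where the weights would be read from
        model["loaded_from"] = model_path
        return model

    def decompress(self, model_path, model):
        # Old name kept as an alias that points users to the explicit name
        warnings.warn(
            "decompress is deprecated; use decompress_from_disk",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.decompress_from_disk(model_path, model)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = ToyCompressor().decompress("path/to/checkpoint", {})
```

Existing callers would keep working under this scheme, while the warning nudges them toward whichever method is meant to be the preferred one.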
