zip2zip enables inference-time adaptive token vocabularies for large language models (LLMs): the vocabulary is dynamically augmented during inference, which reduces the number of decoding steps and speeds up generation.
- Dynamic vocabulary adaptation during inference
- LZW-based token compression (see the sketch below)
- Support for various encoder configurations
- Integration with Hugging Face's transformers library
- Compatible with PEFT (Parameter-Efficient Fine-Tuning) models
 
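To give some intuition for the LZW-based compression, here is a toy sketch of how repeated token sequences can be merged into new "hypertoken" ids. This is an illustration only, not zip2zip's actual implementation; `base_vocab_size` and `max_merge` are illustrative parameters introduced here for the example.

```python
def lzw_encode(ids, base_vocab_size, max_merge=4):
    """Toy LZW encoder over token ids: recurring sequences of base
    tokens are assigned fresh 'hypertoken' ids above the base vocab."""
    table = {}                    # tuple of base-token ids -> hypertoken id
    next_code = base_vocab_size   # hypertokens live above the base vocabulary
    out, w = [], []
    for c in ids:
        wc = w + [c]
        if len(wc) == 1 or tuple(wc) in table:
            w = wc                # keep extending a known sequence
        else:
            out.append(table[tuple(w)] if len(w) > 1 else w[0])
            if len(wc) <= max_merge:          # cap hypertoken length
                table[tuple(wc)] = next_code  # register a new hypertoken
                next_code += 1
            w = [c]
    if w:                         # flush the trailing sequence
        out.append(table[tuple(w)] if len(w) > 1 else w[0])
    return out, table

# A repetitive token stream compresses from 8 ids to 5:
codes, table = lzw_encode([5, 6, 5, 6, 5, 6, 5, 6], base_vocab_size=32000)
print(codes)  # [5, 6, 32000, 32002, 6]
```

Because the model can emit a single hypertoken in place of several base tokens, each decoding step produces more text, which is where the speedup comes from.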
You can install zip2zip using pip:
```bash
pip install zip2zip
```

zip2zip exposes drop-in replacements for the familiar Hugging Face classes:

| zip2zip | Corresponding HF class |
|---|---|
| Zip2ZipModel | AutoModelForCausalLM | 
| Zip2ZipTokenizer | AutoTokenizer | 
| Zip2ZipConfig | AutoConfig | 
| Zip2ZipModel.from_pretrained | AutoModelForCausalLM.from_pretrained | 
| Zip2ZipTokenizer.from_pretrained | AutoTokenizer.from_pretrained | 
| Zip2ZipConfig.from_pretrained | AutoConfig.from_pretrained | 
Pretrained zip2zip models are available on the Hugging Face Hub:

| Size | Model | HF Hub |
|---|---|---|
| 3.8B | Phi-3.5-mini-instruct-v0.1 | epfl-dlab/zip2zip-Phi-3.5-mini-instruct-v0.1 | 
| 14B | Phi-3-medium-instruct-v0.1 | epfl-dlab/zip2zip-Phi-3-medium-instruct-v0.1 |
| ... | ... | epfl-dlab/zip2zip-models | 
You can load and run a zip2zip model much like a standard transformers model:

```python
import torch
from zip2zip import Zip2ZipModel, Zip2ZipTokenizer
pretrained_model_url = "epfl-dlab/zip2zip-Phi-3.5-mini-instruct-v0.1"
device = "cuda" if torch.cuda.is_available() else "cpu"
# Initialize tokenizer
tokenizer = Zip2ZipTokenizer.from_pretrained(pretrained_model_url)
# Initialize model
model = Zip2ZipModel.from_pretrained(pretrained_model_url, device_map=device)
# Generate text
inputs = tokenizer("Write a MultiHeadAttention layer in PyTorch", return_tensors="pt").to(device)
outputs = model.generate(**inputs)
# Decode and print the output, with hypertokens highlighted in color
generated_text = tokenizer.color_decode(outputs)
print(generated_text)
```

You can apply quantization to reduce the model's memory usage, just as you would with HF models:
model = Zip2ZipModel.from_pretrained(pretrained_model_url, device_map="auto", load_in_8bit=True)We provide some examples in the examples folder.
We provide more examples in the `examples` folder.

We also provide a script to evaluate model performance, compatible with lm-evaluation-harness.
To run the evaluation, you need to install the zip2zip fork of lm-evaluation-harness (the original one is not compatible with zip2zip).
```bash
pip install git+https://github.com/epfl-dlab/zip2zip_lm_eval.git
```

Then, you can run the evaluation:
```bash
python bench/run_lm_eval.py
```

If you use zip2zip in your work, please cite:

```bibtex
@misc{geng2025zip2zipinferencetimeadaptivevocabularies,
      title={zip2zip: Inference-Time Adaptive Vocabularies for Language Models via Token Compression},
      author={Saibo Geng and Nathan Ranchin and Yunzhen Yao and Maxime Peyrard and Chris Wendler and Michael Gastpar and Robert West},
      year={2025},
      eprint={2506.01084},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2506.01084},
}
```