🤗 Models | 📚 Paper | 📝 Blog | 🐦 Twitter
This repository provides the code for the paper Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings.
- Create environment (Python 3.10+ recommended, tested with Python 3.11) and install deps:
```bash
conda create -n drope python=3.11 -y && conda activate drope
pip install --upgrade pip
./scripts/install.sh
```

- (Optional) Log in to Hugging Face and W&B:

```bash
huggingface-cli login  # if you need gated datasets/models
wandb login            # if you want online logging
```

This project uses Hydra for configuration management: configurations are stored as YAML files, loaded through a simple API, and can be overridden with command-line arguments.
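For orientation, the sketch below shows the general Hydra pattern the run-configs plug into; the script name train.py and config name example are placeholders for illustration, not files from this repo.

```python
# Minimal, illustrative Hydra usage (placeholder script, not this repo's entry point).
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="cfgs/run_cfg", config_name="example", version_base=None)
def main(cfg: DictConfig) -> None:
    # The composed config is a DictConfig; command-line overrides such as
    # `python train.py max_steps=1000` have already been applied at this point.
    print(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    main()
```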
Our models follow HuggingFace's model loading API. For example, to load a DroPE model, you can use the following code:
```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('SakanaAI/SmolLM-360M-DroPE', trust_remote_code=True)
model = AutoModel.from_pretrained('SakanaAI/SmolLM-360M-DroPE', trust_remote_code=True, torch_dtype=torch.bfloat16)
```

Inference is then straightforward:
```python
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Recalibrate a pretrained model into a DroPE model. The repo currently ships training configs for several models; see the run-configs in `cfgs/run_cfg/`.
To add a new training config, create a new run-config in `cfgs/run_cfg/`. You can use an existing config as a template and override the desired settings, such as the model, dataset, and training hyperparameters. To add support for a new model, add it to the `MODEL_ARCH_MAP` in `custom_models/drope.py` and create a new model config in `cfg/model_cfg/`.
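As an illustration only, an entry in MODEL_ARCH_MAP might look roughly like the sketch below; the keys and class names here are assumptions made for the example, so check custom_models/drope.py for the actual structure.

```python
# Hypothetical sketch of a MODEL_ARCH_MAP entry (the real mapping in
# custom_models/drope.py may use different keys and classes).
MODEL_ARCH_MAP = {
    # HF architecture name -> the DroPE classes that implement it (assumed names).
    "LlamaForCausalLM": {
        "model_cls": "DroPELlamaForCausalLM",
        "config_cls": "DroPELlamaConfig",
    },
    # Register a new architecture here, then add a matching
    # model config under cfg/model_cfg/.
}
```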
Use the launch helper, which wraps `accelerate launch` and selects a DeepSpeed config based on the command-line arguments.
- Command:
```bash
./launch.sh <num_gpus> <run_config.yaml> <zero_config>
```

- Arguments:
  - `<num_gpus>`: number of processes (e.g., 8 for 8 GPUs on one node)
  - `<run_config.yaml>`: the Hydra run-config in `cfgs/run_cfg/`
  - `<zero_config>`: either `zero1` (ZeRO-1), `offload_optim` (ZeRO-3 and optimizer only to CPU), or `offload` (ZeRO-3 and model weights to CPU); see the sketch below
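For intuition, the three choices roughly correspond to DeepSpeed ZeRO settings like the ones sketched below as Python dicts; this mapping is an assumption for illustration, and the repo's bundled DeepSpeed configs may set additional fields.

```python
# Rough sketch of the DeepSpeed "zero_optimization" sections the three
# <zero_config> choices correspond to; the actual config files may differ.
ZERO1 = {"zero_optimization": {"stage": 1}}

OFFLOAD_OPTIM = {  # ZeRO-3, optimizer state offloaded to CPU
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
    }
}

OFFLOAD = {  # ZeRO-3, model weights offloaded to CPU (optimizer offload shown too, as is common)
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    }
}
```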
This starts training with the defaults from the run-config and computes `gradient_accumulation_steps` to satisfy the requested global `train_batch_size`.
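For intuition only, the relationship the launcher enforces can be sketched as below; this is illustrative arithmetic, not the actual launch.sh code.

```python
# Illustrative only: how gradient_accumulation_steps falls out of the
# requested global batch size. The real launch.sh may compute it differently.
def gradient_accumulation_steps(train_batch_size: int,
                                per_device_train_batch_size: int,
                                num_gpus: int) -> int:
    per_micro_step = per_device_train_batch_size * num_gpus  # samples per micro-step across all GPUs
    assert train_batch_size % per_micro_step == 0, (
        "train_batch_size must be divisible by per_device_train_batch_size * num_gpus"
    )
    return train_batch_size // per_micro_step

# With the example override below: 512 / (32 * 8) = 2 accumulation steps.
print(gradient_accumulation_steps(512, 32, 8))
```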
You can pass Hydra overrides after the first three arguments. E.g.:
- Change batch sizes and steps:
```bash
./launch.sh 8 smollm_drope/recalibration_30B.yaml zero1 \
  train_batch_size=512 \
  per_device_train_batch_size=32 \
  max_steps=120000
```

If you encounter cryptic errors or stack traces from Hydra, set the environment variable `HYDRA_FULL_ERROR=1` to get a full traceback. This is especially helpful for debugging configuration or instantiation issues.
Example:
```bash
HYDRA_FULL_ERROR=1 ./launch.sh 8 llama2_7b_drope.yaml zero1 \
  train_batch_size=512 \
  per_device_train_batch_size=32 \
  max_steps=120000
```

If you find this code useful, please cite as:
```bibtex
@article{gelberg2025extending,
  title={Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings},
  author={Gelberg, Yoav and Eguchi, Koshi and Akiba, Takuya and Cetin, Edoardo},
  journal={arXiv preprint arXiv:2512.12167},
  year={2025}
}
```