🤗 Models | 📚 Paper | 📝 Blog | 🐦 Twitter
This repository provides the code for the paper Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings.
- Create environment (Python 3.10+ recommended, tested with Python 3.11) and install deps:
```bash
conda create -n drope python=3.11 -y && conda activate drope
pip install --upgrade pip
./scripts/install.sh
```

- (Optional) Log in to Hugging Face and W&B:

```bash
huggingface-cli login  # if you need gated datasets/models
wandb login            # if you want online logging
```

This project uses Hydra for configuration management: configurations are stored as YAML files, loaded through a simple API, and can be overridden with command-line arguments.
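For orientation, the sketch below shows the general Hydra pattern the run-configs plug into; the script name train.py and config name example are placeholders for illustration, not files from this repo.

```python
# Minimal, illustrative Hydra usage (placeholder script, not this repo's entry point).
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_path="cfgs/run_cfg", config_name="example", version_base=None)
def main(cfg: DictConfig) -> None:
    # The composed config is a DictConfig; command-line overrides such as
    # `python train.py max_steps=1000` have already been applied at this point.
    print(OmegaConf.to_yaml(cfg))


if __name__ == "__main__":
    main()
```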
Our models follow HuggingFace's model loading API. For example, to load a DroPE model, you can use the following code:
```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('SakanaAI/SmolLM-360M-DroPE', trust_remote_code=True)
model = AutoModel.from_pretrained('SakanaAI/SmolLM-360M-DroPE', trust_remote_code=True, torch_dtype=torch.bfloat16)
```

Inference is then straightforward:
```python
inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Recalibrate a pretrained model into a DroPE model. The repo currently ships training configs for several models; see the run-configs in `cfgs/run_cfg/`.
To add a new training config, create a new run-config in `cfgs/run_cfg/`. You can use an existing config as a template and override the desired settings, such as the model, dataset, and training hyperparameters. To add support for a new model, add it to the `MODEL_ARCH_MAP` in `custom_models/drope.py` and create a new model config in `cfg/model_cfg/`.
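As an illustration only, an entry in MODEL_ARCH_MAP might look roughly like the sketch below; the keys and class names here are assumptions made for the example, so check custom_models/drope.py for the actual structure.

```python
# Hypothetical sketch of a MODEL_ARCH_MAP entry (the real mapping in
# custom_models/drope.py may use different keys and classes).
MODEL_ARCH_MAP = {
    # HF architecture name -> the DroPE classes that implement it (assumed names).
    "LlamaForCausalLM": {
        "model_cls": "DroPELlamaForCausalLM",
        "config_cls": "DroPELlamaConfig",
    },
    # Register a new architecture here, then add a matching
    # model config under cfg/model_cfg/.
}
```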
Use the launch helper, which wraps `accelerate launch` and selects a DeepSpeed config based on the command-line arguments.
- Command:
```bash
./launch.sh <num_gpus> <run_config.yaml> <zero_config>
```

- Arguments:
  - `<num_gpus>`: number of processes (e.g., 8 for 8 GPUs on one node)
  - `<run_config.yaml>`: the Hydra run-config in `cfgs/run_cfg/`
  - `<zero_config>`: either `zero1` (ZeRO-1), `offload_optim` (ZeRO-3 and optimizer only to CPU), or `offload` (ZeRO-3 and model weights to CPU); see the sketch below
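For intuition, the three choices roughly correspond to DeepSpeed ZeRO settings like the ones sketched below as Python dicts; this mapping is an assumption for illustration, and the repo's bundled DeepSpeed configs may set additional fields.

```python
# Rough sketch of the DeepSpeed "zero_optimization" sections the three
# <zero_config> choices correspond to; the actual config files may differ.
ZERO1 = {"zero_optimization": {"stage": 1}}

OFFLOAD_OPTIM = {  # ZeRO-3, optimizer state offloaded to CPU
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
    }
}

OFFLOAD = {  # ZeRO-3, model weights offloaded to CPU (optimizer offload shown too, as is common)
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {"device": "cpu"},
        "offload_param": {"device": "cpu"},
    }
}
```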
This starts training with the defaults from the run-config and computes `gradient_accumulation_steps` to satisfy the requested global `train_batch_size`.
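For intuition only, the relationship the launcher enforces can be sketched as below; this is illustrative arithmetic, not the actual launch.sh code.

```python
# Illustrative only: how gradient_accumulation_steps falls out of the
# requested global batch size. The real launch.sh may compute it differently.
def gradient_accumulation_steps(train_batch_size: int,
                                per_device_train_batch_size: int,
                                num_gpus: int) -> int:
    per_micro_step = per_device_train_batch_size * num_gpus  # samples per micro-step across all GPUs
    assert train_batch_size % per_micro_step == 0, (
        "train_batch_size must be divisible by per_device_train_batch_size * num_gpus"
    )
    return train_batch_size // per_micro_step

# With the example override below: 512 / (32 * 8) = 2 accumulation steps.
print(gradient_accumulation_steps(512, 32, 8))
```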
You can pass Hydra overrides after the first three arguments. E.g.:
- Change batch sizes and steps:
```bash
./launch.sh 8 smollm_drope/recalibration_30B.yaml zero1 \
  train_batch_size=512 \
  per_device_train_batch_size=32 \
  max_steps=120000
```

If you encounter cryptic errors or stack traces from Hydra, set the environment variable `HYDRA_FULL_ERROR=1` to get a full traceback. This is especially helpful for debugging configuration or instantiation issues.
Example:
```bash
HYDRA_FULL_ERROR=1 ./launch.sh 8 llama2_7b_drope.yaml zero1 \
  train_batch_size=512 \
  per_device_train_batch_size=32 \
  max_steps=120000
```

If you find this code useful, please cite as:
```bibtex
@article{gelberg2025extending,
  title={Extending the Context of Pretrained LLMs by Dropping Their Positional Embeddings},
  author={Gelberg, Yoav and Eguchi, Koshi and Akiba, Takuya and Cetin, Edoardo},
  journal={arXiv preprint arXiv:2512.12167},
  year={2025}
}
```