Paper: https://arxiv.org/abs/2411.14257
Set up a virtual environment and install all requirements (this will ask for your HuggingFace token):

```shell
git clone https://github.com/javiferran/sae_entities.git
cd sae_entities
source setup.sh
```

To install SAE-Lens:

```shell
pip install sae-lens
```

The `/dataset` folder contains the code needed to create the dataset and run the model generations. It also includes the generations themselves, at `/dataset/processed`.
The `/mech_interp` folder contains the code to analyze the SAE latents.
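As background for the analysis code, a SAE latent is one unit of a sparse autoencoder's dictionary, evaluated on a residual stream activation. The toy sketch below illustrates only the encoding step, with random weights and made-up sizes; the repo uses the pretrained Gemma Scope SAEs, not this toy:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 16, 64  # toy sizes; real SAEs are far wider

# Toy SAE encoder parameters (random; stand-ins for trained weights).
W_enc = rng.standard_normal((d_model, d_sae)) / np.sqrt(d_model)
b_enc = np.zeros(d_sae)

def encode(resid):
    """Latent activations = ReLU(resid @ W_enc + b_enc)."""
    return np.maximum(resid @ W_enc + b_enc, 0.0)

# Stand-in for residual stream activations cached on 8 entity tokens.
resid = rng.standard_normal((8, d_model))
latents = encode(resid)  # (8, d_sae) non-negative, mostly-sparse latents
print(latents.shape)
```

The ReLU makes roughly half the toy latents zero; trained SAEs are much sparser, which is what makes individual latents interpretable.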
To cache residual stream activations on entity tokens, for instance for Gemma 2 2B, run:

```shell
cd sae_entities
python -m utils.activation_cache --model_alias gemma-2-2b --tokens_to_cache entity --batch_size 128 --entity_type_and_entity_name_format
```

To ensure specificity to entity tokens, we exclude latents that activate frequently (>2%) on random tokens sampled from the Pile dataset. To extract activations on random Pile tokens, run:

```shell
cd sae_entities
python -m utils.activation_cache --model_alias gemma-2-2b --tokens_to_cache random --batch_size 128 --dataset pile
```

Model generations can be found in `/dataset/processed`. However, if you want to produce the generations yourself, you can run the following command (shown here for the Wikidata dataset):

```shell
cd sae_entities
python -m dataset.process_data.wikidata.create_wikidata_entity_queries --model_path gemma-2-2b --free_generation False
```

In `mech_interp/feature_analysis.py` we compute the SAE latent scores for all layers and run the metrics that find the most relevant latents.
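A minimal sketch of the two steps described above: scoring latents by how differently they activate on known vs. unknown entities, then discarding latents that fire on more than 2% of random Pile tokens. The arrays and the frequency-difference score are illustrative stand-ins; the repo's actual metrics live in `mech_interp/feature_analysis.py`:

```python
import numpy as np

rng = np.random.default_rng(0)
n_latents = 1000

def fake_acts(n_tokens, density):
    """Synthetic (tokens x latents) SAE activations with the given sparsity."""
    mask = rng.random((n_tokens, n_latents)) < density
    return rng.exponential(1.0, (n_tokens, n_latents)) * mask

known_acts = fake_acts(500, 0.05)    # stand-in: activations on known entities
unknown_acts = fake_acts(500, 0.05)  # stand-in: activations on unknown entities
pile_acts = fake_acts(2000, 0.01)    # stand-in: activations on random Pile tokens

# Score: difference in activation frequency between known and unknown entities
# (one plausible separation metric, assumed here for illustration).
freq_known = (known_acts > 0).mean(axis=0)
freq_unknown = (unknown_acts > 0).mean(axis=0)
score = freq_known - freq_unknown

# Specificity filter: drop latents active on >2% of random Pile tokens.
pile_freq = (pile_acts > 0).mean(axis=0)
keep = pile_freq <= 0.02

# Top candidate "known-entity" latents among those that pass the filter.
top = np.argsort(-np.where(keep, score, -np.inf))[:10]
print(top)
```

Running the same ranking with `-score` would surface candidate "unknown-entity" latents instead.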
Generate and cache the activations for the model token, which appears at the end of the instruction tokens (only for Gemma models):

```shell
python -m utils.activation_cache --model_alias gemma-2b-it --tokens_to_cache model --batch_size 128
```

If you find this work useful, please consider citing:
```bibtex
@inproceedings{
ferrando2025iknowentityknowledge,
title={Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models},
author={Javier Ferrando and Oscar Obeso and Senthooran Rajamanoharan and Neel Nanda},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=WCRQFlji2q}
}
```