This is the official implementation of PROMPT-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis (CVPR'25).
Introducing Prompt-CAM, a simple approach that makes pre-trained Vision Transformers interpretable for fine-grained analysis.
Prompt-CAM lets us explore:
- 🧠 What does the model think is important for each class?
- ✨ Which traits are shared between two bird species?
- 🎨 How different classes "see" the same image differently!
🔍 Ever wondered what traits stand out when a model looks at an image of one class but searches with another class in mind? 🤔 Witness the important traits of different classes through the lens of Prompt-CAM with our interactive demos!
🚀 Try our demo without installing anything in Google Colab:
- Set up the environment.
- Download the pre-trained model from the links below.
- Run the demo.
🔧 You can extend this code base to include new datasets and new backbones.
conda create -n prompt_cam python=3.7
conda activate prompt_cam
source env_setup.sh
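Once `env_setup.sh` finishes, a quick sanity check like the following can confirm the environment is usable (a minimal sketch, assuming the setup script installs PyTorch):

```python
# Sanity check for the prompt_cam environment (assumes env_setup.sh installed PyTorch).
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # should be True on a GPU machine
```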
You can put all the data in a folder and pass the path to the `--data_path` argument. The structure of `data/images/` should be organized as follows:
cub/
├── train/
│   ├── 001.Black_footed_Albatross/
│   │   ├── image_1.jpg
│   │   ├── image_2.jpg
│   │   └── ...
│   ├── 002.Laysan_Albatross/
│   │   ├── image_1.jpg
│   │   ├── image_2.jpg
│   │   └── ...
│   └── ...
└── val/
    ├── 001.Black_footed_Albatross/
    │   ├── image_1.jpg
    │   ├── image_2.jpg
    │   └── ...
    ├── 002.Laysan_Albatross/
    │   ├── image_1.jpg
    │   └── ...
    └── ...
Prepare CUB dataset
- Download the prepared dataset, or prepare the dataset yourself:
  - Download the CUB dataset from the original website and put it in the `data/images/` folder.
  - Use the dataset's provided train/val split to create the train/val splits, with the class numbers (starting from 1) as the prefix of the respective image folder names.
  - The code will automatically create train and val annotation files in the `data/annotations/` folder for each dataset if they are not provided (see the sketch after this list).
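For intuition, the sketch below shows how image/label pairs follow from this layout; the annotation files themselves are generated automatically by the code, so the helper here (`collect_samples`) is purely illustrative:

```python
import os

def collect_samples(split_dir):
    """Collect (image_path, label) pairs from a split folder, e.g. data/images/cub/train.

    Folder names like '001.Black_footed_Albatross' carry the class number as a
    prefix (starting from 1), which maps to a 0-based label.
    """
    samples = []
    for class_folder in sorted(os.listdir(split_dir)):
        label = int(class_folder.split(".")[0]) - 1  # '001' -> 0
        folder = os.path.join(split_dir, class_folder)
        for fname in sorted(os.listdir(folder)):
            samples.append((os.path.join(folder, fname), label))
    return samples

# Example: train_samples = collect_samples("data/images/cub/train")
```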
To add a new dataset, see Extensions.
- Download the checkpoints from the links below and put each one in the corresponding `checkpoints/{model}/{dataset}/` folder.
| Backbone | Dataset | Prompt-CAM (Top-1 Acc %) | Checkpoint Link |
|---|---|---|---|
| dino | cub (CUB) | 73.2 | url |
| dino | car (Stanford Cars) | 83.2 | url |
| dino | dog (Stanford Dogs) | 81.1 | url |
| dino | pet (Oxford Pet) | 91.3 | url |
| dino | birds_525 (Birds-525) | 98.8 | url |
| Backbone | Dataset | Prompt-CAM (Top-1 Acc %) | Checkpoint Link |
|---|---|---|---|
| dinov2 | cub (CUB) | 74.1 | url |
| dinov2 | dog (Stanford Dogs) | 81.3 | url |
| dinov2 | pet (Oxford Pet) | 92.7 | url |
- Download the checkpoint from the URL in the table above and put it in the `checkpoints/{model}/{dataset}/` folder.
For example, to visualize the attention maps of the DINO model on class 024.Red_faced_Cormorant of the CUB dataset, put the checkpoint in the `checkpoints/dino/cub/` folder and run the following command (class indices are 0-based, so class 024 corresponds to `--vis_cls 23`):
CUDA_VISIBLE_DEVICES=0 python visualize.py --config ./experiment/config/prompt_cam/dino/cub/args.yaml --checkpoint ./checkpoints/dino/cub/model.pt --vis_cls 23

- The output will be saved in the `visualization/dino/cub/class_23/` folder.
- Inside each image's folder, the `top_traits` heatmaps for the target class are concatenated if the prediction is correct; otherwise, all trait heatmaps are concatenated. (The prediction for the respective image can be found in `concatenated_prediction_{predicted_class}.jpg`.)
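To browse the saved heatmaps programmatically, a minimal sketch like this works (assuming Pillow is installed; the folder and file patterns follow the description above):

```python
import glob
from PIL import Image

# Inspect the visualization output for class 23 (paths follow the layout above).
out_dir = "visualization/dino/cub/class_23"
for path in sorted(glob.glob(f"{out_dir}/*/concatenated_prediction_*.jpg")):
    img = Image.open(path)
    print(path, img.size)
    # img.show()  # uncomment to open each concatenated heatmap in a viewer
```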
Visualization Configuration Meaning
- `config`: path to the config file.
- `checkpoint`: path to the checkpoint file.
- `vis_cls`: class number to visualize. (default: 23)
- `vis_attn`: set to True to visualize the attention map. (default: True)
- `top_traits`: number of traits to visualize. (default: 4)
- `nmbr_samples`: number of images from the `vis_cls` class to visualize. (default: 10)
- `vis_outdir`: output directory. (default: visualization/)
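These options come from the experiment config, so one way to double-check what a run will use is to load the YAML directly (a sketch assuming PyYAML is installed; the key names are taken from the option list above and are assumed to match the actual config):

```python
import yaml

# Load the experiment config used by visualize.py.
with open("./experiment/config/prompt_cam/dino/cub/args.yaml") as f:
    cfg = yaml.safe_load(f)

# Key names mirror the options listed above (assumed to match the YAML).
for key in ("vis_cls", "vis_attn", "top_traits", "nmbr_samples", "vis_outdir"):
    print(key, "=", cfg.get(key, "<not set in this config>"))
```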
Download the pretrained weights from the following links and put them in the `pretrained_weights` folder.
- ViT-B DINO: rename it as `dino_vitbase16_pretrain.pth`
- ViT-B DINOv2: rename it as `dinov2_vitb14_pretrain.pth`
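A quick check like the one below confirms the renamed weights are where the code expects them (paths as listed above):

```python
import os

# Verify the renamed pretrained weights are in place.
expected = [
    "pretrained_weights/dino_vitbase16_pretrain.pth",
    "pretrained_weights/dinov2_vitb14_pretrain.pth",
]
for path in expected:
    print(f"{'found' if os.path.isfile(path) else 'MISSING'}: {path}")
```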
See Data Preparation above.
🚀 To train the model on the CUB dataset using the DINO backbone, run the following command:

CUDA_VISIBLE_DEVICES=0 python main.py --config ./experiment/config/prompt_cam/dino/cub/args.yaml

The checkpoint will be saved in the `output/vit_base_patch16_dino/cub/` folder. Copy the checkpoint `model.pt` to the `checkpoints/dino/cub/` folder.
🚀 To train the model on the Oxford Pet dataset using the DINO backbone, run the following command:

CUDA_VISIBLE_DEVICES=0 python main.py --config ./experiment/config/prompt_cam/dino/pet/args.yaml

The checkpoint will be saved in the `output/vit_base_patch16_dino/pet/` folder. Copy the checkpoint `model.pt` to the `checkpoints/dino/pet/` folder.
🚀 To train the model on the Oxford Pet dataset using the DINOv2 backbone, run the following command:

CUDA_VISIBLE_DEVICES=0 python main.py --config ./experiment/config/prompt_cam/dinov2/pet/args.yaml

The checkpoint will be saved in the `output/vit_base_patch14_dinov2/pet/` folder. Copy the checkpoint `model.pt` to the `checkpoints/dinov2/pet/` folder.
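Copying a trained checkpoint into place can be scripted as well; the sketch below uses the DINO/CUB paths from above (adjust for other backbones and datasets):

```python
import os
import shutil

src = "output/vit_base_patch16_dino/cub/model.pt"  # produced by training
dst_dir = "checkpoints/dino/cub"                   # where visualize.py looks for it

os.makedirs(dst_dir, exist_ok=True)
shutil.copy2(src, os.path.join(dst_dir, "model.pt"))
print("copied", src, "->", dst_dir)
```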
See Visualization above.
- Prepare the dataset using the instructions above.
- Add a new dataset file in `data/dataset/`. Look at the existing dataset files for reference (a hypothetical skeleton is sketched after this list).
- Modify `build_loader.py` to include the new dataset.
- Create a new config file in `experiment/config/prompt_cam/{model}/{dataset}/args.yaml`. See `experiment/config/prompt_cam/dino/cub/args.yaml` for reference and what to modify.
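As a rough starting point, a new dataset file might look like the skeleton below; every name here is hypothetical, and the real interface (base class, transforms, annotation handling) should be copied from the existing files in `data/dataset/`:

```python
# Hypothetical skeleton for a new dataset file under data/dataset/.
import os
from PIL import Image
from torch.utils.data import Dataset

class MyNewDataset(Dataset):  # hypothetical class name
    """Loads images from data/images/<dataset>/<split>/<prefix>.<class_name>/."""

    def __init__(self, root, split="train", transform=None):
        self.transform = transform
        self.samples = []
        split_dir = os.path.join(root, split)
        for class_folder in sorted(os.listdir(split_dir)):
            label = int(class_folder.split(".")[0]) - 1  # numeric prefix -> 0-based label
            folder = os.path.join(split_dir, class_folder)
            for fname in sorted(os.listdir(folder)):
                self.samples.append((os.path.join(folder, fname), label))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        img = Image.open(path).convert("RGB")
        if self.transform is not None:
            img = self.transform(img)
        return img, label
```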
- Modify `get_base_model()` in `build_model.py`.
- Register the new backbone in `vision_transformer.py` by creating a new function (see the sketch below).
- Add options for the new backbone to `--pretrained_weights` and `--model` in the `setup_parser()` function of `main.py`.
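Conceptually, the registration function is a small factory; the sketch below borrows timm's `VisionTransformer` purely for self-containment, whereas in the repo it should construct the class defined in `vision_transformer.py` (all names below are hypothetical):

```python
# Hypothetical backbone factory; in the repo, build the VisionTransformer
# class from vision_transformer.py instead of timm's.
from timm.models.vision_transformer import VisionTransformer

def vit_base_patch16_my_backbone(**kwargs):
    """ViT-B/16 configuration for a hypothetical new backbone."""
    return VisionTransformer(
        patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs
    )
```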
If you find this repository useful, please consider citing our work 📝 and giving a star 🌟:
@InProceedings{Chowdhury_2025_CVPR,
author = {Chowdhury, Arpita and Paul, Dipanjyoti and Mai, Zheda and Gu, Jianyang and Zhang, Ziheng and Mehrab, Kazi Sajeed and Campolongo, Elizabeth G. and Rubenstein, Daniel and Stewart, Charles V. and Karpatne, Anuj and Berger-Wolf, Tanya and Su, Yu and Chao, Wei-Lun},
title = {Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2025},
pages = {4375-4385}
}
- VPT: https://github.com/KMnP/vpt
- PETL_VISION: https://github.com/OSU-MLB/PETL_Vision
Thanks for their wonderful work.
💬 Please create an issue for any contributions.
