This is the official implementation of PROMPT-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis (CVPR'25).
Introducing Prompt-CAM, a simple approach that makes pre-trained Vision Transformers interpretable for fine-grained analysis.
Prompt-CAM lets us explore:
- 🧠 What does the model think is important for each class?
- ✨ Which traits are shared between two bird species?
- 🎨 How different classes "see" the same image differently!
🔍 Ever wondered what traits stand out when a model looks at an image of one class but searches with another class in mind? 🤔 Witness the important traits of different classes through the lens of Prompt-CAM with our interactive demos!
🚀 Try our demo without installing anything in Google Colab:
- Set up the environment.
- Download the pre-trained model from the links below.
- Run the demo.
🔧 You can extend this code base to include new datasets and new backbones.
conda create -n prompt_cam python=3.7
conda activate prompt_cam
source env_setup.sh
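Once `env_setup.sh` finishes, a quick sanity check like the following can confirm the environment is usable (a minimal sketch, assuming the setup script installs PyTorch):

```python
# Sanity check for the prompt_cam environment (assumes env_setup.sh installed PyTorch).
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())  # should be True on a GPU machine
```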
You can put all the data in a folder and pass the path to the `--data_path` argument. The structure of `data/images/` should be organized as follows:
cub/
├── train/
│   ├── 001.Black_footed_Albatross/
│   │   ├── image_1.jpg
│   │   ├── image_2.jpg
│   │   └── ...
│   ├── 002.Laysan_Albatross/
│   │   ├── image_1.jpg
│   │   ├── image_2.jpg
│   │   └── ...
│   └── ...
└── val/
    ├── 001.Black_footed_Albatross/
    │   ├── image_1.jpg
    │   ├── image_2.jpg
    │   └── ...
    ├── 002.Laysan_Albatross/
    │   ├── image_1.jpg
    │   └── ...
    └── ...
Prepare CUB dataset
- Download the prepared dataset, or prepare the dataset yourself:
  - Download the CUB dataset from the original website and put it in the `data/images/` folder.
  - Use the dataset's provided train/val split to create the train/val splits, with the class numbers (starting from 1) as the prefix of the respective image folder names.
  - The code will automatically create train and val annotation files in the `data/annotations/` folder for each dataset if they are not provided (see the sketch after this list).
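For intuition, the sketch below shows how image/label pairs follow from this layout; the annotation files themselves are generated automatically by the code, so the helper here (`collect_samples`) is purely illustrative:

```python
import os

def collect_samples(split_dir):
    """Collect (image_path, label) pairs from a split folder, e.g. data/images/cub/train.

    Folder names like '001.Black_footed_Albatross' carry the class number as a
    prefix (starting from 1), which maps to a 0-based label.
    """
    samples = []
    for class_folder in sorted(os.listdir(split_dir)):
        label = int(class_folder.split(".")[0]) - 1  # '001' -> 0
        folder = os.path.join(split_dir, class_folder)
        for fname in sorted(os.listdir(folder)):
            samples.append((os.path.join(folder, fname), label))
    return samples

# Example: train_samples = collect_samples("data/images/cub/train")
```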
To add a new dataset, see Extensions.
- Download the checkpoints from the links below and put each one in the corresponding `checkpoints/{model}/{dataset}/` folder.
| Backbone | Dataset | Prompt-CAM (Top-1 Acc %) | Checkpoint Link |
|---|---|---|---|
| dino | cub (CUB) | 73.2 | url |
| dino | car (Stanford Cars) | 83.2 | url |
| dino | dog (Stanford Dogs) | 81.1 | url |
| dino | pet (Oxford Pet) | 91.3 | url |
| dino | birds_525 (Birds-525) | 98.8 | url |
| Backbone | Dataset | Prompt-CAM (Top-1 Acc %) | Checkpoint Link |
|---|---|---|---|
| dinov2 | cub (CUB) | 74.1 | url |
| dinov2 | dog (Stanford Dogs) | 81.3 | url |
| dinov2 | pet (Oxford Pet) | 92.7 | url |
- Download the checkpoint from the URL in the table above and put it in the `checkpoints/{model}/{dataset}/` folder.
For example, to visualize the attention maps of the DINO model on class 024.Red_faced_Cormorant of the CUB dataset, put the checkpoint in the `checkpoints/dino/cub/` folder and run the following command (class indices are 0-based, so class 024 corresponds to `--vis_cls 23`):
CUDA_VISIBLE_DEVICES=0 python visualize.py --config ./experiment/config/prompt_cam/dino/cub/args.yaml --checkpoint ./checkpoints/dino/cub/model.pt --vis_cls 23

- The output will be saved in the `visualization/dino/cub/class_23/` folder.
- Inside each image's folder, the `top_traits` heatmaps for the target class are concatenated if the prediction is correct; otherwise, all trait heatmaps are concatenated. (The prediction for the respective image can be found in `concatenated_prediction_{predicted_class}.jpg`.)
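To browse the saved heatmaps programmatically, a minimal sketch like this works (assuming Pillow is installed; the folder and file patterns follow the description above):

```python
import glob
from PIL import Image

# Inspect the visualization output for class 23 (paths follow the layout above).
out_dir = "visualization/dino/cub/class_23"
for path in sorted(glob.glob(f"{out_dir}/*/concatenated_prediction_*.jpg")):
    img = Image.open(path)
    print(path, img.size)
    # img.show()  # uncomment to open each concatenated heatmap in a viewer
```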
Visualization Configuration Meaning
- `config`: path to the config file.
- `checkpoint`: path to the checkpoint file.
- `vis_cls`: class number to visualize. (default: 23)
- `vis_attn`: set to True to visualize the attention map. (default: True)
- `top_traits`: number of traits to visualize. (default: 4)
- `nmbr_samples`: number of images from the `vis_cls` class to visualize. (default: 10)
- `vis_outdir`: output directory. (default: visualization/)
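These options come from the experiment config, so one way to double-check what a run will use is to load the YAML directly (a sketch assuming PyYAML is installed; the key names are taken from the option list above and are assumed to match the actual config):

```python
import yaml

# Load the experiment config used by visualize.py.
with open("./experiment/config/prompt_cam/dino/cub/args.yaml") as f:
    cfg = yaml.safe_load(f)

# Key names mirror the options listed above (assumed to match the YAML).
for key in ("vis_cls", "vis_attn", "top_traits", "nmbr_samples", "vis_outdir"):
    print(key, "=", cfg.get(key, "<not set in this config>"))
```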
Download the pretrained weights from the following links and put them in the `pretrained_weights` folder.
- ViT-B DINO: rename it as `dino_vitbase16_pretrain.pth`
- ViT-B DINOv2: rename it as `dinov2_vitb14_pretrain.pth`
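A quick check like the one below confirms the renamed weights are where the code expects them (paths as listed above):

```python
import os

# Verify the renamed pretrained weights are in place.
expected = [
    "pretrained_weights/dino_vitbase16_pretrain.pth",
    "pretrained_weights/dinov2_vitb14_pretrain.pth",
]
for path in expected:
    print(f"{'found' if os.path.isfile(path) else 'MISSING'}: {path}")
```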
See Data Preparation above.
🚀 To train the model on the CUB dataset using the DINO backbone, run the following command:

CUDA_VISIBLE_DEVICES=0 python main.py --config ./experiment/config/prompt_cam/dino/cub/args.yaml

The checkpoint will be saved in the `output/vit_base_patch16_dino/cub/` folder. Copy the checkpoint `model.pt` to the `checkpoints/dino/cub/` folder.
🚀 To train the model on the Oxford Pet dataset using the DINO backbone, run the following command:

CUDA_VISIBLE_DEVICES=0 python main.py --config ./experiment/config/prompt_cam/dino/pet/args.yaml

The checkpoint will be saved in the `output/vit_base_patch16_dino/pet/` folder. Copy the checkpoint `model.pt` to the `checkpoints/dino/pet/` folder.
🚀 To train the model on the Oxford Pet dataset using the DINOv2 backbone, run the following command:

CUDA_VISIBLE_DEVICES=0 python main.py --config ./experiment/config/prompt_cam/dinov2/pet/args.yaml

The checkpoint will be saved in the `output/vit_base_patch14_dinov2/pet/` folder. Copy the checkpoint `model.pt` to the `checkpoints/dinov2/pet/` folder.
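Copying a trained checkpoint into place can be scripted as well; the sketch below uses the DINO/CUB paths from above (adjust for other backbones and datasets):

```python
import os
import shutil

src = "output/vit_base_patch16_dino/cub/model.pt"  # produced by training
dst_dir = "checkpoints/dino/cub"                   # where visualize.py looks for it

os.makedirs(dst_dir, exist_ok=True)
shutil.copy2(src, os.path.join(dst_dir, "model.pt"))
print("copied", src, "->", dst_dir)
```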
See Visualization above.
- Prepare the dataset using the instructions above.
- Add a new dataset file in `data/dataset/`. Look at the existing dataset files for reference (a hypothetical skeleton is sketched after this list).
- Modify `build_loader.py` to include the new dataset.
- Create a new config file in `experiment/config/prompt_cam/{model}/{dataset}/args.yaml`. See `experiment/config/prompt_cam/dino/cub/args.yaml` for reference and what to modify.
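As a rough starting point, a new dataset file might look like the skeleton below; every name here is hypothetical, and the real interface (base class, transforms, annotation handling) should be copied from the existing files in `data/dataset/`:

```python
# Hypothetical skeleton for a new dataset file under data/dataset/.
import os
from PIL import Image
from torch.utils.data import Dataset

class MyNewDataset(Dataset):  # hypothetical class name
    """Loads images from data/images/<dataset>/<split>/<prefix>.<class_name>/."""

    def __init__(self, root, split="train", transform=None):
        self.transform = transform
        self.samples = []
        split_dir = os.path.join(root, split)
        for class_folder in sorted(os.listdir(split_dir)):
            label = int(class_folder.split(".")[0]) - 1  # numeric prefix -> 0-based label
            folder = os.path.join(split_dir, class_folder)
            for fname in sorted(os.listdir(folder)):
                self.samples.append((os.path.join(folder, fname), label))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        img = Image.open(path).convert("RGB")
        if self.transform is not None:
            img = self.transform(img)
        return img, label
```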
- Modify `get_base_model()` in `build_model.py`.
- Register the new backbone in `vision_transformer.py` by creating a new function (see the sketch below).
- Add options for the new backbone to `--pretrained_weights` and `--model` in the `setup_parser()` function of `main.py`.
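Conceptually, the registration function is a small factory; the sketch below borrows timm's `VisionTransformer` purely for self-containment, whereas in the repo it should construct the class defined in `vision_transformer.py` (all names below are hypothetical):

```python
# Hypothetical backbone factory; in the repo, build the VisionTransformer
# class from vision_transformer.py instead of timm's.
from timm.models.vision_transformer import VisionTransformer

def vit_base_patch16_my_backbone(**kwargs):
    """ViT-B/16 configuration for a hypothetical new backbone."""
    return VisionTransformer(
        patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs
    )
```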
If you find this repository useful, please consider citing our work 📝 and giving a star 🌟:
@InProceedings{Chowdhury_2025_CVPR,
author = {Chowdhury, Arpita and Paul, Dipanjyoti and Mai, Zheda and Gu, Jianyang and Zhang, Ziheng and Mehrab, Kazi Sajeed and Campolongo, Elizabeth G. and Rubenstein, Daniel and Stewart, Charles V. and Karpatne, Anuj and Berger-Wolf, Tanya and Su, Yu and Chao, Wei-Lun},
title = {Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2025},
pages = {4375-4385}
}
- VPT: https://github.com/KMnP/vpt
- PETL_VISION: https://github.com/OSU-MLB/PETL_Vision
Thanks for their wonderful work.
💬 Please create an issue for any contributions.
