πŸ” Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis(CVPR'25)

This is an official implementation for PROMPT-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis (CVPR'25)

Introducing Prompt-CAM, a $${\textcolor{red}{\text{simple yet effective}}}$$ interpretable transformer that requires no architectural modification to pre-trained ViTs: simply inject class-specific prompts into any ViT to make it interpretable.
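
As a rough sketch of this injection (illustrative only: the module name, shapes, and linear scorer below are assumptions, not the repo's actual code), each class gets one learnable prompt token, the prompts pass through the frozen, unmodified ViT blocks alongside the patch tokens, and each class is scored from its own prompt's output:

import torch
import torch.nn as nn

class PromptCAMSketch(nn.Module):
    # Hypothetical sketch of the Prompt-CAM idea, not the repo's module:
    # one learnable prompt per class, injected before the pre-trained blocks.
    def __init__(self, vit_blocks, embed_dim, num_classes):
        super().__init__()
        self.blocks = vit_blocks  # frozen, pre-trained ViT blocks, used as-is
        self.prompts = nn.Parameter(torch.zeros(num_classes, embed_dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)
        self.score = nn.Linear(embed_dim, 1)  # shared scorer over prompt outputs

    def forward(self, patch_tokens):  # patch_tokens: (B, N, D) from the ViT stem
        B = patch_tokens.size(0)
        C = self.prompts.size(0)
        prompts = self.prompts.unsqueeze(0).expand(B, -1, -1)  # (B, C, D)
        x = torch.cat([prompts, patch_tokens], dim=1)  # the only change: injected prompts
        for blk in self.blocks:
            x = blk(x)
        return self.score(x[:, :C, :]).squeeze(-1)  # (B, C): one logit per class prompt

The trait heatmaps then come from the attention each class's prompt pays to the patch tokens.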

Prompt-CAM lets us explore:

  • 🧠 What the model thinks is important for each class
  • ✨ Which traits are shared between two bird species
  • 🎨 How different classes 'see' the same image differently

Quick Start: Try out the demo

πŸ” Ever wondered what traits stand out when a model looks at an image of one class but searches with another class in mind? πŸ€” Witness the important traits of different class through the lens of Prompt-CAM with our interactive demos!

👉 Try our demo without installing anything in Google Colab

👉 Try our demo locally:

  • Set up the environment
  • Download the pre-trained model from the links below
  • Run the demo

👉 You can extend this codebase to include: New datasets and New backbones

Environment Setup

conda create -n prompt_cam python=3.7
conda activate prompt_cam  
source env_setup.sh

Data Preparation

You can put all the data in a folder and pass the path to the --data_path argument.

The structure of data/images/ should be organized as follows:

cub/
├── train/
│   ├── 001.Black_footed_Albatross/
│   │   ├── image_1.jpg
│   │   ├── image_2.jpg
│   │   └── ...
│   ├── 002.Laysan_Albatross/
│   │   ├── image_1.jpg
│   │   ├── image_2.jpg
│   │   └── ...
│   └── ...
└── val/
    ├── 001.Black_footed_Albatross/
    │   ├── image_1.jpg
    │   ├── image_2.jpg
    │   └── ...
    ├── 002.Laysan_Albatross/
    │   ├── image_1.jpg
    │   └── ...
    └── ...
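
If in doubt, a quick check along these lines (a hypothetical helper, not part of this repo) confirms that a folder passed to --data_path follows the layout above:

from pathlib import Path

# Hypothetical sanity check: assert the train/val ImageFolder layout shown above.
def check_layout(data_path):
    root = Path(data_path)
    for split in ("train", "val"):
        split_dir = root / split
        assert split_dir.is_dir(), f"missing {split_dir}"
        for cls_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
            # class folders carry the class number as prefix, e.g. 001.Black_footed_Albatross
            assert cls_dir.name.split(".")[0].isdigit(), f"unexpected folder {cls_dir}"
            assert any(cls_dir.iterdir()), f"no images in {cls_dir}"

check_layout("data/images/cub")
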
Prepare CUB dataset


  • Download the prepared dataset
    • From
  • Or prepare the dataset yourself:
    • Download the CUB dataset from the original website and put it in the data/images/ folder.
    • Use the dataset's provided split to create the train/ and val/ folders, keeping each class number as the prefix of the respective image folder name (starting from 1); a sketch of this step follows below.
    • The code will automatically create train and val annotation files in the data/annotations/ folder for each dataset if they are not provided.
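
A minimal sketch of that splitting step (a hypothetical one-off script assuming the official CUB_200_2011 download, which ships images.txt and train_test_split.txt; adjust the paths to your setup):

import shutil
from pathlib import Path

# Hypothetical script: arrange the official CUB_200_2011 download into the
# train/val layout above using the metadata files shipped with the dataset.
cub = Path("CUB_200_2011")  # the unpacked official download
out = Path("data/images/cub")

images = dict(line.split() for line in open(cub / "images.txt"))              # id -> relative path
is_train = dict(line.split() for line in open(cub / "train_test_split.txt"))  # id -> "1"/"0"

for img_id, rel in images.items():
    split = "train" if is_train[img_id] == "1" else "val"
    dst = out / split / Path(rel).parent.name  # folder names are already class-number-prefixed
    dst.mkdir(parents=True, exist_ok=True)
    shutil.copy(cub / "images" / rel, dst / Path(rel).name)
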
Prepare Oxford Pet dataset


  • Download the prepared dataset
    • From

To add a new dataset, see Extensions.

Results + Checkpoints:

  • Download from the links below and put them in the checkpoints/{model}/{dataset}/ folder.
| Backbone | Dataset | Prompt-CAM Top-1 Acc (%) | Checkpoint Link |
|----------|---------|--------------------------|-----------------|
| dino | cub (CUB) | 73.2 | url |
| dino | car (Stanford Cars) | 83.2 | url |
| dino | dog (Stanford Dogs) | 81.1 | url |
| dino | pet (Oxford Pet) | 91.3 | url |
| dino | birds_525 (Birds-525) | 98.8 | url |

| Backbone | Dataset | Prompt-CAM Top-1 Acc (%) | Checkpoint Link |
|----------|---------|--------------------------|-----------------|
| dinov2 | cub (CUB) | 74.1 | url |
| dinov2 | dog (Stanford Dogs) | 81.3 | url |
| dinov2 | pet (Oxford Pet) | 92.7 | url |

Evaluation and Visualization

  • Download the checkpoint from the url in the table above and put it in the checkpoints/{model}/{dataset}/ folder.

For example, to visualize the attention map of the DINO model on class 024.Red_faced_Cormorant of the CUB dataset (class indices are 0-based, so this is --vis_cls 23), put the checkpoint in the checkpoints/dino/cub/ folder and run the following command:

CUDA_VISIBLE_DEVICES=0  python visualize.py --config ./experiment/config/prompt_cam/dino/cub/args.yaml --checkpoint ./checkpoints/dino/cub/model.pt --vis_cls 23
  • The output will be saved in the visualization/dino/cub/class_23/ folder.
  • Inside each image's folder, the top_traits heatmaps for the target class are concatenated if the prediction is correct; otherwise, heatmaps for all traits are concatenated. (The prediction for the respective image can be found in concatenated_prediction_{predicted_class}.jpg.)
Visualization configuration options:
  • config: path to the config file.
  • checkpoint: path to the checkpoint file.
  • vis_cls: class number to visualize. (default: 23)
  • vis_attn: set to True to visualize the attention map. (default: True)
  • top_traits: number of traits to visualize. (default: 4)
  • nmbr_samples: number of images from the vis_cls class to visualize. (default: 10)
  • vis_outdir: output directory. (default: visualization/)
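
To sweep several classes at once, a small driver like this (hypothetical convenience code, built only on the documented flags above) works:

import subprocess

# Visualize the first five classes of CUB with the same config and checkpoint.
for cls in range(5):
    subprocess.run([
        "python", "visualize.py",
        "--config", "./experiment/config/prompt_cam/dino/cub/args.yaml",
        "--checkpoint", "./checkpoints/dino/cub/model.pt",
        "--vis_cls", str(cls),
    ], check=True)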

🔥 Training

1️⃣ Pretrained weights

Download the pretrained weights from the following links and put them in the pretrained_weights folder.

  1. ViT-B DINO: rename it to dino_vitbase16_pretrain.pth
  2. ViT-B DINOv2: rename it to dinov2_vitb14_pretrain.pth

2️⃣ Load dataset


See Data Preparation above.

3️⃣ Start training


👉 To train the model on the CUB dataset using the DINO model, run the following command:

CUDA_VISIBLE_DEVICES=0  python main.py --config ./experiment/config/prompt_cam/dino/cub/args.yaml

The checkpoint will be saved in the output/vit_base_patch16_dino/cub/ folder. Copy the checkpoint model.pt to the checkpoints/dino/cub/ folder.
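
That copy step can of course be scripted; a hypothetical helper using only the paths named above:

import shutil
from pathlib import Path

# Copy the freshly trained CUB checkpoint to where visualize.py expects it.
src = Path("output/vit_base_patch16_dino/cub/model.pt")
dst = Path("checkpoints/dino/cub")
dst.mkdir(parents=True, exist_ok=True)
shutil.copy(src, dst / "model.pt")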


👉 To train the model on the Oxford Pet dataset using the DINO model, run the following command:

CUDA_VISIBLE_DEVICES=0  python main.py --config ./experiment/config/prompt_cam/dino/pet/args.yaml

The checkpoint will be saved in the output/vit_base_patch16_dino/pet/ folder. Copy the checkpoint model.pt to the checkpoints/dino/pet/ folder.


👉 To train the model on the Oxford Pet dataset using the DINOv2 model, run the following command:

CUDA_VISIBLE_DEVICES=0  python main.py --config ./experiment/config/prompt_cam/dinov2/pet/args.yaml

The checkpoint will be saved in the output/vit_base_patch14_dinov2/pet/ folder. Copy the checkpoint model.pt to the checkpoints/dinov2/pet/ folder.


4️⃣ πŸ” Visualize the attention map


See Visualization above.

Extensions

To add a new dataset

  1. Prepare the dataset using the instructions above.
  2. Add a new dataset file in /data/dataset; look at the existing dataset files for reference (a hypothetical skeleton follows this list).
  3. Modify build_loader.py to include the new dataset.
  4. Create a new config file in experiment/config/prompt_cam/{model}/{dataset}/args.yaml.
    • See experiment/config/prompt_cam/dino/cub/args.yaml for reference and what to modify.
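
A hypothetical skeleton of such a dataset file (the exact interface is whatever build_loader.py expects, so mirror an existing dataset file rather than this sketch):

# data/dataset/my_dataset.py -- hypothetical skeleton only
from torchvision import datasets, transforms

def build_my_dataset(data_path, split, transform=None):
    # split is "train" or "val"; folders follow the layout from Data Preparation
    if transform is None:
        transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ])
    return datasets.ImageFolder(f"{data_path}/{split}", transform=transform)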

To add a new backbone

  • Modify get_base_model() in build_model.py.
  • Register the new backbone in vision_transformer.py by creating a new function (a hypothetical example follows this list).
  • Add another option to --pretrained_weights and --model in the setup_parser() function of main.py to include the new backbone.
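
For illustration, a registration function might look like the following (hypothetical, modeled on common ViT factory functions; VisionTransformer stands for the class already defined in vision_transformer.py, and the signature should match the existing functions there):

import torch

# vision_transformer.py -- hypothetical new factory function
def vit_base_patch16_my_backbone(pretrained_weights, **kwargs):
    model = VisionTransformer(patch_size=16, embed_dim=768, depth=12,
                              num_heads=12, **kwargs)
    state_dict = torch.load(pretrained_weights, map_location="cpu")
    model.load_state_dict(state_dict, strict=False)  # tolerate head/prompt mismatches
    return model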

Citation

If you find this repository useful, please consider citing our work 📝 and giving it a star 🌟:

@InProceedings{Chowdhury_2025_CVPR,
    author    = {Chowdhury, Arpita and Paul, Dipanjyoti and Mai, Zheda and Gu, Jianyang and Zhang, Ziheng and Mehrab, Kazi Sajeed and Campolongo, Elizabeth G. and Rubenstein, Daniel and Stewart, Charles V. and Karpatne, Anuj and Berger-Wolf, Tanya and Su, Yu and Chao, Wei-Lun},
    title     = {Prompt-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {4375-4385}
}

Acknowledgement

Thanks to the open-source projects this repository builds on for their wonderful work.

🛠 Create an issue for any contributions.
