
Conversation


arkel23 commented Dec 22, 2020

* Forked from the [Luke Melas-Kyriazi repository](https://github.com/lukemelas/PyTorch-Pretrained-ViT).
* Added support for the 'H-14' and 'L-16' ViT models.
* Added support for downloading the models directly from Google's cloud storage.
* Corrected the Jax-to-PyTorch weight conversion. The previous methodology produced `.pth` state_dict files without the representation layer, so `ViT(..., load_repr_layer=True)` raised an error. For inference alone the representation layer is unnecessary, as discussed in the original Vision Transformer paper, but it can be useful for other applications and experiments, so I added `download_convert_models.py` to first download the required models and convert them with all of their weights; after that, all parameters can be tuned (a loading sketch follows this list).
* Added support for visualizing attention by returning the score values from the multi-head self-attention layers (a sketch follows the commit list below). The visualization script was mostly taken from the [jeonsworld/ViT-pytorch repository](https://github.com/jeonsworld/ViT-pytorch).
* Added examples for inference (single image) and fine-tuning/training (using CIFAR-10); sketches of both appear below.
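As a minimal sketch of the corrected loading path: this assumes the fork keeps the upstream `ViT` constructor and the `'L_16'` model name from lukemelas/PyTorch-Pretrained-ViT, and that `download_convert_models.py` has already been run to produce a checkpoint with all weights.

```python
# Minimal sketch, assuming the upstream-style constructor and that
# download_convert_models.py has already produced a full .pth state_dict.
from pytorch_pretrained_vit import ViT

# With the corrected conversion, the checkpoint includes the representation
# layer, so requesting it no longer raises a missing-key error.
model = ViT('L_16', pretrained=True, load_repr_layer=True)
model.eval()
```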

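For the single-image inference example, a hedged sketch in the upstream style; the model name `'B_16_imagenet1k'`, the 384×384 input size, and the image path are illustrative assumptions, not taken verbatim from this PR.

```python
# Hedged single-image inference sketch; image path and input size are
# placeholders chosen to match the upstream ImageNet-1k checkpoints.
import torch
from PIL import Image
from torchvision import transforms
from pytorch_pretrained_vit import ViT

model = ViT('B_16_imagenet1k', pretrained=True)
model.eval()

preprocess = transforms.Compose([
    transforms.Resize((384, 384)),
    transforms.ToTensor(),
    transforms.Normalize([0.5] * 3, [0.5] * 3),
])
img = preprocess(Image.open('img.jpg')).unsqueeze(0)  # (1, 3, 384, 384)

with torch.no_grad():
    logits = model(img)
print(logits.argmax(dim=-1))  # predicted ImageNet class index
```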
arkel23 and others added 13 commits December 18, 2020 00:32
…s for the conversion, added download links to download.sh and configs.py for models that were missing
… beforehand, it directly downloads them to the torch hub cache and then converts them on the fly
…to return head scores if the parameter visualize=True is given; otherwise functionality stays the same
…y, also added an example with CIFAR-10. Changed the loading logic to allow appropriate loading of all layers regardless of whether the fc layers are loaded with a different number of classes and/or a representation layer. Also verified that they load properly
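To illustrate the `visualize=True` commit above, a hedged sketch: the keyword name is taken from the commit message, and the `(logits, scores)` return convention is inferred from the PR description, so the exact signature may differ.

```python
# Hedged sketch of retrieving attention scores; the visualize=True keyword
# and the (logits, scores) return shape are assumptions from this PR's text.
import torch
from pytorch_pretrained_vit import ViT

model = ViT('B_16', pretrained=True, visualize=True)
model.eval()

x = torch.randn(1, 3, 224, 224)  # dummy input; 'B_16' expects 224x224 here
with torch.no_grad():
    logits, scores = model(x)    # scores: per-layer attention tensors

# Average over heads in the last layer for a coarse token-to-token map.
attn_map = scores[-1].mean(dim=1)  # (batch, tokens, tokens)
```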
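And a hedged sketch of the CIFAR-10 fine-tuning example: the `num_classes` and `image_size` keywords follow the upstream constructor, the PR's loading logic is assumed to skip the fc head when class counts differ, and the hyperparameters are placeholders.

```python
# Hedged CIFAR-10 fine-tuning sketch; hyperparameters are placeholders.
import torch
import torchvision
import torchvision.transforms as T
from pytorch_pretrained_vit import ViT

# num_classes=10 differs from the checkpoint, so the pretrained fc head is
# assumed to be skipped by this PR's loading logic.
model = ViT('B_16', pretrained=True, num_classes=10, image_size=224)

transform = T.Compose([
    T.Resize((224, 224)),
    T.ToTensor(),
    T.Normalize([0.5] * 3, [0.5] * 3),
])
train_set = torchvision.datasets.CIFAR10(
    root='./data', train=True, download=True, transform=transform)
loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=3e-3, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one epoch
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```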
lukemelas (Owner) commented Dec 23, 2020 via email

huananerban commented:

@arkel23 I would like to ask why my L-16 pre-trained model still can't be trained; I get the error "Missing keys when loading pretrained weights: []".
