> **Note**
> A detailed analysis of scale generalization for various models is given in our preprint:
>
> **Just a Matter of Scale? Reevaluating Scale Equivariance in Convolutional Neural Networks**
> Thomas Altstidl, An Nguyen, Leo Schwinn, Franz Köferl, Christopher Mutschler, Björn Eskofier, Dario Zanca
This repository contains the official source code accompanying our preprint. If you are reading this, you likely fall into one or more of the following groups. Click on whichever apply to you to get started.
**I am interested in using the Scaled and Translated Image Recognition (STIR) dataset.**
- Download one or more data files from Zenodo.
- Grab a copy of `dataset.py`.
- Example usage that loads training data from `emoji.npz` for scales 17 through 64:
```python
from dataset import STIRDataset

dataset = STIRDataset('data/emoji.npz')
# Obtain images and labels for training
images, labels = dataset.to_torch(split='train', scales=range(17, 65), shuffle=True)
# Obtain known scales and positions for the images above
scales, positions = dataset.get_latents(split='train', scales=range(17, 65), shuffle=True)
# Get metadata and label descriptions
metadata = dataset.metadata
label_descriptions = dataset.labeldata
```
**I am interested in reviewing your results.**

We provide a subset of our results for review; the others are available upon request, as they are larger in size. The two provided files are described below, followed by a short loading sketch.
- `clean.csv` contains testing accuracy and training time (columns `metrics.test_acc` and `metrics.train_time`)
- `generalization.csv` contains accuracies per scale (columns `s17` through `s64`)
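As a minimal sketch of how these files can be inspected with pandas (our own example, not part of the repository; only the column names above are taken from the files, and both are assumed to sit in the current working directory):

```python
import pandas as pd

# Load the filtered MLflow export and the per-scale accuracies
clean = pd.read_csv('clean.csv')
generalization = pd.read_csv('generalization.csv')

# Summary statistics of test accuracy across all exported runs
print(clean['metrics.test_acc'].describe())

# Mean accuracy at each evaluated scale (columns s17 through s64)
scale_columns = [f's{s}' for s in range(17, 65)]
print(generalization[scale_columns].mean())
```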
**I am interested in using the proposed layer in my own work.**
- Grab a copy of `layers.py`.
- Example usage that applies one 7×7 scaled convolutional layer followed by pixel-wise pooling:
```python
from torch import nn

from layers import SiConv2d, ScalePool

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        # 7x7 base kernel rescaled to 29 different scales
        self.conv = SiConv2d(3, 16, 29, 7, interp_mode='bicubic')
        self.pool = ScalePool(mode='pixel')

    def forward(self, x):
        x = self.conv(x)
        x = self.pool(x)
        return x
```
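As a quick sanity check, the sketch below (our addition; the 8×3×64×64 input is an arbitrary illustrative size, and `MyModel` refers to the class defined above) pushes a random batch through the model:

```python
import torch

model = MyModel()
batch = torch.randn(8, 3, 64, 64)  # 8 random RGB images of 64x64 pixels (illustrative size)
output = model(batch)
# The channel dimension should be 16, matching the SiConv2d output channels;
# the spatial dimensions depend on the layer's padding behavior.
print(output.shape)
```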
The remainder of this document will focus on reproducing the results given in our preprint.

> **Warning**
> While we have taken great care to document everything, the scope of this project means that some minor details may still be missing. If you have trouble recreating our experiments on your own machine, please create a new issue and we'd be more than happy to assist.
The provided code should work in most environments and has been tested at least on Windows 10/11 (local environment) and Linux (cluster node environment). Python 3.8 was used, although newer versions should also work. We recommend creating a new virtual environment and installing all requirements there:
```
cd /path/to/provided/code
python -m venv .venv
source .venv/bin/activate  # on Windows: .venv\Scripts\activate
pip install -r requirements.txt
```

Before training a model, you will need to either create or download the respective data files you intend to use. These can be downloaded from Zenodo. Then, execute the following script with your selected parameters to train a single model; a complete example invocation is shown after the parameter list below.
```
python scripts/train.py [...]
```

- `--model {standard, pixel_pool, slice_pool, energy_pool, conv3d, ensemble, spatial_transform, xu, kanazawa, hermite, disco}`: Name of the (scale-equivariant) model that should be trained. Implementations are given in `siconvnet/models.py`.
- `--dataset {emoji, mnist, trafficsign, aerial}`: Name of the dataset on which the model should be trained. The respective `[d].npz` file needs to be in the current working directory. See paper Fig. 3.
- `--evaluation {1, 2, 3, 4}`: Evaluation scenario on which the model should be trained. Defines the scales used for training and evaluation. See paper Fig. 3.
- `--kernel-size {3, 7, 11, 15}`: Kernel size of all convolutions. Defines the size $k \times k$ of the trainable kernel weights. Fixed to 7 in the paper.
- `--interpolation {nearest, bilinear, bicubic, area}`: Interpolation method used to generate larger kernels. Only applies to our models. Fixed to bicubic in the paper.
- `--lr {1e-2, 1e-3}`: Learning rate of the Adam optimizer used to train the model.
- `--seed number`: Seed used to initialize the random number generators for reproducibility. Seeds used in the paper are 1 through 50.
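For instance, the following invocation trains our pixel-pooling model on the emoji dataset under the first evaluation scenario with the paper's fixed hyperparameters (this particular combination is our illustration, assembled from the options above, not a prescribed command):

```
python scripts/train.py --model pixel_pool --dataset emoji --evaluation 1 \
    --kernel-size 7 --interpolation bicubic --lr 1e-3 --seed 1
```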
The training script writes results to MLflow. Before proceeding with the evaluation, you need to export all runs. Unless you changed the tracking destination, this is done using the following command. We provide our own filtered export in `clean.csv`.
```
mlflow experiments csv -x 0 -o clean.csv
```

Then, execute the following script with your selected parameters to evaluate all models; an example invocation follows the parameter list below.
```
python scripts/eval.py [...]
```

- `--runs path/to/clean.csv`: Path to the runs exported from MLflow. Should point to the file exported using the above command.
- `--models path/to/models`: Path to the run artifacts saved by MLflow. Should be `mlruns/0` when run locally.
- `--data {emoji, mnist, trafficsign, aerial}`: Name of the dataset for which models should be evaluated.
- `--generalization`: Flag for scale generalization. If enabled, writes `generalization_*.csv` files.
- `--equivariance`: Flag for scale equivariance. If enabled, writes `eval/*/errors.npz` files.
- `--index-correlation`: Flag for pooling scale correlation. If enabled, writes `eval/*/indices.npz` files.
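Assuming runs were tracked locally and exported as described above, an evaluation of the emoji models with all three analyses enabled might look as follows (the flag combination is our illustration):

```
python scripts/eval.py --runs clean.csv --models mlruns/0 --data emoji \
    --generalization --equivariance --index-correlation
```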
To recreate the plots given in the paper and in the supplementary document, you may use the scripts provided in the `plots/` directory; a sketch of how to invoke them follows the list below. We provide `clean.csv` and `generalization.csv` here; the others are available upon request, as they are larger in size.
- `equivariance.py` was used for Fig. 6 & Suppl. Fig. 3 and requires both `scripts/clean.csv` and `plots/eval/*/errors.npz`
- `generalization.py` was used for Fig. 5 & Suppl. Fig. 2 and requires both `scripts/clean.csv` and `plots/generalization_*.csv`
- `hyperparam.py` was used for Fig. 4 & Suppl. Fig. 1 and requires only `scripts/clean.csv`
- `indices.py` was used for Fig. 7 & Suppl. Fig. 4 and requires both `scripts/clean.csv` and `plots/eval/*/indices.npz`
- `time.py` was used for Tab. 2 and requires only `scripts/clean.csv`
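Assuming the scripts read their required files from the relative paths listed above and take no additional arguments (our assumption; check each script before running), recreating a plot would then be a matter of, for example:

```
python plots/hyperparam.py
```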