Convolutional Neural Network Architectures

Introduction

We will now explore some famous CNN architectures as implemented in torchvision. This package adds functionality to the PyTorch ecosystem focused on working with image and video data. You have already seen applications of torchvision in the context of defining and applying transforms. This package also provides tools for loading images, defining image DataSets, and loading built-in datasets. As you will see in this section, it also provides access to famous CNN architectures and associated pre-trained weights. Beyond scene labeling or classification tasks, torchvision also provides tools and datasets specific to semantic segmentation, object detection, and instance segmentation.
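As a quick illustration, the built-in datasets follow a common loading pattern; a minimal sketch, assuming you want to download CIFAR10 to a local folder named "data" (a path of your choosing):

from torchvision import datasets, transforms

# Download CIFAR10 and convert each image to a tensor as it is loaded
cifar = datasets.CIFAR10(root="data", train=True, download=True, transform=transforms.ToTensor())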

In this section, we will specifically explore VGGNet and ResNet architectures. However, we will not discuss their implementations in detail since these models are discussed in-depth in the CNN lecture module. Instead, we will focus on how to implement these models using torchvision. The examples here for VGGNet and ResNet will translate well to other architectures made available through torchvision. In the next section, we will bring together what you have learned throughout the CNN modules to use transfer learning and a modified ResNet-34 architecture to classify the EuroSatAllBands dataset.

I begin by importing the needed packages. In this module, I will use the Python Imaging Library (PIL) to read in images. I am using this package because it works well with torchvision. I also set the device variable to the GPU if available.

import torch 
import torch.nn as nn

from torchsummary import summary

import torchvision
from torchvision import transforms

from PIL import Image

import matplotlib.pyplot as plt

import numpy as np
import pandas as pd
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
cuda:0

VGGNet-16

Models are available in the torchvision.models subpackage. More information about this subpackage can be found here: https://pytorch.org/vision/stable/models.html. As a first example, I create an instance of the VGGNet-16 architecture and move the model to the GPU using the to() method. The pretrained parameter is used to load pre-trained weights. This allows you to either (1) use the pre-trained model to predict to new data or (2) initialize the model with the pre-trained weights, as opposed to random weights, and then fine-tune it on new data. If you run this model with the pretrained parameter set to True, the weights will be downloaded to your local device, or to your Google Drive if you are working in Colab. Note that, as the warnings below indicate, newer versions of torchvision deprecate the pretrained parameter in favor of a weights argument.

modelVGG = torchvision.models.vgg16(pretrained=True).to(device)
C:\Users\vidcg\ANACON~1\envs\torchENV\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
C:\Users\vidcg\ANACON~1\envs\torchENV\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
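Following the guidance in the warnings, on torchvision 0.13 and later the same model can be requested through a weights enum; a minimal sketch:

# On torchvision >= 0.13, weight enums replace pretrained=True;
# VGG16_Weights.DEFAULT resolves to the most up-to-date ImageNet weights
from torchvision.models import vgg16, VGG16_Weights
modelVGG = vgg16(weights=VGG16_Weights.DEFAULT).to(device)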

I next use the summary function from torchsummary to print a summary of the model for an input image with shape (3, 256, 256). This is a fairly large model with 138,357,544 trainable parameters and an estimated total size of 814 MB. Again, we will not discuss the components of this architecture here since it was explored in the lecture module.

summary(modelVGG, (3, 256, 256))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 256, 256]           1,792
              ReLU-2         [-1, 64, 256, 256]               0
            Conv2d-3         [-1, 64, 256, 256]          36,928
              ReLU-4         [-1, 64, 256, 256]               0
         MaxPool2d-5         [-1, 64, 128, 128]               0
            Conv2d-6        [-1, 128, 128, 128]          73,856
              ReLU-7        [-1, 128, 128, 128]               0
            Conv2d-8        [-1, 128, 128, 128]         147,584
              ReLU-9        [-1, 128, 128, 128]               0
        MaxPool2d-10          [-1, 128, 64, 64]               0
           Conv2d-11          [-1, 256, 64, 64]         295,168
             ReLU-12          [-1, 256, 64, 64]               0
           Conv2d-13          [-1, 256, 64, 64]         590,080
             ReLU-14          [-1, 256, 64, 64]               0
           Conv2d-15          [-1, 256, 64, 64]         590,080
             ReLU-16          [-1, 256, 64, 64]               0
        MaxPool2d-17          [-1, 256, 32, 32]               0
           Conv2d-18          [-1, 512, 32, 32]       1,180,160
             ReLU-19          [-1, 512, 32, 32]               0
           Conv2d-20          [-1, 512, 32, 32]       2,359,808
             ReLU-21          [-1, 512, 32, 32]               0
           Conv2d-22          [-1, 512, 32, 32]       2,359,808
             ReLU-23          [-1, 512, 32, 32]               0
        MaxPool2d-24          [-1, 512, 16, 16]               0
           Conv2d-25          [-1, 512, 16, 16]       2,359,808
             ReLU-26          [-1, 512, 16, 16]               0
           Conv2d-27          [-1, 512, 16, 16]       2,359,808
             ReLU-28          [-1, 512, 16, 16]               0
           Conv2d-29          [-1, 512, 16, 16]       2,359,808
             ReLU-30          [-1, 512, 16, 16]               0
        MaxPool2d-31            [-1, 512, 8, 8]               0
AdaptiveAvgPool2d-32            [-1, 512, 7, 7]               0
           Linear-33                 [-1, 4096]     102,764,544
             ReLU-34                 [-1, 4096]               0
          Dropout-35                 [-1, 4096]               0
           Linear-36                 [-1, 4096]      16,781,312
             ReLU-37                 [-1, 4096]               0
          Dropout-38                 [-1, 4096]               0
           Linear-39                 [-1, 1000]       4,097,000
================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.75
Forward/backward pass size (MB): 285.64
Params size (MB): 527.79
Estimated Total Size (MB): 814.18
----------------------------------------------------------------

There are sometimes different flavors or implementations of the same model available. In the example below, I have created an instance of the VGGNet-16 architecture that incorporates batch normalization using the vgg16_bn() function. Printing the summary, you can see that the model now includes batch normalization layers. This also increases the required memory size of the model and the number of trainable parameters.

modelVGG = torchvision.models.vgg16_bn(pretrained=True).to(device)
C:\Users\vidcg\ANACON~1\envs\torchENV\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=VGG16_BN_Weights.IMAGENET1K_V1`. You can also use `weights=VGG16_BN_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
summary(modelVGG, (3, 256, 256))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 256, 256]           1,792
       BatchNorm2d-2         [-1, 64, 256, 256]             128
              ReLU-3         [-1, 64, 256, 256]               0
            Conv2d-4         [-1, 64, 256, 256]          36,928
       BatchNorm2d-5         [-1, 64, 256, 256]             128
              ReLU-6         [-1, 64, 256, 256]               0
         MaxPool2d-7         [-1, 64, 128, 128]               0
            Conv2d-8        [-1, 128, 128, 128]          73,856
       BatchNorm2d-9        [-1, 128, 128, 128]             256
             ReLU-10        [-1, 128, 128, 128]               0
           Conv2d-11        [-1, 128, 128, 128]         147,584
      BatchNorm2d-12        [-1, 128, 128, 128]             256
             ReLU-13        [-1, 128, 128, 128]               0
        MaxPool2d-14          [-1, 128, 64, 64]               0
           Conv2d-15          [-1, 256, 64, 64]         295,168
      BatchNorm2d-16          [-1, 256, 64, 64]             512
             ReLU-17          [-1, 256, 64, 64]               0
           Conv2d-18          [-1, 256, 64, 64]         590,080
      BatchNorm2d-19          [-1, 256, 64, 64]             512
             ReLU-20          [-1, 256, 64, 64]               0
           Conv2d-21          [-1, 256, 64, 64]         590,080
      BatchNorm2d-22          [-1, 256, 64, 64]             512
             ReLU-23          [-1, 256, 64, 64]               0
        MaxPool2d-24          [-1, 256, 32, 32]               0
           Conv2d-25          [-1, 512, 32, 32]       1,180,160
      BatchNorm2d-26          [-1, 512, 32, 32]           1,024
             ReLU-27          [-1, 512, 32, 32]               0
           Conv2d-28          [-1, 512, 32, 32]       2,359,808
      BatchNorm2d-29          [-1, 512, 32, 32]           1,024
             ReLU-30          [-1, 512, 32, 32]               0
           Conv2d-31          [-1, 512, 32, 32]       2,359,808
      BatchNorm2d-32          [-1, 512, 32, 32]           1,024
             ReLU-33          [-1, 512, 32, 32]               0
        MaxPool2d-34          [-1, 512, 16, 16]               0
           Conv2d-35          [-1, 512, 16, 16]       2,359,808
      BatchNorm2d-36          [-1, 512, 16, 16]           1,024
             ReLU-37          [-1, 512, 16, 16]               0
           Conv2d-38          [-1, 512, 16, 16]       2,359,808
      BatchNorm2d-39          [-1, 512, 16, 16]           1,024
             ReLU-40          [-1, 512, 16, 16]               0
           Conv2d-41          [-1, 512, 16, 16]       2,359,808
      BatchNorm2d-42          [-1, 512, 16, 16]           1,024
             ReLU-43          [-1, 512, 16, 16]               0
        MaxPool2d-44            [-1, 512, 8, 8]               0
AdaptiveAvgPool2d-45            [-1, 512, 7, 7]               0
           Linear-46                 [-1, 4096]     102,764,544
             ReLU-47                 [-1, 4096]               0
          Dropout-48                 [-1, 4096]               0
           Linear-49                 [-1, 4096]      16,781,312
             ReLU-50                 [-1, 4096]               0
          Dropout-51                 [-1, 4096]               0
           Linear-52                 [-1, 1000]       4,097,000
================================================================
Total params: 138,365,992
Trainable params: 138,365,992
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.75
Forward/backward pass size (MB): 420.64
Params size (MB): 527.82
Estimated Total Size (MB): 949.21
----------------------------------------------------------------

ResNet

Similar to the VGGNet-16 examples, ResNet models are also made available through torchvision with pre-trained weights from ImageNet. Below, I create an instance of the ResNet-34 architecture with pre-trained weights. Many models expect a specific number of input channels and specific spatial dimensions, so it is common to need to perform pre-processing, as you will see below. This ResNet architecture expects 3-band, RGB images with spatial dimensions of 224x224, the size used when training on ImageNet. Note that the VGGNet-16 summaries above were generated with 256x256 inputs; the adaptive average pooling layer in torchvision's implementations allows some flexibility in input size.
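On torchvision 0.13 and later, each weights enum also bundles the preprocessing it expects, which avoids hard-coding the resize and normalization values; a minimal sketch assuming that newer API:

# The weights enum carries its own preprocessing preset:
# resize, center crop, tensor conversion, and ImageNet normalization
weights = torchvision.models.ResNet34_Weights.DEFAULT
preprocess = weights.transforms()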

Printing a summary for the model, you can see that the ResNet-34 architecture has fewer overall parameters and trainable parameters in comparison to VGGNet-16.

modelRN34 = torchvision.models.resnet34(pretrained=True).to(device)
C:\Users\vidcg\ANACON~1\envs\torchENV\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=ResNet34_Weights.IMAGENET1K_V1`. You can also use `weights=ResNet34_Weights.DEFAULT` to get the most up-to-date weights.
  warnings.warn(msg)
summary(modelRN34, (3, 224, 224))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]          36,864
       BatchNorm2d-6           [-1, 64, 56, 56]             128
              ReLU-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
             ReLU-10           [-1, 64, 56, 56]               0
       BasicBlock-11           [-1, 64, 56, 56]               0
           Conv2d-12           [-1, 64, 56, 56]          36,864
      BatchNorm2d-13           [-1, 64, 56, 56]             128
             ReLU-14           [-1, 64, 56, 56]               0
           Conv2d-15           [-1, 64, 56, 56]          36,864
      BatchNorm2d-16           [-1, 64, 56, 56]             128
             ReLU-17           [-1, 64, 56, 56]               0
       BasicBlock-18           [-1, 64, 56, 56]               0
           Conv2d-19           [-1, 64, 56, 56]          36,864
      BatchNorm2d-20           [-1, 64, 56, 56]             128
             ReLU-21           [-1, 64, 56, 56]               0
           Conv2d-22           [-1, 64, 56, 56]          36,864
      BatchNorm2d-23           [-1, 64, 56, 56]             128
             ReLU-24           [-1, 64, 56, 56]               0
       BasicBlock-25           [-1, 64, 56, 56]               0
           Conv2d-26          [-1, 128, 28, 28]          73,728
      BatchNorm2d-27          [-1, 128, 28, 28]             256
             ReLU-28          [-1, 128, 28, 28]               0
           Conv2d-29          [-1, 128, 28, 28]         147,456
      BatchNorm2d-30          [-1, 128, 28, 28]             256
           Conv2d-31          [-1, 128, 28, 28]           8,192
      BatchNorm2d-32          [-1, 128, 28, 28]             256
             ReLU-33          [-1, 128, 28, 28]               0
       BasicBlock-34          [-1, 128, 28, 28]               0
           Conv2d-35          [-1, 128, 28, 28]         147,456
      BatchNorm2d-36          [-1, 128, 28, 28]             256
             ReLU-37          [-1, 128, 28, 28]               0
           Conv2d-38          [-1, 128, 28, 28]         147,456
      BatchNorm2d-39          [-1, 128, 28, 28]             256
             ReLU-40          [-1, 128, 28, 28]               0
       BasicBlock-41          [-1, 128, 28, 28]               0
           Conv2d-42          [-1, 128, 28, 28]         147,456
      BatchNorm2d-43          [-1, 128, 28, 28]             256
             ReLU-44          [-1, 128, 28, 28]               0
           Conv2d-45          [-1, 128, 28, 28]         147,456
      BatchNorm2d-46          [-1, 128, 28, 28]             256
             ReLU-47          [-1, 128, 28, 28]               0
       BasicBlock-48          [-1, 128, 28, 28]               0
           Conv2d-49          [-1, 128, 28, 28]         147,456
      BatchNorm2d-50          [-1, 128, 28, 28]             256
             ReLU-51          [-1, 128, 28, 28]               0
           Conv2d-52          [-1, 128, 28, 28]         147,456
      BatchNorm2d-53          [-1, 128, 28, 28]             256
             ReLU-54          [-1, 128, 28, 28]               0
       BasicBlock-55          [-1, 128, 28, 28]               0
           Conv2d-56          [-1, 256, 14, 14]         294,912
      BatchNorm2d-57          [-1, 256, 14, 14]             512
             ReLU-58          [-1, 256, 14, 14]               0
           Conv2d-59          [-1, 256, 14, 14]         589,824
      BatchNorm2d-60          [-1, 256, 14, 14]             512
           Conv2d-61          [-1, 256, 14, 14]          32,768
      BatchNorm2d-62          [-1, 256, 14, 14]             512
             ReLU-63          [-1, 256, 14, 14]               0
       BasicBlock-64          [-1, 256, 14, 14]               0
           Conv2d-65          [-1, 256, 14, 14]         589,824
      BatchNorm2d-66          [-1, 256, 14, 14]             512
             ReLU-67          [-1, 256, 14, 14]               0
           Conv2d-68          [-1, 256, 14, 14]         589,824
      BatchNorm2d-69          [-1, 256, 14, 14]             512
             ReLU-70          [-1, 256, 14, 14]               0
       BasicBlock-71          [-1, 256, 14, 14]               0
           Conv2d-72          [-1, 256, 14, 14]         589,824
      BatchNorm2d-73          [-1, 256, 14, 14]             512
             ReLU-74          [-1, 256, 14, 14]               0
           Conv2d-75          [-1, 256, 14, 14]         589,824
      BatchNorm2d-76          [-1, 256, 14, 14]             512
             ReLU-77          [-1, 256, 14, 14]               0
       BasicBlock-78          [-1, 256, 14, 14]               0
           Conv2d-79          [-1, 256, 14, 14]         589,824
      BatchNorm2d-80          [-1, 256, 14, 14]             512
             ReLU-81          [-1, 256, 14, 14]               0
           Conv2d-82          [-1, 256, 14, 14]         589,824
      BatchNorm2d-83          [-1, 256, 14, 14]             512
             ReLU-84          [-1, 256, 14, 14]               0
       BasicBlock-85          [-1, 256, 14, 14]               0
           Conv2d-86          [-1, 256, 14, 14]         589,824
      BatchNorm2d-87          [-1, 256, 14, 14]             512
             ReLU-88          [-1, 256, 14, 14]               0
           Conv2d-89          [-1, 256, 14, 14]         589,824
      BatchNorm2d-90          [-1, 256, 14, 14]             512
             ReLU-91          [-1, 256, 14, 14]               0
       BasicBlock-92          [-1, 256, 14, 14]               0
           Conv2d-93          [-1, 256, 14, 14]         589,824
      BatchNorm2d-94          [-1, 256, 14, 14]             512
             ReLU-95          [-1, 256, 14, 14]               0
           Conv2d-96          [-1, 256, 14, 14]         589,824
      BatchNorm2d-97          [-1, 256, 14, 14]             512
             ReLU-98          [-1, 256, 14, 14]               0
       BasicBlock-99          [-1, 256, 14, 14]               0
          Conv2d-100            [-1, 512, 7, 7]       1,179,648
     BatchNorm2d-101            [-1, 512, 7, 7]           1,024
            ReLU-102            [-1, 512, 7, 7]               0
          Conv2d-103            [-1, 512, 7, 7]       2,359,296
     BatchNorm2d-104            [-1, 512, 7, 7]           1,024
          Conv2d-105            [-1, 512, 7, 7]         131,072
     BatchNorm2d-106            [-1, 512, 7, 7]           1,024
            ReLU-107            [-1, 512, 7, 7]               0
      BasicBlock-108            [-1, 512, 7, 7]               0
          Conv2d-109            [-1, 512, 7, 7]       2,359,296
     BatchNorm2d-110            [-1, 512, 7, 7]           1,024
            ReLU-111            [-1, 512, 7, 7]               0
          Conv2d-112            [-1, 512, 7, 7]       2,359,296
     BatchNorm2d-113            [-1, 512, 7, 7]           1,024
            ReLU-114            [-1, 512, 7, 7]               0
      BasicBlock-115            [-1, 512, 7, 7]               0
          Conv2d-116            [-1, 512, 7, 7]       2,359,296
     BatchNorm2d-117            [-1, 512, 7, 7]           1,024
            ReLU-118            [-1, 512, 7, 7]               0
          Conv2d-119            [-1, 512, 7, 7]       2,359,296
     BatchNorm2d-120            [-1, 512, 7, 7]           1,024
            ReLU-121            [-1, 512, 7, 7]               0
      BasicBlock-122            [-1, 512, 7, 7]               0
AdaptiveAvgPool2d-123            [-1, 512, 1, 1]               0
          Linear-124                 [-1, 1000]         513,000
================================================================
Total params: 21,797,672
Trainable params: 21,797,672
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 96.29
Params size (MB): 83.15
Estimated Total Size (MB): 180.01
----------------------------------------------------------------

By default, all trainable parameters can be updated if the model is trained further using new data; in other words, the computational graph and gradients will be maintained. However, it is possible to freeze parameters so that they cannot be updated. Below, I define a function, modified from the tutorial linked in the comment, that freezes all model parameters when the freeze parameter is set to True. If I run this function on the ResNet-34 model instance and then print a summary, you can see that none of the parameters are now trainable.

It is generally more common to freeze only some of the initially trainable parameters in a model as opposed to all of them. I will demonstrate this next and also in some of the later semantic segmentation modules.

#https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html
def set_parameter_requires_grad(model, freeze=True):
    # Disable gradient tracking for every parameter so the
    # weights cannot be updated during training
    if freeze:
        for param in model.parameters():
            param.requires_grad = False
set_parameter_requires_grad(modelRN34, freeze=True)
summary(modelRN34, (3, 224, 224))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 112, 112]           9,408
       BatchNorm2d-2         [-1, 64, 112, 112]             128
              ReLU-3         [-1, 64, 112, 112]               0
         MaxPool2d-4           [-1, 64, 56, 56]               0
            Conv2d-5           [-1, 64, 56, 56]          36,864
       BatchNorm2d-6           [-1, 64, 56, 56]             128
              ReLU-7           [-1, 64, 56, 56]               0
            Conv2d-8           [-1, 64, 56, 56]          36,864
       BatchNorm2d-9           [-1, 64, 56, 56]             128
             ReLU-10           [-1, 64, 56, 56]               0
       BasicBlock-11           [-1, 64, 56, 56]               0
           Conv2d-12           [-1, 64, 56, 56]          36,864
      BatchNorm2d-13           [-1, 64, 56, 56]             128
             ReLU-14           [-1, 64, 56, 56]               0
           Conv2d-15           [-1, 64, 56, 56]          36,864
      BatchNorm2d-16           [-1, 64, 56, 56]             128
             ReLU-17           [-1, 64, 56, 56]               0
       BasicBlock-18           [-1, 64, 56, 56]               0
           Conv2d-19           [-1, 64, 56, 56]          36,864
      BatchNorm2d-20           [-1, 64, 56, 56]             128
             ReLU-21           [-1, 64, 56, 56]               0
           Conv2d-22           [-1, 64, 56, 56]          36,864
      BatchNorm2d-23           [-1, 64, 56, 56]             128
             ReLU-24           [-1, 64, 56, 56]               0
       BasicBlock-25           [-1, 64, 56, 56]               0
           Conv2d-26          [-1, 128, 28, 28]          73,728
      BatchNorm2d-27          [-1, 128, 28, 28]             256
             ReLU-28          [-1, 128, 28, 28]               0
           Conv2d-29          [-1, 128, 28, 28]         147,456
      BatchNorm2d-30          [-1, 128, 28, 28]             256
           Conv2d-31          [-1, 128, 28, 28]           8,192
      BatchNorm2d-32          [-1, 128, 28, 28]             256
             ReLU-33          [-1, 128, 28, 28]               0
       BasicBlock-34          [-1, 128, 28, 28]               0
           Conv2d-35          [-1, 128, 28, 28]         147,456
      BatchNorm2d-36          [-1, 128, 28, 28]             256
             ReLU-37          [-1, 128, 28, 28]               0
           Conv2d-38          [-1, 128, 28, 28]         147,456
      BatchNorm2d-39          [-1, 128, 28, 28]             256
             ReLU-40          [-1, 128, 28, 28]               0
       BasicBlock-41          [-1, 128, 28, 28]               0
           Conv2d-42          [-1, 128, 28, 28]         147,456
      BatchNorm2d-43          [-1, 128, 28, 28]             256
             ReLU-44          [-1, 128, 28, 28]               0
           Conv2d-45          [-1, 128, 28, 28]         147,456
      BatchNorm2d-46          [-1, 128, 28, 28]             256
             ReLU-47          [-1, 128, 28, 28]               0
       BasicBlock-48          [-1, 128, 28, 28]               0
           Conv2d-49          [-1, 128, 28, 28]         147,456
      BatchNorm2d-50          [-1, 128, 28, 28]             256
             ReLU-51          [-1, 128, 28, 28]               0
           Conv2d-52          [-1, 128, 28, 28]         147,456
      BatchNorm2d-53          [-1, 128, 28, 28]             256
             ReLU-54          [-1, 128, 28, 28]               0
       BasicBlock-55          [-1, 128, 28, 28]               0
           Conv2d-56          [-1, 256, 14, 14]         294,912
      BatchNorm2d-57          [-1, 256, 14, 14]             512
             ReLU-58          [-1, 256, 14, 14]               0
           Conv2d-59          [-1, 256, 14, 14]         589,824
      BatchNorm2d-60          [-1, 256, 14, 14]             512
           Conv2d-61          [-1, 256, 14, 14]          32,768
      BatchNorm2d-62          [-1, 256, 14, 14]             512
             ReLU-63          [-1, 256, 14, 14]               0
       BasicBlock-64          [-1, 256, 14, 14]               0
           Conv2d-65          [-1, 256, 14, 14]         589,824
      BatchNorm2d-66          [-1, 256, 14, 14]             512
             ReLU-67          [-1, 256, 14, 14]               0
           Conv2d-68          [-1, 256, 14, 14]         589,824
      BatchNorm2d-69          [-1, 256, 14, 14]             512
             ReLU-70          [-1, 256, 14, 14]               0
       BasicBlock-71          [-1, 256, 14, 14]               0
           Conv2d-72          [-1, 256, 14, 14]         589,824
      BatchNorm2d-73          [-1, 256, 14, 14]             512
             ReLU-74          [-1, 256, 14, 14]               0
           Conv2d-75          [-1, 256, 14, 14]         589,824
      BatchNorm2d-76          [-1, 256, 14, 14]             512
             ReLU-77          [-1, 256, 14, 14]               0
       BasicBlock-78          [-1, 256, 14, 14]               0
           Conv2d-79          [-1, 256, 14, 14]         589,824
      BatchNorm2d-80          [-1, 256, 14, 14]             512
             ReLU-81          [-1, 256, 14, 14]               0
           Conv2d-82          [-1, 256, 14, 14]         589,824
      BatchNorm2d-83          [-1, 256, 14, 14]             512
             ReLU-84          [-1, 256, 14, 14]               0
       BasicBlock-85          [-1, 256, 14, 14]               0
           Conv2d-86          [-1, 256, 14, 14]         589,824
      BatchNorm2d-87          [-1, 256, 14, 14]             512
             ReLU-88          [-1, 256, 14, 14]               0
           Conv2d-89          [-1, 256, 14, 14]         589,824
      BatchNorm2d-90          [-1, 256, 14, 14]             512
             ReLU-91          [-1, 256, 14, 14]               0
       BasicBlock-92          [-1, 256, 14, 14]               0
           Conv2d-93          [-1, 256, 14, 14]         589,824
      BatchNorm2d-94          [-1, 256, 14, 14]             512
             ReLU-95          [-1, 256, 14, 14]               0
           Conv2d-96          [-1, 256, 14, 14]         589,824
      BatchNorm2d-97          [-1, 256, 14, 14]             512
             ReLU-98          [-1, 256, 14, 14]               0
       BasicBlock-99          [-1, 256, 14, 14]               0
          Conv2d-100            [-1, 512, 7, 7]       1,179,648
     BatchNorm2d-101            [-1, 512, 7, 7]           1,024
            ReLU-102            [-1, 512, 7, 7]               0
          Conv2d-103            [-1, 512, 7, 7]       2,359,296
     BatchNorm2d-104            [-1, 512, 7, 7]           1,024
          Conv2d-105            [-1, 512, 7, 7]         131,072
     BatchNorm2d-106            [-1, 512, 7, 7]           1,024
            ReLU-107            [-1, 512, 7, 7]               0
      BasicBlock-108            [-1, 512, 7, 7]               0
          Conv2d-109            [-1, 512, 7, 7]       2,359,296
     BatchNorm2d-110            [-1, 512, 7, 7]           1,024
            ReLU-111            [-1, 512, 7, 7]               0
          Conv2d-112            [-1, 512, 7, 7]       2,359,296
     BatchNorm2d-113            [-1, 512, 7, 7]           1,024
            ReLU-114            [-1, 512, 7, 7]               0
      BasicBlock-115            [-1, 512, 7, 7]               0
          Conv2d-116            [-1, 512, 7, 7]       2,359,296
     BatchNorm2d-117            [-1, 512, 7, 7]           1,024
            ReLU-118            [-1, 512, 7, 7]               0
          Conv2d-119            [-1, 512, 7, 7]       2,359,296
     BatchNorm2d-120            [-1, 512, 7, 7]           1,024
            ReLU-121            [-1, 512, 7, 7]               0
      BasicBlock-122            [-1, 512, 7, 7]               0
AdaptiveAvgPool2d-123            [-1, 512, 1, 1]               0
          Linear-124                 [-1, 1000]         513,000
================================================================
Total params: 21,797,672
Trainable params: 0
Non-trainable params: 21,797,672
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 96.29
Params size (MB): 83.15
Estimated Total Size (MB): 180.01
----------------------------------------------------------------
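As a quick sanity check that does not depend on torchsummary, the trainable parameters can also be counted directly; a minimal sketch:

# Count parameters that still require gradients; expected to be 0 after freezing
nTrainable = sum(p.numel() for p in modelRN34.parameters() if p.requires_grad)
print(nTrainable)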

I will now build a function that allows the user to specify which ResNet architecture is desired (“18”, “34”, “50”, “101”, or “152”), the number of input channels, the number of classes being differentiated, whether or not to freeze the parameters associated with the convolutional component of the model, and whether or not to use pre-trained weights. One common issue with pre-trained models and pre-defined architectures is that they expect a certain input image size. This is a result of the flattening and fully connected components of the model: if the spatial dimensions are not consistent with the original data used to train the model, then the length of the flattened tensor following the convolutional operations will also not be consistent, since the length of the flattened array is the number of final feature maps X height X width.

So, one common means to generalize these pre-trained models, either to accept images of a different input size or to train the model for a new task, is to redefine the fully connected component of the model while maintaining the convolutional component. The weights from the convolutional component can then be reused while new weights are learned for the newly defined fully connected component. It is possible to (1) not initialize the parameters associated with the convolutional component using pre-trained weights, (2) initialize them using pre-trained weights but allow them to be further trained or updated, or (3) lock or freeze the parameters associated with the convolutional component while updating those associated with the fully connected component.

Note that one component that makes the ResNet architecture used here easier to modify is the use of nn.AdaptiveAvgPool2d() at the end of the convolutional component of the architecture. This layer produces a fixed output size to feed the fully connected component, so the model can more easily be generalized to variable input sizes. However, the number of outputs from the fully connected layer still needs to be modified to match the number of classes being differentiated.
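To see why this decouples the fully connected head from the input size, note that adaptive pooling produces the same output shape regardless of the input's spatial dimensions; a minimal sketch:

# Adaptive average pooling collapses any spatial size to the requested (1, 1) grid
pool = nn.AdaptiveAvgPool2d((1, 1))
print(pool(torch.rand(1, 512, 16, 16)).shape)  # torch.Size([1, 512, 1, 1])
print(pool(torch.rand(1, 512, 7, 7)).shape)    # torch.Size([1, 512, 1, 1])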

Let’s step through this function. Which ResNet architecture is returned depends on the user’s input to the resNet argument; within the first series of control flow statements, the desired architecture is initialized. The pretrained parameter determines whether or not pre-trained weights will be used, with a default of True. If pre-trained weights are used, the freeze parameter specifies whether the model parameters will be trainable. This is accomplished using the set_parameter_requires_grad() function defined above: if freeze is set to True, all of the model parameters will be frozen, not just those associated with the convolutional component; if it is set to False, they remain trainable. To unfreeze the parameters associated with the fully connected component at the end of the architecture, and so that the number of outputs always matches the number of classes defined by the user, I then replace this component with a new nn.Linear() layer.

In order to generalize this model further, I also allow the user to specify the number of input channels. This requires replacing the first 2D convolution layer, which expects three channels. This will also require that the weights be updated for this convolutional layer during the training process. I also replace the batch normalization layer following this first convolutional layer so that the associated parameters can also be updated.

The order here matters. The first 2D convolution and batch normalization layers and the fully connected layer at the end of the architecture must be replaced after freezing the model parameters, since these layers should remain trainable: replacing a layer resets its parameters to the default, trainable state.

# https://stackoverflow.com/questions/62629114/how-to-modify-resnet-50-with-4-channels-as-input-using-pre-trained-weights-in-py
# https://discuss.pytorch.org/t/transfer-learning-usage-with-different-input-size/20744

def initialize_model(resNet, nChn, nCls, freeze=True, pretrained=True):
  # Select the desired ResNet variant; default to ResNet-34
  if resNet == "18":
    model = torchvision.models.resnet18(pretrained=pretrained)
  elif resNet == "34":
    model = torchvision.models.resnet34(pretrained=pretrained)
  elif resNet == "50":
    model = torchvision.models.resnet50(pretrained=pretrained)
  elif resNet == "101":
    model = torchvision.models.resnet101(pretrained=pretrained)
  elif resNet == "152":
    model = torchvision.models.resnet152(pretrained=pretrained)
  else:
    model = torchvision.models.resnet34(pretrained=pretrained)

  # Optionally freeze all parameters when starting from pre-trained weights
  if pretrained:
    set_parameter_requires_grad(model, freeze)

  # Replace the fully connected head so the output matches the number of
  # classes; the replacement layer's parameters are trainable by default
  num_ftrs = model.fc.in_features
  model.fc = nn.Linear(num_ftrs, nCls)

  # Replace the first convolution and batch normalization layers if the number
  # of input channels differs from the expected 3; these are also trainable
  if nChn != 3:
    model.conv1 = nn.Conv2d(nChn, 64, kernel_size=7, stride=2, padding=3, bias=False)
    model.bn1 = nn.BatchNorm2d(64)

  return model

I next create an instance of the model with a ResNet-34 architecture, 1 input channel (i.e., the network will accept a grayscale image), 5 classes being differentiated, freezing the convolutional component of the model, and using pre-trained weights. I then print a summary for this model using an input size of (1, 512, 512). Note that some of the parameters are trainable, specifically those associated with the first convolutional layer, the first batch normalization layer, and the fully connected layer at the end of the model, while the remainder of the convolutional component is not trainable. If you run this model with the freeze parameter set to False, all weights will be trainable. If you run it with the pretrained parameter set to False, then the model will initialize with random weights as opposed to pre-trained weights, and all weights will be trainable.

Freezing weights can greatly decrease the computational load and time needed to train models. Whether or not suitable output can be generated without training all weights will depend on the use case. It is also possible to freeze and/or unfreeze parameters at specific points in the training process, such as at a defined epoch.
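For example, unfreezing all parameters at a chosen point in training can be handled with a simple conditional inside the training loop; a minimal sketch, where epoch and unfreezeEpoch are hypothetical loop variables:

# Once the chosen epoch is reached, make every parameter trainable again
if epoch == unfreezeEpoch:
    for param in model.parameters():
        param.requires_grad = True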

I find manually altering these provided architectures to be tricky. However, you can generally find help online.

model = initialize_model(resNet="34", nChn=1, nCls=5, freeze=True, pretrained=True).to(device)
summary(model, (1, 512, 512))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 256, 256]           3,136
       BatchNorm2d-2         [-1, 64, 256, 256]             128
              ReLU-3         [-1, 64, 256, 256]               0
         MaxPool2d-4         [-1, 64, 128, 128]               0
            Conv2d-5         [-1, 64, 128, 128]          36,864
       BatchNorm2d-6         [-1, 64, 128, 128]             128
              ReLU-7         [-1, 64, 128, 128]               0
            Conv2d-8         [-1, 64, 128, 128]          36,864
       BatchNorm2d-9         [-1, 64, 128, 128]             128
             ReLU-10         [-1, 64, 128, 128]               0
       BasicBlock-11         [-1, 64, 128, 128]               0
           Conv2d-12         [-1, 64, 128, 128]          36,864
      BatchNorm2d-13         [-1, 64, 128, 128]             128
             ReLU-14         [-1, 64, 128, 128]               0
           Conv2d-15         [-1, 64, 128, 128]          36,864
      BatchNorm2d-16         [-1, 64, 128, 128]             128
             ReLU-17         [-1, 64, 128, 128]               0
       BasicBlock-18         [-1, 64, 128, 128]               0
           Conv2d-19         [-1, 64, 128, 128]          36,864
      BatchNorm2d-20         [-1, 64, 128, 128]             128
             ReLU-21         [-1, 64, 128, 128]               0
           Conv2d-22         [-1, 64, 128, 128]          36,864
      BatchNorm2d-23         [-1, 64, 128, 128]             128
             ReLU-24         [-1, 64, 128, 128]               0
       BasicBlock-25         [-1, 64, 128, 128]               0
           Conv2d-26          [-1, 128, 64, 64]          73,728
      BatchNorm2d-27          [-1, 128, 64, 64]             256
             ReLU-28          [-1, 128, 64, 64]               0
           Conv2d-29          [-1, 128, 64, 64]         147,456
      BatchNorm2d-30          [-1, 128, 64, 64]             256
           Conv2d-31          [-1, 128, 64, 64]           8,192
      BatchNorm2d-32          [-1, 128, 64, 64]             256
             ReLU-33          [-1, 128, 64, 64]               0
       BasicBlock-34          [-1, 128, 64, 64]               0
           Conv2d-35          [-1, 128, 64, 64]         147,456
      BatchNorm2d-36          [-1, 128, 64, 64]             256
             ReLU-37          [-1, 128, 64, 64]               0
           Conv2d-38          [-1, 128, 64, 64]         147,456
      BatchNorm2d-39          [-1, 128, 64, 64]             256
             ReLU-40          [-1, 128, 64, 64]               0
       BasicBlock-41          [-1, 128, 64, 64]               0
           Conv2d-42          [-1, 128, 64, 64]         147,456
      BatchNorm2d-43          [-1, 128, 64, 64]             256
             ReLU-44          [-1, 128, 64, 64]               0
           Conv2d-45          [-1, 128, 64, 64]         147,456
      BatchNorm2d-46          [-1, 128, 64, 64]             256
             ReLU-47          [-1, 128, 64, 64]               0
       BasicBlock-48          [-1, 128, 64, 64]               0
           Conv2d-49          [-1, 128, 64, 64]         147,456
      BatchNorm2d-50          [-1, 128, 64, 64]             256
             ReLU-51          [-1, 128, 64, 64]               0
           Conv2d-52          [-1, 128, 64, 64]         147,456
      BatchNorm2d-53          [-1, 128, 64, 64]             256
             ReLU-54          [-1, 128, 64, 64]               0
       BasicBlock-55          [-1, 128, 64, 64]               0
           Conv2d-56          [-1, 256, 32, 32]         294,912
      BatchNorm2d-57          [-1, 256, 32, 32]             512
             ReLU-58          [-1, 256, 32, 32]               0
           Conv2d-59          [-1, 256, 32, 32]         589,824
      BatchNorm2d-60          [-1, 256, 32, 32]             512
           Conv2d-61          [-1, 256, 32, 32]          32,768
      BatchNorm2d-62          [-1, 256, 32, 32]             512
             ReLU-63          [-1, 256, 32, 32]               0
       BasicBlock-64          [-1, 256, 32, 32]               0
           Conv2d-65          [-1, 256, 32, 32]         589,824
      BatchNorm2d-66          [-1, 256, 32, 32]             512
             ReLU-67          [-1, 256, 32, 32]               0
           Conv2d-68          [-1, 256, 32, 32]         589,824
      BatchNorm2d-69          [-1, 256, 32, 32]             512
             ReLU-70          [-1, 256, 32, 32]               0
       BasicBlock-71          [-1, 256, 32, 32]               0
           Conv2d-72          [-1, 256, 32, 32]         589,824
      BatchNorm2d-73          [-1, 256, 32, 32]             512
             ReLU-74          [-1, 256, 32, 32]               0
           Conv2d-75          [-1, 256, 32, 32]         589,824
      BatchNorm2d-76          [-1, 256, 32, 32]             512
             ReLU-77          [-1, 256, 32, 32]               0
       BasicBlock-78          [-1, 256, 32, 32]               0
           Conv2d-79          [-1, 256, 32, 32]         589,824
      BatchNorm2d-80          [-1, 256, 32, 32]             512
             ReLU-81          [-1, 256, 32, 32]               0
           Conv2d-82          [-1, 256, 32, 32]         589,824
      BatchNorm2d-83          [-1, 256, 32, 32]             512
             ReLU-84          [-1, 256, 32, 32]               0
       BasicBlock-85          [-1, 256, 32, 32]               0
           Conv2d-86          [-1, 256, 32, 32]         589,824
      BatchNorm2d-87          [-1, 256, 32, 32]             512
             ReLU-88          [-1, 256, 32, 32]               0
           Conv2d-89          [-1, 256, 32, 32]         589,824
      BatchNorm2d-90          [-1, 256, 32, 32]             512
             ReLU-91          [-1, 256, 32, 32]               0
       BasicBlock-92          [-1, 256, 32, 32]               0
           Conv2d-93          [-1, 256, 32, 32]         589,824
      BatchNorm2d-94          [-1, 256, 32, 32]             512
             ReLU-95          [-1, 256, 32, 32]               0
           Conv2d-96          [-1, 256, 32, 32]         589,824
      BatchNorm2d-97          [-1, 256, 32, 32]             512
             ReLU-98          [-1, 256, 32, 32]               0
       BasicBlock-99          [-1, 256, 32, 32]               0
          Conv2d-100          [-1, 512, 16, 16]       1,179,648
     BatchNorm2d-101          [-1, 512, 16, 16]           1,024
            ReLU-102          [-1, 512, 16, 16]               0
          Conv2d-103          [-1, 512, 16, 16]       2,359,296
     BatchNorm2d-104          [-1, 512, 16, 16]           1,024
          Conv2d-105          [-1, 512, 16, 16]         131,072
     BatchNorm2d-106          [-1, 512, 16, 16]           1,024
            ReLU-107          [-1, 512, 16, 16]               0
      BasicBlock-108          [-1, 512, 16, 16]               0
          Conv2d-109          [-1, 512, 16, 16]       2,359,296
     BatchNorm2d-110          [-1, 512, 16, 16]           1,024
            ReLU-111          [-1, 512, 16, 16]               0
          Conv2d-112          [-1, 512, 16, 16]       2,359,296
     BatchNorm2d-113          [-1, 512, 16, 16]           1,024
            ReLU-114          [-1, 512, 16, 16]               0
      BasicBlock-115          [-1, 512, 16, 16]               0
          Conv2d-116          [-1, 512, 16, 16]       2,359,296
     BatchNorm2d-117          [-1, 512, 16, 16]           1,024
            ReLU-118          [-1, 512, 16, 16]               0
          Conv2d-119          [-1, 512, 16, 16]       2,359,296
     BatchNorm2d-120          [-1, 512, 16, 16]           1,024
            ReLU-121          [-1, 512, 16, 16]               0
      BasicBlock-122          [-1, 512, 16, 16]               0
AdaptiveAvgPool2d-123            [-1, 512, 1, 1]               0
          Linear-124                    [-1, 5]           2,565
================================================================
Total params: 21,280,965
Trainable params: 5,829
Non-trainable params: 21,275,136
----------------------------------------------------------------
Input size (MB): 1.00
Forward/backward pass size (MB): 503.00
Params size (MB): 81.18
Estimated Total Size (MB): 585.18
----------------------------------------------------------------
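To verify exactly which parameters remain trainable after these modifications, you can iterate over the named parameters; a minimal sketch:

# Print only the parameters that still require gradients; for this model,
# these should be those of conv1, bn1, and fc
for name, param in model.named_parameters():
    if param.requires_grad:
        print(name)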

Predict Using a Pre-trained Model

Let’s now use a pre-trained network to predict the label for an input image. Here, I will use a picture of my cat Peri. The path to the file is defined and the image is read in using PIL. I then plot the image using imshow() from matplotlib.

imgPth = "C:/myFiles/work/dl/input_data/peri.jpg"
img1 = Image.open(imgPth)
plt.imshow(img1)

Since I will use a pre-trained model and not make any modifications to the architecture, as I did above, the image will need to be transformed to the anticipated shape. This is accomplished using transforms as defined by torchvision. I also normalize the bands using the band means and standard deviations of ImageNet, the dataset used to generate the pre-trained weights. This normalization is generally required when using pre-trained weights.

I then apply the defined transforms to the image, which includes transforming it to a torch tensor. The unsqueeze() function is used to add a dimension at index 0 since the model expects a batch dimension. The image is then moved to the device.

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
imgT = transform(img1)
imgB = torch.unsqueeze(imgT, 0).to(device)
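As a quick check, printing the shape confirms that the batch dimension was added; a minimal sketch:

# Expected: torch.Size([1, 3, 224, 224]) after the transforms and unsqueeze
print(imgB.shape)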

I create an instance of a ResNet-34 model with pre-trained weights and move it to the device.

model = torchvision.models.resnet34(pretrained=True).to(device)

The model is also placed in evaluation mode using its eval() method. This is important so that layers such as batch normalization and dropout behave correctly at inference time; since we are only predicting, the forward pass can additionally be wrapped in torch.no_grad() so that no gradients are tracked. I next predict the image using the model.

I read in a mapping of the class codes and associated class labels obtained from the link included in the comment. ImageNet differentiates a total of 1,000 classes. Using the results and this mapping, I then print the class with the highest predicted probability followed by the top 5 classes.

The model predicted that the image was of a “Siamese cat” with a 99.8% probability. This is pretty impressive. The next two highest rankings are two other types of cats followed by “paper towel” and “sleeping bag”.

model.eval()
ResNet(
  (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
  (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU(inplace=True)
  (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (2): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (2): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (3): BasicBlock(
      (conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (2): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (3): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (4): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (5): BasicBlock(
      (conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (layer4): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (downsample): Sequential(
        (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
    (2): BasicBlock(
      (conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
      (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    )
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
  (fc): Linear(in_features=512, out_features=1000, bias=True)
)
prediction = model(imgB)
#https://learnopencv.com/pytorch-for-beginners-image-classification-using-pre-trained-models/
with open('C:/myFiles/work/dl/input_data/imagenet1000_clsidx_to_labels.txt') as f:
  classes = [line.strip() for line in f.readlines()]
_, index = torch.max(prediction, 1)
percentage = torch.nn.functional.softmax(prediction, dim=1)[0] * 100
print(classes[index[0]], percentage[index[0]].item())
284: 'Siamese cat, Siamese', 99.83098602294922
_, indices = torch.sort(prediction, descending=True)
[(classes[idx], percentage[idx].item()) for idx in indices[0][:5]]
[("284: 'Siamese cat, Siamese',", 99.83098602294922), ("287: 'lynx, catamount',", 0.04350800812244415), ("285: 'Egyptian cat',", 0.03190483897924423), ("700: 'paper towel',", 0.006886311341077089), ("797: 'sleeping bag',", 0.006694042589515448)]

Display Feature Maps

The next set of code was obtained from the following Medium post:

https://ravivaishnav20.medium.com/visualizing-feature-maps-using-pytorch-12a48cd1e573

The goal of this code is to generate visualizations of the feature maps. Remember that feature maps are the result of applying learned kernels to the input data or to the feature maps produced by prior layers. We will specifically make use of the ResNet-34 model and the picture of Peri from above.

This code does the following:

  1. Extract and store all of the nn.Conv2d() layers in a list and all of the associated kernel weights in a second list
  2. Apply the convolution layers sequentially to the input image, saving the resulting feature maps and associated layer names to lists
  3. Average each feature map across its channels to obtain a single grayscale map per layer
  4. Generate a plot of the learned feature maps

It is interesting to explore the learned feature maps to determine what features of the image are being used to make predictions at the varying levels of the architecture. Note that this walkthrough applies only the convolution layers, skipping batch normalization, the activations, pooling, and the residual connections, so it approximates rather than reproduces the model's true forward pass; the reductions in the spatial dimensions of the arrays seen below result from the strided convolutions.

I encourage you to read the Medium post referenced here for a more detailed discussion of this code.

#https://ravivaishnav20.medium.com/visualizing-feature-maps-using-pytorch-12a48cd1e573
# we will save the conv layer weights in this list
model_weights = []
# we will save the conv layers in this list
conv_layers = []
# get all the model children as a list
model_children = list(model.children())
# counter to keep count of the conv layers
counter = 0
# append all the conv layers and their respective weights to the lists
for i in range(len(model_children)):
    if type(model_children[i]) == nn.Conv2d:
        counter += 1
        model_weights.append(model_children[i].weight)
        conv_layers.append(model_children[i])
    elif type(model_children[i]) == nn.Sequential:
        for j in range(len(model_children[i])):
            for child in model_children[i][j].children():
                if type(child) == nn.Conv2d:
                    counter += 1
                    model_weights.append(child.weight)
                    conv_layers.append(child)
print(f"Total convolution layers: {counter}")
Total convolution layers: 33
print("conv_layers")
conv_layers
image = imgB
outputs = []
names = []
# apply each convolution layer in sequence, saving the resulting feature maps
for layer in conv_layers:
    image = layer(image)
    outputs.append(image)
    names.append(str(layer))
print(len(outputs))
33
for feature_map in outputs:
    print(feature_map.shape)
torch.Size([1, 64, 112, 112])
torch.Size([1, 64, 112, 112])
torch.Size([1, 64, 112, 112])
torch.Size([1, 64, 112, 112])
torch.Size([1, 64, 112, 112])
torch.Size([1, 64, 112, 112])
torch.Size([1, 64, 112, 112])
torch.Size([1, 128, 56, 56])
torch.Size([1, 128, 56, 56])
torch.Size([1, 128, 56, 56])
torch.Size([1, 128, 56, 56])
torch.Size([1, 128, 56, 56])
torch.Size([1, 128, 56, 56])
torch.Size([1, 128, 56, 56])
torch.Size([1, 128, 56, 56])
torch.Size([1, 256, 28, 28])
torch.Size([1, 256, 28, 28])
torch.Size([1, 256, 28, 28])
torch.Size([1, 256, 28, 28])
torch.Size([1, 256, 28, 28])
torch.Size([1, 256, 28, 28])
torch.Size([1, 256, 28, 28])
torch.Size([1, 256, 28, 28])
torch.Size([1, 256, 28, 28])
torch.Size([1, 256, 28, 28])
torch.Size([1, 256, 28, 28])
torch.Size([1, 256, 28, 28])
torch.Size([1, 512, 14, 14])
torch.Size([1, 512, 14, 14])
torch.Size([1, 512, 14, 14])
torch.Size([1, 512, 14, 14])
torch.Size([1, 512, 14, 14])
torch.Size([1, 512, 14, 14])
processed = []
for feature_map in outputs:
    feature_map = feature_map.squeeze(0)
    # average across the channel dimension to collapse each layer's output
    # to a single grayscale map
    gray_scale = torch.sum(feature_map, 0)
    gray_scale = gray_scale / feature_map.shape[0]
    processed.append(gray_scale.data.cpu().numpy())
for fm in processed:
    print(fm.shape)
(112, 112)
(112, 112)
(112, 112)
(112, 112)
(112, 112)
(112, 112)
(112, 112)
(56, 56)
(56, 56)
(56, 56)
(56, 56)
(56, 56)
(56, 56)
(56, 56)
(56, 56)
(28, 28)
(28, 28)
(28, 28)
(28, 28)
(28, 28)
(28, 28)
(28, 28)
(28, 28)
(28, 28)
(28, 28)
(28, 28)
(28, 28)
(14, 14)
(14, 14)
(14, 14)
(14, 14)
(14, 14)
(14, 14)
fig = plt.figure(figsize=(30, 50))
for i in range(len(processed)):
    a = fig.add_subplot(10, 4, i+1)
    imgplot = plt.imshow(processed[i])
    a.axis("off")
    a.set_title(names[i].split('(')[0], fontsize=30)
plt.show()

Concluding Remarks

We are now ready to move on to the last section focused on using CNNs for scene labeling tasks. In that section, we will combine what was discussed and demonstrated in the prior CNN modules to train a scene classification model using transfer learning and a modified ResNet-34 architecture.