Fully Connected Neural Network

Introduction to PyTorch: Fully Connected Neural Networks

Introduction

In this first module, we will dive into PyTorch by building a simple, fully connected neural network. The primary learning objectives are to:

  1. Learn how to subclass the nn.Module class to define a neural network architecture

  2. Set up the __init__() constructor method and define a forward pass

  3. Use nn.Sequential() to simplify your neural network definition

Although this example is simple, more complex neural network architectures can be defined using the same workflow, as we will see in later modules.

Setup

First, we need to import the required packages.

Importing torch allows us to use PyTorch and the associated methods and classes. Among other functions, PyTorch allows us to define tensors and neural network architectures.

Importing torch.nn from torch allows us to use classes and functions from this subpackage without adding the torch prefix. This is a common shorthand when using PyTorch.

import torch
import torch.nn as nn

PyTorch allows computation to be performed on central processing units (CPUs) or graphics processing units (GPUs). Although a GPU is not required for this module, since we will not be training a model or using a trained model to make inferences, below I show how to check whether you have access to one. If a GPU is available, it is assigned to the device variable; if not, "cpu" is assigned instead. You can then pass the device variable when moving models and/or data to a specific device for computation.

Note that you can use a GPU in Google Colab by going to Edit -> Notebook Settings and setting the hardware accelerator to "GPU". You can also use a tensor processing unit (TPU); "None" indicates that the CPU will be used for computation. There are limits to the amount of computation time you can use within Colab, since the computation occurs on Google's servers rather than on your local machine. So, as you begin to use PyTorch and deep learning more frequently, or when you need to analyze large datasets, you may need to set up a local environment with access to a GPU or purchase a virtual machine.

If you are working on a local machine and want to make use of GPU-based computation, you need to have a compatible graphics card installed along with the CUDA Toolkit. You will also need to install the CUDA Deep Neural Network library (cuDNN). Both CUDA and cuDNN are free.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(device)
cuda:0

Define Network Architecture

We are now ready to build a neural network. A fully connected neural network is defined below. I am creating a neural network called myFCN by subclassing the nn.Module class made available by PyTorch. Within the __init__() constructor method, I call super().__init__() to initialize the parent, or super, class. I then define new parameters within the constructor method. Specifically, users should be able to specify the size of the input 1D array (inSize), the output sizes of the hidden layers (hiddenSizes), and the number of output nodes (outSize). For a classification problem, the output size would need to equal the number of classes; for a binary classification, it can be 1 or 2. We will discuss this in more detail in later modules. For a regression problem, the output size will be 1.

Next, I define the layers that will make up the neural network. Data will flow through the network as follows:

Input -> Fully Connected Layer -> Batch Normalization -> Rectified Linear Unit Activation -> Fully Connected Layer -> Batch Normalization -> Rectified Linear Unit Activation -> Fully Connected Layer

To implement this, I must define the fully connected layers, the batch normalization layers, and a rectified linear unit (ReLU) activation function. nn.Linear() defines a fully connected layer with a specified input and output size. The input size must equal the output size of the previous layer, while the output size represents the number of neurons or nodes in the current layer. So, the first fully connected layer accepts an input size equal to the size of the input data, the second fully connected layer accepts an input size equal to the output size of the first, and the last fully connected layer requires an input size equal to the output size of the second. For the last fully connected layer, the output size would need to be set to the number of classes for a classification problem or 1 for a regression problem.
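
To see how nn.Linear() chains input and output sizes, the short sketch below passes a random batch through a single fully connected layer. The batch size of 4 and the sizes 10 and 256 are arbitrary values chosen for illustration.

layer = nn.Linear(10, 256)  # accepts 10 input features, outputs 256 features
x = torch.randn(4, 10)      # a batch of 4 samples, each with 10 features
print(layer(x).shape)
torch.Size([4, 256])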

Batch normalization and an activation function are not applied following the last fully connected layer. This is because I want to obtain the logits. Logits can be converted to probabilities using a sigmoid function, in the case of a binary classification with one output node, or a softmax activation for a multiclass classification. Whether or not you want to convert the raw logits to probabilities will depend on what is expected by the loss metric being used. We will explore this in later modules. For now, we will just output the raw logits.
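
For reference, converting logits to probabilities takes a single function call in PyTorch. The logit values below are made up for illustration: torch.sigmoid() handles the single-output binary case, while torch.softmax() handles the multiclass case.

binaryLogit = torch.tensor([0.5])
print(torch.sigmoid(binaryLogit))
tensor([0.6225])
multiclassLogits = torch.tensor([[1.2, -0.4, 0.3]])
print(torch.softmax(multiclassLogits, dim=1))
tensor([[0.6217, 0.1255, 0.2528]])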

The __init__() constructor method is used to define the parameters associated with the subclass and the components of the model. It does not specify how data will be fed through the network. This is the purpose of the forward() method. The forward method accepts an input (x), which is then passed through the network sequentially. It then returns the result of the network, which in this case is saved to the variable x.

If you are having trouble following the syntax, you may need to review how classes and subclassing are implemented in Python. For example, if you are confused by the use of the self variable, there is a good chance that you need to further investigate classes.
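
As a quick refresher, here is a minimal pure-Python example; the Counter class and its attributes are made up for illustration. Note that self simply refers to the specific instance, so self.count is state stored on that instance.

class Counter:
  def __init__(self, start):
    self.count = start  # store the argument as an instance attribute

  def increment(self):
    self.count += 1  # access and update instance state through self

c = Counter(0)
c.increment()
print(c.count)
1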

class myFCN(nn.Module):
  def __init__(self, inSize, hiddenSizes, outSize):
    super().__init__()
    self.inSize = inSize
    self.hiddenSizes = hiddenSizes
    self.outSize = outSize

    # Fully connected layers; each input size matches the previous layer's output size
    self.lin1 = nn.Linear(inSize, hiddenSizes[0])
    self.lin2 = nn.Linear(hiddenSizes[0], hiddenSizes[1])
    self.lin3 = nn.Linear(hiddenSizes[1], outSize)

    # BatchNorm1d is used because the inputs are 1D feature vectors
    self.bn1 = nn.BatchNorm1d(hiddenSizes[0])
    self.bn2 = nn.BatchNorm1d(hiddenSizes[1])

    # ReLU activation applied after each batch normalization
    self.relu = nn.ReLU()

  def forward(self, x):
    x = self.relu(self.bn1(self.lin1(x)))
    x = self.relu(self.bn2(self.lin2(x)))
    x = self.lin3(x)
    return x

Running the code above creates a subclass of nn.Module that inherits its functionality and implements the neural network defined. To instantiate the new myFCN subclass, I call the class name and provide arguments for the required parameters. Below, I instantiate an instance of myFCN called model. The defined neural network accepts 10 inputs, has 2 intermediate layers with 256 nodes each, and has a final fully connected layer with 5 nodes, mimicking a classification problem where five classes are differentiated.

Note the use of the to() method, which allows me to move the model to the device, in my case a GPU.

Simply calling the model prints its contents.

model = myFCN(inSize=10, hiddenSizes=[256, 256], outSize=5).to(device)
model
myFCN(
  (lin1): Linear(in_features=10, out_features=256, bias=True)
  (lin2): Linear(in_features=256, out_features=256, bias=True)
  (lin3): Linear(in_features=256, out_features=5, bias=True)
  (bn1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (bn2): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (relu): ReLU()
)
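
As a quick sanity check (not part of the workflow above), we can pass a random tensor through the model to confirm that the layer sizes line up. Note that the batch dimension must be greater than 1 because BatchNorm1d needs more than one sample to compute batch statistics in training mode.

x = torch.randn(4, 10).to(device)  # a batch of 4 samples with 10 features each
print(model(x).shape)
torch.Size([4, 5])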

Using nn.Sequential()

Another means to define a neural network architecture is to use nn.Sequential(), which allows you to define a sequence of operations. This is especially useful when data pass through a network sequentially or linearly.

Below, I have constructed the same network as defined above, but now using nn.Sequential(). All components of the network are now combined within a single object named theNetwork. Using nn.Sequential() greatly simplifies the forward method: the input (x) is simply provided to theNetwork and the result is returned.

Either method yields the same network, so whether to use nn.Sequential() is largely personal preference. I generally prefer it because I find the code easier to read, especially as networks become larger and more complicated.

Below, I have instantiated the network again using the new definition. Note that using nn.Sequential() does change the printed summary when the model name is called.

class myFCN(nn.Module):
  def __init__(self, inSize, hiddenSizes, outSize):
    super().__init__()
    self.inSize = inSize
    self.hiddenSizes = hiddenSizes
    self.outSize = outSize

    self.theNetwork = nn.Sequential(
        nn.Linear(inSize, hiddenSizes[0]),
        nn.BatchNorm1d(hiddenSizes[0]),
        nn.ReLU(inplace=True),
        nn.Linear(hiddenSizes[0], hiddenSizes[1]),
        nn.BatchNorm1d(hiddenSizes[1]),
        nn.ReLU(inplace=True),
        nn.Linear(hiddenSizes[1], outSize)
    )

  def forward(self, x):
    x = self.theNetwork(x)
    return x

model = myFCN(inSize=6, hiddenSizes=[256, 256], outSize=10).to(device)
model
myFCN(
  (theNetwork): Sequential(
    (0): Linear(in_features=6, out_features=256, bias=True)
    (1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU(inplace=True)
    (3): Linear(in_features=256, out_features=256, bias=True)
    (4): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=256, out_features=10, bias=True)
  )
)
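
One convenience of nn.Sequential() is that the contained layers can be accessed by integer index, which is handy when inspecting or debugging a network. A short sketch:

print(model.theNetwork[0])  # the first fully connected layer
Linear(in_features=6, out_features=256, bias=True)
print(model.theNetwork[0].weight.shape)  # weights are stored as (out_features, in_features)
torch.Size([256, 6])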

Concluding Remarks

We now have a very simple network defined, but there is much more to do. We need to read in data, prepare it as input to the neural network, define a loss metric, define some assessment metrics, set up a training loop, train the model, assess the model, and use it to make predictions. We will cover all of these topics in later modules. However, in the next module, we will first explore tensors and tensor operations in more detail.