Test and training data with their associated labels have been acquired from an academic dataset (not included in this repo as it is over 700MB in size). The exact content of these datasets is not certain, but it is likely to be small images labelled into several categories. The goal is to build neural network models with PyTorch that classify the data to the labels.
Initially, a simple neural network is built, followed by a convolutional neural network. These are run here on a CPU, but the code is written to run on a GPU where available.
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from tqdm import tqdm
# Load the data
train_data = torch.load('train_data_2')
test_data = torch.load('test_data_2')
train_labels = torch.load('train_labels_2').long()
test_labels = torch.load('test_labels_2').long()
# Check the shape of the data
train_data.shape, test_data.shape
The data appears to be colour images (3 channel) of 32x32 pixels. We can test this by plotting a sample.
import matplotlib.pyplot as plt
sample_num=141
print(f'The Corresponding Label is: {train_labels[sample_num]}')
plt.imshow(train_data[sample_num][0], cmap='nipy_spectral') # Plot the first channel only, with an arbitrary colourmap
plt.show()
This is clearly an image of a horse, although without the exact colourmap the colours do not look natural. It is worth examining the labels to get an idea of how many categories this data is to be classified into.
# Find the min and max values of the labels
min(train_labels), max(train_labels)
There appear to be 10 output categories for this data, so the neural network should have 10 outputs. Additionally, the data needs to be flattened before use in the neural network.
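The min/max check only shows the range of the labels; torch.unique lists every distinct value, which also confirms there are no gaps in the numbering. A small sketch, using a hypothetical tensor in place of train_labels:

```python
import torch

# Hypothetical labels standing in for the train_labels tensor loaded above
labels = torch.tensor([3, 0, 9, 1, 3, 7]).long()
classes = torch.unique(labels)   # Sorted distinct label values
n_classes = classes.numel()      # Number of distinct classes present
print(classes, n_classes)
```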
# Flatten data tensors
train_data = train_data.view(train_data.shape[0], train_data.shape[1] * train_data.shape[2] * train_data.shape[3])
test_data = test_data.view(test_data.shape[0], test_data.shape[1] * test_data.shape[2] * test_data.shape[3])
# Check data shapes and types are correct for PyTorch
print(train_data.shape, test_data.shape)
print(train_data.type(), test_data.type())
print(train_labels.type(), test_labels.type())
# Define the parameters of the simple neural network
class Simple_NN(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.fc1 = nn.Linear(n_features, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x)) # ReLU activation function used for hidden layers
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        # Log softmax normalises the category scores into log-probabilities,
        # so confident misclassifications incur a large loss
        x = F.log_softmax(self.fc4(x), dim=1)
        return x
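Before training, it is worth confirming that the forward pass produces the expected output shape and that the log softmax output behaves as log-probabilities. A minimal sanity check (the class definition is repeated so the snippet runs standalone):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Simple_NN(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        self.fc1 = nn.Linear(n_features, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)

    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        return F.log_softmax(self.fc4(x), dim=1)

# Dummy batch of 4 flattened 3x32x32 images
dummy = torch.randn(4, 3 * 32 * 32)
out = Simple_NN(n_features=3072)(dummy)
print(out.shape)              # Expect one row of 10 scores per image
print(out.exp().sum(dim=1))   # Exponentiated rows should each sum to ~1
```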
# Push processing onto GPU if available
if torch.cuda.is_available():
    device = torch.device("cuda:0")
    print("running on the GPU")
else:
    device = torch.device("cpu")
    print("running on the CPU")
net = Simple_NN(n_features = 3072)
net.to(device)
# Define loss function and optimizer to use
loss_function = nn.NLLLoss() # Negative log-likelihood: the network already applies log_softmax, so CrossEntropyLoss (which applies it internally) would apply it twice
optimizer = torch.optim.SGD(net.parameters(), lr=0.1)
epochs = 20
BATCH_SIZE = 100
for epoch in range(epochs):
    for i in tqdm(range(0, len(train_data), BATCH_SIZE)): # Use tqdm to show progress bars
        # Batch the data
        batch_data = train_data[i:i+BATCH_SIZE]
        batch_labels = train_labels[i:i+BATCH_SIZE]
        batch_data = batch_data.to(device)
        batch_labels = batch_labels.to(device)
        net.zero_grad() # Set gradients to 0 before loss calculation
        output = net(batch_data) # Pass in the reshaped batch
        loss = loss_function(output, batch_labels) # Calculate the loss value
        loss.backward() # Backpropagate the loss through the network's parameters
        optimizer.step() # Update weights using the computed gradients
    print(f'Epoch {epoch + 1} loss: {loss.item():.4f}')
# Use the model to predict from the test set
with torch.no_grad():
    predicted = net(test_data.to(device)) # Call the module directly rather than net.forward
    predicted_classes = torch.argmax(predicted, dim=1)
# The model seems to predict fairly well but still makes some mistakes
print(predicted_classes[0:10])
print(test_labels[0:10])
# Push data back onto CPU for further analysis
device = torch.device('cpu')
predicted_classes = predicted_classes.to(device)
test_labels = test_labels.to(device)
# Find accuracy metric for prediction on test set
correct = 0
total = 0
for i in range(len(predicted_classes)):
    if predicted_classes[i] == test_labels[i]:
        correct += 1
    total += 1
print("Accuracy: ", round(correct/total, 3))
This model has given 50% accuracy. This can be compared to a convolutional model to see if this can be improved.
# As the data was previously flattened, the 2D spatial information was lost. The data is therefore reloaded
train_data = torch.load('train_data_2')
test_data = torch.load('test_data_2')
train_data.shape, test_data.shape
# A 2-dimensional CNN is used as this typically performs best with 2D image data
class Conv_NN(nn.Module):
    def __init__(self, n_channels):
        super().__init__()
        self.cnn_layers = nn.Sequential(
            # Define a 2D convolution layer
            nn.Conv2d(in_channels=n_channels, out_channels=16, kernel_size=4, stride=1, padding=0),
            nn.BatchNorm2d(16), # Batch normalisation standardises inputs to improve training performance
            nn.ReLU(inplace=True), # Using the same activation function as with the Simple NN
            nn.MaxPool2d(kernel_size=2, stride=2), # Max pooling to help extract sharp features e.g. edges
            # Define another 2D convolution layer
            nn.Conv2d(16, 8, kernel_size=4, stride=1, padding=0),
            nn.BatchNorm2d(8),
            nn.ReLU(inplace=True),
            nn.AvgPool2d(kernel_size=2, stride=2) # Average pooling to help extract smooth image features
        )
        # Add linear layers to get to 10 outputs
        self.linear_layers = nn.Sequential(
            nn.Flatten(), # Flatten the conv output, so no manual view is needed in forward
            nn.Linear(200, 100),
            nn.ReLU(inplace=True),
            nn.Linear(100, 10)
        )

    # Define the forward function
    def forward(self, x):
        x = self.cnn_layers(x)
        x = self.linear_layers(x)
        return x
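The 200 inputs to the first linear layer come from the shape arithmetic of the convolution and pooling layers: each stage shrinks the 32x32 input until the conv output is 8 channels of 5x5. This can be checked with the standard output-size formula, (size + 2*padding - kernel) // stride + 1:

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output spatial size of a conv or pooling layer along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

s = 32
s = conv_out(s, 4)      # Conv2d, kernel 4, stride 1: 32 -> 29
s = conv_out(s, 2, 2)   # MaxPool2d, kernel 2, stride 2: 29 -> 14
s = conv_out(s, 4)      # Conv2d, kernel 4, stride 1: 14 -> 11
s = conv_out(s, 2, 2)   # AvgPool2d, kernel 2, stride 2: 11 -> 5
print(8 * s * s)        # 8 channels * 5 * 5 = 200
```

With a 4x4 kernel, stride 1 and no padding, each convolution trims 3 pixels from each spatial dimension, and each 2x2 pool roughly halves it.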
# Instantiate the model and check the steps
net = Conv_NN(n_channels = 3)
print(net)
# Push processing onto GPU if available
if torch.cuda.is_available():
    device = torch.device("cuda:0")
    print("running on the GPU")
else:
    device = torch.device("cpu")
    print("running on the CPU")
net.to(device)
# Define loss function and optimizer
loss_function = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)
epochs = 10
BATCH_SIZE = 25
for epoch in range(epochs):
    for i in tqdm(range(0, len(train_data), BATCH_SIZE)): # Use tqdm to show progress bars
        # Batch the data
        batch_data = train_data[i:i+BATCH_SIZE]
        batch_labels = train_labels[i:i+BATCH_SIZE]
        batch_data = batch_data.to(device)
        batch_labels = batch_labels.to(device)
        net.zero_grad() # Set gradients to 0 before loss calculation
        output = net(batch_data) # Pass in the batch
        loss = loss_function(output, batch_labels) # Calculate the loss value
        loss.backward() # Backpropagate the loss through the network's parameters
        optimizer.step() # Update weights using the computed gradients
    print(f'Epoch {epoch + 1} loss: {loss.item():.4f}')
# Use the model to predict from the test set
with torch.no_grad():
    net.eval() # Evaluation mode, so BatchNorm uses its running statistics rather than batch statistics
    predicted = net(test_data.to(device))
    predicted_classes = torch.argmax(predicted, dim=1)
# The model seems to predict well with fewer mistakes
print(predicted_classes[0:10])
print(test_labels[0:10])
# Push data back onto CPU for further analysis
device = torch.device('cpu')
predicted_classes = predicted_classes.to(device)
test_labels = test_labels.to(device)
# Find accuracy metric for prediction on test set
correct = 0
total = 0
for i in range(len(predicted_classes)):
    if predicted_classes[i] == test_labels[i]:
        correct += 1
    total += 1
print("Accuracy: ", round(correct/total, 3))
from sklearn.metrics import confusion_matrix
confusion_matrix(test_labels, predicted_classes)
The convolutional model has given a good increase over the simple NN, now up to 64% accuracy. It appears that classes 2, 3, and 4 were more difficult to classify, with misclassifications spread across several classes. There are also noticeable peaks where class 8 and class 0 have been misclassified as each other, as were classes 5 and 3, and classes 9 and 1. This suggests there may be common features between these class pairs that the model is picking up on, causing the misclassifications observed.
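Per-class accuracy can be read directly off the confusion matrix by dividing its diagonal by the row totals, which makes the weaker classes explicit. A sketch with a hypothetical 3-class matrix in place of the real 10-class one:

```python
import numpy as np

# Hypothetical confusion matrix (rows: true class, columns: predicted class)
cm = np.array([[8, 1, 1],
               [2, 6, 2],
               [0, 3, 7]])

# Diagonal holds the correct predictions; row sums hold the true counts per class
per_class_acc = cm.diagonal() / cm.sum(axis=1)
print(per_class_acc)  # [0.8 0.6 0.7]
```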
Both models performed fairly well once optimized. One noticeable issue with the simple NN was the learning rate used, which was relatively high. Looking at the error values, after a certain number of epochs the error actually increased slightly, and the final model did not have the lowest error of all the epochs. This suggests that gradient descent was overadjusting and overshooting the minimum of the error function. However, the higher learning rate did bring the error down quickly in earlier epochs, so a stepped learning rate could be useful here, e.g. via the StepLR or MultiStepLR schedulers. Alternatively, an optimizer with momentum and per-parameter adaptive learning rates, e.g. Adam, could significantly help this model converge.
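As a sketch of the stepped learning rate idea, StepLR multiplies the learning rate by a factor gamma every step_size epochs. The model below is a placeholder for illustration, and the real training batches would run inside the loop:

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(3072, 10)  # Placeholder model standing in for the real network
optimizer = optim.SGD(model.parameters(), lr=0.1)

# Halve the learning rate every 5 epochs
scheduler = StepLR(optimizer, step_size=5, gamma=0.5)

for epoch in range(20):
    # ... training batches would run here, each calling optimizer.step() ...
    optimizer.step()
    scheduler.step()  # Advance the schedule once per epoch, after optimizer.step()

# After 20 epochs the rate has halved 4 times: 0.1 * 0.5**4 = 0.00625
print(optimizer.param_groups[0]['lr'])
```

MultiStepLR works the same way but takes an explicit list of milestone epochs instead of a fixed interval.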
This strategy could also be applied to the CNN to hone in on a minimum for the error function. However, the CNN was significantly reducing error over each epoch, so a first strategy to improving the CNN model may simply be to run more epochs, with the possible addition of a more complex learning rate schedule.
1) https://www.freecodecamp.org/news/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050/
2) https://www.analyticsvidhya.com/blog/2019/10/building-image-classification-models-cnn-pytorch/
3) https://pytorch.org/docs/stable/generated/torch.nn.Module.html
4) https://www.geeksforgeeks.org/adjusting-learning-rate-of-a-neural-network-in-pytorch/