PyTorch Course


Introduction

This is a brief look into PyTorch from here. It is the most popular deep learning research framework.

Vectors

General

So I just love terms, never remember them, and spend my life watching YouTube to find out "oh, they meant that". Well, for vectors they first went on about points on an x,y co-ordinate system, where a point can be represented by an x,y pair, x along the horizontal and y along the vertical. For a vector we can calculate how many steps we go

  • left or right
  • up or down

We express a vector in square brackets:

a = [5, 3]

Like points, the components can be negative, a negative x meaning the vector points right to left:

b = [-4, 2]

The above are two-dimensional vectors. We can of course have three dimensions with the z direction:

v = [x, y, z]

Scalar Operations

A scalar operation is when you multiply or divide a vector by a single number (a scalar).

Scalar Multiplication

Multiplying a vector by a scalar stretches or shrinks its length.

a = [5, 3], k = 2

ka = 2[5, 3] = [10, 6]

Scalar Division

Dividing a vector by a scalar reduces its magnitude.

b = [-4, 2], k = 2

b/k = (1/2)[-4, 2] = [-2, 1]

Vector Addition

To add two vectors, simply add their components:

a = [5, 3], b = [-4, 2]

a + b = [5 + (-4), 3 + 2] = [1, 5]

Vector Length (Magnitude)

The magnitude of a vector v = [x, y] is:

|v| = √(x² + y²)

For example, if a = [5, 3], then:

|a| = √(5² + 3²) = √(25 + 9) = √34

Basis Vectors

In 2D space, the standard basis vectors are:

i = [1, 0], j = [0, 1]

Any vector v can be written as a combination of these:

v = xi + yj

For example, if v = [3, 2], then:

v = 3i + 2j

X-Component

When the vector is written like this, the x-component is simply the x value:

v = [x, y]

But when it is written in polar form (which took me a while to understand):

v = |v| [cos θ, sin θ]

the x-component is not x directly but |v| cos θ, the magnitude scaled by the cosine of the angle.

Y-Component

This is the same but with y: the y-component is y, or |v| sin θ in polar form.
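
To convince myself, here is a tiny sketch in plain Python (the vector [4, 3] is just an illustrative choice) showing that |v|[cos θ, sin θ] recovers the original components:

import math

# Illustrative vector v = [4, 3]
magnitude = math.sqrt(4**2 + 3**2)   # |v| = 5
theta = math.atan2(3, 4)             # angle from the x-axis

print(magnitude * math.cos(theta))   # x-component ≈ 4.0
print(magnitude * math.sin(theta))   # y-component ≈ 3.0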

Creating Tensors

Here is a list of common types in PyTorch:

Type | What is it | Dimensions | PyTorch Representation | Typical Casing
Scalar | A single value with magnitude only | 0D | torch.tensor(7) | Lowercase (e.g. a)
Vector | Ordered components with direction | 1D | torch.tensor([5, 3]) | Lowercase (e.g. y)
Matrix | 2D array for transformations | 2D | torch.tensor([[1, 2], [3, 4]]) | Uppercase (e.g. Q)
Tensor | Generalized nD array | 3D+ | torch.randn(2, 3, 4) | Uppercase (e.g. X)
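
A quick sketch of creating each of these (the values are just the ones from the table):

import torch

scalar = torch.tensor(7)                 # 0D
vector = torch.tensor([5, 3])            # 1D
MATRIX = torch.tensor([[1, 2], [3, 4]])  # 2D
TENSOR = torch.randn(2, 3, 4)            # 3D+ (random values)

print(scalar.ndim, vector.ndim, MATRIX.ndim, TENSOR.ndim)  # 0 1 2 3
print(TENSOR.shape)                                        # torch.Size([2, 3, 4])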

Getting Started

Most of the video for the second section centered on following these steps:

  • Import stuff e.g. PyTorch, matplotlib
  • Create a dataset
  • Split data between testing and training data
  • Build a model by subclassing nn.Module
  • Create a loss function (something that measures the difference between predictions and expected values)
  • Create an optimizer function (something that changes the parameters to improve the outcome)
  • Create a training loop
  • Create a testing loop
  • Make predictions from the test data
  • Plot training vs test

Using Scripts from GitHub

Liked this bit as it is just plain useful, and it is here so I do not forget it.

import requests
from pathlib import Path 

# Download helper functions from Learn PyTorch repo (if not already downloaded)
if Path("helper_functions.py").is_file():
  print("helper_functions.py already exists, skipping download")
else:
  print("Downloading helper_functions.py")
  # Note: you need the "raw" GitHub URL for this to work
  request = requests.get("https://raw.githubusercontent.com/mrdbourke/pytorch-deep-learning/main/helper_functions.py")
  with open("helper_functions.py", "wb") as f:
    f.write(request.content)

Training Loop

Here is the main approach to the training loop in PyTorch.
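
The slide itself is missing here, but the steps it showed boil down to something like this sketch (model, loss_fn, optimizer, X_train and y_train are assumed to already exist):

for epoch in range(epochs):
    model.train()                     # put the model in training mode

    y_pred = model(X_train)           # 1. Forward pass
    loss = loss_fn(y_pred, y_train)   # 2. Calculate the loss
    optimizer.zero_grad()             # 3. Zero the gradients from the last step
    loss.backward()                   # 4. Backpropagation
    optimizer.step()                  # 5. Step the optimizer (gradient descent)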

Testing Loop
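
Again the slide is missing, but the testing side goes along these lines (a minimal sketch, same assumed names as the training loop):

model.eval()                                 # evaluation mode (disables dropout etc.)
with torch.inference_mode():                 # no gradient tracking needed
    test_pred = model(X_test)                # 1. Forward pass on the test data
    test_loss = loss_fn(test_pred, y_test)   # 2. Calculate the test loss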

Binary and Multi-Class Classification

The workflows are the same but there were differences, and this table probably sums them up pretty nicely.

Classification Metrics

Again, a really useful slide from Daniel Bourke. He has been a joy to watch, and with a 24-hour course that is saying something.

Summary of Classification and Workflow

So having got through the first 14 hours, I guess these are the things that I took away.

Lining up the Input/Output Data Shape

In the example we used make_blobs from the sklearn library. Using this will produce 3 different clusters of data points:

import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
X1, y1 = make_blobs(n_samples=10, centers=3, n_features=2, random_state=0)

print(X1.shape)
print(X1)
print(y1)

Which gives

(10, 2)
[[ 1.12031365  5.75806083]
 [ 1.7373078   4.42546234]
 [ 2.36833522  0.04356792]
 [ 0.87305123  4.71438583]
 [-0.66246781  2.17571724]
 [ 0.74285061  1.46351659]
 [-4.07989383  3.57150086]
 [ 3.54934659  0.6925054 ]
 [ 2.49913075  1.23133799]
 [ 1.9263585   4.15243012]]
[0 0 1 0 2 2 2 1 1 0]

We can see that in this instance each input is a point, so 2 numbers per sample, and the output will be a single integer determining which classification it belongs to. So when we build the model we will have:

import torch
from torch import nn

device = "cuda" if torch.cuda.is_available() else "cpu"

NUM_CLASSES = 3
NUM_FEATURES = 2
RANDOM_SEED = 42

# Build model
class BlobModel(nn.Module):
    def __init__(self, input_features, output_features, hidden_units=8):
        """Initializes all required hyperparameters for a multi-class classification model.

        Args:
            input_features (int): Number of input features to the model.
            output_features (int): Number of output features of the model
              (how many classes there are).
            hidden_units (int): Number of hidden units between layers, default 8.
        """
        super().__init__()
        self.linear_layer_stack = nn.Sequential(
            nn.Linear(in_features=input_features, out_features=hidden_units),
            # nn.ReLU(), # <- does our dataset require non-linear layers? (try uncommenting and see if the results change)
            nn.Linear(in_features=hidden_units, out_features=hidden_units),
            # nn.ReLU(), # <- does our dataset require non-linear layers? (try uncommenting and see if the results change)
            nn.Linear(in_features=hidden_units, out_features=output_features), # how many classes are there?
        )

    def forward(self, x):
        return self.linear_layer_stack(x)

# Create an instance of BlobModel and send it to the target device
model_4 = BlobModel(input_features=NUM_FEATURES,
                    output_features=NUM_CLASSES,
                    hidden_units=8).to(device)
model_4

Also note that the output of one layer is the input of the next layer, and their shapes must match.

Ways to Improve Results

  • Add more layers - more chances to learn patterns
  • Add more hidden units - more features
  • Fit/Run for longer - more epochs
  • Change activation function - for binary usually sigmoid
  • Change learning rate i.e. the lr in our optimizer

Classification Conversion of results

There was a bit of time spent showing the conversion of logits to prediction labels, which went along these lines:

 logits -> prediction probabilities -> prediction labels

For sigmoid (binary classification):

 test_pred = torch.round(torch.sigmoid(test_logits))

For softmax (multi-class classification):

test_pred = torch.softmax(test_logits, dim=1).argmax(dim=1)

Visualization

Really like the amount of plotting (no pun intended). For me, this makes it far easier to understand.

from sklearn.model_selection import train_test_split

# 3. Split into train and test sets
X_blob_train, X_blob_test, y_blob_train, y_blob_test = train_test_split(X_blob,
    y_blob,
    test_size=0.2,
    random_state=RANDOM_SEED
)

# 4. Plot data
plt.figure(figsize=(10, 7))
plt.scatter(X_blob[:, 0], X_blob[:, 1], c=y_blob, cmap=plt.cm.RdYlBu);

Computer Vision

Getting Data

Getting data was touched upon, and you need to examine the shape of the data. The image data format can change from set to set, e.g. grayscale, or the channel may come before the width and height or after. There are helpers for well-known datasets in PyTorch:

from torchvision import datasets
from torchvision.transforms import ToTensor

# Setup training data
train_data = datasets.FashionMNIST(
    root="data", # where to download data to?
    train=True, # get training data
    download=True, # download data if it doesn't exist on disk
    transform=ToTensor(), # images come as PIL format, we want to turn into Torch tensors
    target_transform=None # you can transform labels as well
)

# Setup testing data
test_data = datasets.FashionMNIST(
    root="data",
    train=False, # get test data
    download=True,
    transform=ToTensor()
)

Mini Batching

The DataLoader is capable of dividing the data into batches for processing. Another important part of this is to shuffle the training data, to avoid learning from data that may be loaded in a particular order. This changes the training and testing loops slightly: the loop now runs not for every image but for every batch.
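
A minimal sketch of setting this up (BATCH_SIZE is my choice here; train_data and test_data are the FashionMNIST datasets from above):

from torch.utils.data import DataLoader

BATCH_SIZE = 32

# Shuffle the training data so the model does not learn the loading order
train_dataloader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)
# The test data does not need shuffling
test_dataloader = DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=False)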

Formatting of Data

The concept of flattening was introduced, which is where the pixel data is combined from a width x height grid into a single one-dimensional vector.
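
A quick sketch of what flattening does to a FashionMNIST-sized image:

import torch
from torch import nn

flatten = nn.Flatten()
image = torch.randn(1, 28, 28)        # [colour_channels, height, width]
flat = flatten(image)                 # height and width combined into one dimension
print(image.shape, "->", flat.shape)  # torch.Size([1, 28, 28]) -> torch.Size([1, 784])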

Convolutional Neural Network

Introduction

This was all relatively simple, but now we are on CNNs - not the news channel. Fortunately people put a lot of effort in to make this easier for idiots like me. I particularly like https://poloclub.github.io/cnn-explainer/?ref=mrdbourke.com

Building a model

I need visuals to get going and this course has helped me tremendously. Before I provide Daniel's code, here is the CNN explainer screenshot.

Now the code.

# Create a convolutional neural network 
class FashionMNISTModelV2(nn.Module):
    """
    Model architecture copying TinyVGG from: 
    https://poloclub.github.io/cnn-explainer/
    """
    def __init__(self, input_shape: int, hidden_units: int, output_shape: int):
        super().__init__()
        self.block_1 = nn.Sequential(
            nn.Conv2d(in_channels=input_shape, 
                      out_channels=hidden_units, 
                      kernel_size=3, # how big is the square that's going over the image?
                      stride=1, # default
                      padding=1),# options = "valid" (no padding) or "same" (output has same shape as input) or int for specific number 
            nn.ReLU(),
            nn.Conv2d(in_channels=hidden_units, 
                      out_channels=hidden_units,
                      kernel_size=3,
                      stride=1,
                      padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2,
                         stride=2) # default stride value is same as kernel_size
        )
        self.block_2 = nn.Sequential(
            nn.Conv2d(hidden_units, hidden_units, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(hidden_units, hidden_units, 3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2)
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            # Where did this in_features shape come from? 
            # It's because each layer of our network compresses and changes the shape of our input data.
            nn.Linear(in_features=hidden_units*7*7, 
                      out_features=output_shape)
        )
    
    def forward(self, x: torch.Tensor):
        x = self.block_1(x)
        # print(x.shape)
        x = self.block_2(x)
        # print(x.shape)
        x = self.classifier(x)
        # print(x.shape)
        return x

torch.manual_seed(42)
model_2 = FashionMNISTModelV2(input_shape=1, 
    hidden_units=10, 
    output_shape=len(class_names)).to(device)
model_2

You can see the model has 2 blocks, just like the picture. Here is the first block in the picture:

 conv_1_1
 relu_1_1 
 conv_1_2
 relu_1_2
 max_pool_1

The last layer is known as the classifier layer. For the input parameters we are slightly different to the picture, as we are using grayscale and they are using colour, so our input shape is 1 rather than 3, which you can see in the top right of the picture (64x64 with 3 channels). The 10 hidden units matches the picture, because if you count downwards there are 10 parts to each layer's processing. Finally, the output shape is the number of classifier labels there are.

So a bit about the hyperparameters of 2D convolutional operations:

  • Kernel Size - The size of the kernel, derrr
  • Stride - The number of spaces the kernel moves at a time
  • Padding - The border of pixels (usually zeros) added around the input (not the kernel) so the edge pixels get processed fully

I like how you can see how the result is calculated. I looked up a Sobel kernel and it said edge detection, especially highlighting intensity changes along horizontal or vertical directions. However, for Conv2d the kernel values are parameters learned by the CNN during training.

For the max_pool layer, which has a 2x2 kernel, the input is reduced to a quarter of its size by taking the highest value of each 4-pixel set.

Lastly, when building the classifier layer, Daniel stressed that it is not easy to know the in_features shape.

        self.classifier = nn.Sequential(
            nn.Flatten(),
            # Where did this in_features shape come from? 
            # It's because each layer of our network compresses and changes the shape of our input data.
            nn.Linear(in_features=hidden_units*7*7, 
                      out_features=output_shape)
        )

To help calculate this, Daniel put an image through the layers separately and printed the shapes. In this case the final output from block_2 was [1, 10, 7, 7]. If in_features=hidden_units (without the 7*7) the error is

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x490 and 10x10). 

The first shape (1x490) is our flattened data; the second (10x10) is the linear layer's weights, whose first 10 comes from hidden_units=10. The [10, 7, 7] output of block_2 gets flattened, i.e. combined into 10x7x7 = 490 values, so working back, in_features must be hidden_units*7*7.
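
Here is a sketch of that shape-printing trick with a dummy FashionMNIST-sized image (28x28 grayscale):

dummy = torch.randn(1, 1, 28, 28)  # [batch, channels, height, width]
out_1 = model_2.block_1(dummy)
print(out_1.shape)                 # torch.Size([1, 10, 14, 14]) after the first max pool
out_2 = model_2.block_2(out_1)
print(out_2.shape)                 # torch.Size([1, 10, 7, 7]) -> flattens to 10*7*7 = 490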

Functioning Training, Test and Running

We eventually put these loops into reusable functions. This will be different based on your needs, but an example is here to remind me.

Functioning Training Loop

def train_step(model: torch.nn.Module,
               data_loader: torch.utils.data.DataLoader,
               loss_fn: torch.nn.Module,
               optimizer: torch.optim.Optimizer,
               accuracy_fn,
               device: torch.device = device):
    train_loss, train_acc = 0, 0
    model.to(device)
    model.train()  # Enable training mode (e.g. dropout, batchnorm)

    for batch, (X, y) in enumerate(data_loader):
        X, y = X.to(device), y.to(device)

        # 1. Forward pass
        y_pred = model(X).squeeze()  # Shape: (batch_size,)

        # 2. Calculate loss
        loss = loss_fn(y_pred, y)
        train_loss += loss.item()

        # 3. Calculate accuracy
        pred_labels = (torch.sigmoid(y_pred) > 0.5).float()
        train_acc += accuracy_fn(
            y_true=y.view(-1).cpu().numpy(),
            y_pred=pred_labels.view(-1).cpu().numpy()
        )

        # 4. Backpropagation
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # 5. Average metrics
    train_loss /= len(data_loader)
    train_acc /= len(data_loader)
    print(f"Train loss: {train_loss:.5f} | Train accuracy: {train_acc:.2f}%")
    return train_loss, train_acc

Functioning Testing Loop

def test_step(data_loader: torch.utils.data.DataLoader,
              model: torch.nn.Module,
              loss_fn: torch.nn.Module,
              accuracy_fn,
              device: torch.device = device):
    test_loss, test_acc = 0, 0
    model.to(device)
    model.eval()  # Evaluation mode disables dropout, etc.

    with torch.inference_mode():  # No gradients needed
        for X, y in data_loader:
            X, y = X.to(device), y.to(device)

            # Forward pass
            test_pred = model(X).squeeze()  # Shape: (batch_size,)

            # Loss
            test_loss += loss_fn(test_pred, y).item()

            # Prediction → binary threshold
            pred_labels = (torch.sigmoid(test_pred) > 0.5).float()

            # Accuracy
            test_acc += accuracy_fn(
                y_true=y.view(-1).cpu().numpy(),
                y_pred=pred_labels.view(-1).cpu().numpy()
            )

    # Average metrics
    test_loss /= len(data_loader)
    test_acc /= len(data_loader)

    print(f"Test loss: {test_loss:.5f} | Test accuracy: {test_acc:.2f}%\n")
    return test_loss, test_acc

Running them Against the Model

from tqdm.auto import tqdm  # progress bar

torch.manual_seed(42)

# Measure time
from timeit import default_timer as timer
train_time_start_on_gpu = timer()

epochs = 3
for epoch in tqdm(range(epochs)):
    print(f"Epoch: {epoch}\n---------")
    train_step(data_loader=train_dataloader, 
        model=model_1, 
        loss_fn=loss_fn,
        optimizer=optimizer,
        accuracy_fn=accuracy_fn
    )
    test_step(data_loader=test_dataloader,
        model=model_1,
        loss_fn=loss_fn,
        accuracy_fn=accuracy_fn
    )

train_time_end_on_gpu = timer()
# print_train_time is a small helper from the course that prints and returns the elapsed time
total_train_time_model_1 = print_train_time(start=train_time_start_on_gpu,
                                            end=train_time_end_on_gpu,
                                            device=device)

Installing Missing Packages in Colab

This is a quick note to show how to install missing packages.

# See if torchmetrics exists, if not, install it
try:
    import torchmetrics, mlxtend
    print(f"mlxtend version: {mlxtend.__version__}")
    assert int(mlxtend.__version__.split(".")[1]) >= 19, "mlxtend version should be 0.19.0 or higher"
except:
    !pip install -q torchmetrics -U mlxtend # <- Note: If you're using Google Colab, this may require restarting the runtime
    import torchmetrics, mlxtend
    print(f"mlxtend version: {mlxtend.__version__}")

More Functionalizing

So more code just for reference.

Getting Random Data

This time the code to get random images from test data

import random
random.seed(42)
test_samples = []
test_labels = []
for sample, label in random.sample(list(test_data), k=9):
    test_samples.append(sample)
    test_labels.append(label)

# View the first test sample shape and label
print(f"Test sample image shape: {test_samples[0].shape}\nTest sample label: {test_labels[0]} ({class_names[test_labels[0]]})")

Making Predictions

Again this will change with different requirements

def make_predictions(model: torch.nn.Module, data: list, device: torch.device = device):
    pred_probs = []
    model.eval()
    with torch.inference_mode():
        for sample in data:
            # Prepare sample
            sample = torch.unsqueeze(sample, dim=0).to(device) # Add an extra dimension and send sample to device

            # Forward pass (model outputs raw logit)
            pred_logit = model(sample)

            # Get prediction probability (logit -> prediction probability)
            pred_prob = torch.softmax(pred_logit.squeeze(), dim=0) # note: perform softmax on the "logits" dimension, not "batch" dimension (in this case we have a batch size of 1, so can perform on dim=0)

            # Get pred_prob off GPU for further calculations
            pred_probs.append(pred_prob.cpu())
            
    # Stack the pred_probs to turn list into a tensor
    return torch.stack(pred_probs)

Running Predictions

So here is how to use it

# Make predictions on test samples with model 2
pred_probs= make_predictions(model=model_2, 
                             data=test_samples)

# Turn the prediction probabilities into prediction labels by taking the argmax()
pred_classes = pred_probs.argmax(dim=1)
pred_classes

Plotting

Really, the main reason to copy large amounts of code was for this function. I am loving the plotting library and may be taking a shine to Python more these days.

# Plot predictions
plt.figure(figsize=(9, 9))
nrows = 3
ncols = 3
for i, sample in enumerate(test_samples):
  # Create a subplot
  plt.subplot(nrows, ncols, i+1)

  # Plot the target image
  plt.imshow(sample.squeeze(), cmap="gray")

  # Find the prediction label (in text form, e.g. "Sandal")
  pred_label = class_names[pred_classes[i]]

  # Get the truth label (in text form, e.g. "T-shirt")
  truth_label = class_names[test_labels[i]] 

  # Create the title text of the plot
  title_text = f"Pred: {pred_label} | Truth: {truth_label}"
  
  # Check for equality and change title colour accordingly
  if pred_label == truth_label:
      plt.title(title_text, fontsize=10, c="g") # green text if correct
  else:
      plt.title(title_text, fontsize=10, c="r") # red text if wrong
  plt.axis(False);

So this gives

Confusion Matrix

So for once this was not new to me. I have built these when looking at breast density (a long, long time ago, before my stroke). The above is great to see, but this with colour makes it rock.

Making one is really easy using torchmetrics:
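
The code below starts at step 2; step 1, collecting the model's predictions over the whole test set into y_pred_tensor, went roughly like this sketch (model_2, test_dataloader and device are assumed from earlier):

import torch

# 1. Make predictions across the test set
y_preds = []
model_2.eval()
with torch.inference_mode():
    for X, y in test_dataloader:
        X = X.to(device)
        y_logit = model_2(X)
        y_pred = torch.softmax(y_logit, dim=1).argmax(dim=1)  # logits -> labels
        y_preds.append(y_pred.cpu())
y_pred_tensor = torch.cat(y_preds)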

from torchmetrics import ConfusionMatrix
from mlxtend.plotting import plot_confusion_matrix

# 2. Setup confusion matrix instance and compare predictions to targets
confmat = ConfusionMatrix(num_classes=len(class_names), task='multiclass')
confmat_tensor = confmat(preds=y_pred_tensor,
                         target=test_data.targets)

# 3. Plot the confusion matrix
fig, ax = plot_confusion_matrix(
    conf_mat=confmat_tensor.numpy(), # matplotlib likes working with NumPy 
    class_names=class_names, # turn the row and column labels into class names
    figsize=(10, 7)
);

This produces a plot which makes it so easy to see where the problems are.

Custom Datasets

What Follows

Quite rightly, the steps below focus on the data at each step, checking the code does what it says on the tin. A fair bit of time is devoted to doing something, then checking that it did it.

Dataset Standard

There are standards people keep to for storing datasets. You separate test and train data, and store the data in a directory named after the classification it belongs to:

There are 2 directories and 0 images in 'data/pizza_steak_sushi'.
There are 3 directories and 0 images in 'data/pizza_steak_sushi/train'.
There are 0 directories and 75 images in 'data/pizza_steak_sushi/train/steak'.
There are 0 directories and 78 images in 'data/pizza_steak_sushi/train/pizza'.
There are 0 directories and 72 images in 'data/pizza_steak_sushi/train/sushi'.
There are 3 directories and 0 images in 'data/pizza_steak_sushi/test'.
There are 0 directories and 19 images in 'data/pizza_steak_sushi/test/steak'.
There are 0 directories and 25 images in 'data/pizza_steak_sushi/test/pizza'.
There are 0 directories and 31 images in 'data/pizza_steak_sushi/test/sushi'.
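
That summary came from a helper function; here is a minimal sketch of producing something similar with os.walk (it assumes every file in the tree is an image):

import os

def walk_through_dir(dir_path):
    """Prints the number of sub-directories and files for each folder under dir_path."""
    for dirpath, dirnames, filenames in os.walk(dir_path):
        print(f"There are {len(dirnames)} directories and {len(filenames)} images in '{dirpath}'.")

walk_through_dir("data/pizza_steak_sushi")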

Dataset Transform

I did not imagine it necessary to flip the images, but this flips them 50% of the time. They are also resized, which the robot says is necessary.

# Write transform for image
data_transform = transforms.Compose([
    # Resize the images to 64x64
    transforms.Resize(size=(64, 64)),
    # Flip the images randomly on the horizontal
    transforms.RandomHorizontalFlip(p=0.5), # p = probability of flip, 0.5 = 50% chance
    # Turn the image into a torch.Tensor
    transforms.ToTensor() # this also converts all pixel values from 0 to 255 to be between 0.0 and 1.0 
])

I really did like permute, which allows you to shuffle the channels. Someone thought about the usage when they wrote that:

import random

import matplotlib.pyplot as plt
from PIL import Image

def plot_transformed_images(image_paths, transform, n=3, seed=42):
    random.seed(seed)
    random_image_paths = random.sample(image_paths, k=n)
    for image_path in random_image_paths:
        with Image.open(image_path) as f:
            fig, ax = plt.subplots(1, 2)
            ax[0].imshow(f) 
            ax[0].set_title(f"Original \nSize: {f.size}")
            ax[0].axis("off")

            # Transform and plot image
            # Note: permute() will change shape of image to suit matplotlib 
            # (PyTorch default is [C, H, W] but Matplotlib is [H, W, C])
            transformed_image = transform(f).permute(1, 2, 0) 
            ax[1].imshow(transformed_image) 
            ax[1].set_title(f"Transformed \nSize: {transformed_image.shape}")
            ax[1].axis("off")

            fig.suptitle(f"Class: {image_path.parent.stem}", fontsize=16)

Using ImageFolder

This has functionality to load your images in and transform the data and the labels. There are lots of these to use, but here is the example from the course:

# Use ImageFolder to create dataset(s)
from torchvision import datasets
train_data = datasets.ImageFolder(root=train_dir, # target folder of images
                                  transform=data_transform, # transforms to perform on data (images)
                                  target_transform=None) # transforms to perform on labels (if necessary)

test_data = datasets.ImageFolder(root=test_dir, 
                                 transform=data_transform)

Writing Your Own ImageFolder

PyTorch provides a base class called Dataset. You need to implement __getitem__(). Lots of code, but perhaps handy for any future use outside of the course.

import os
import pathlib
import torch

from PIL import Image
from torch.utils.data import Dataset
from torchvision import transforms
from typing import Tuple, Dict, List

# Write a custom dataset class (inherits from torch.utils.data.Dataset)

# 1. Subclass torch.utils.data.Dataset
class ImageFolderCustom(Dataset):
    
    # 2. Initialize with a targ_dir and transform (optional) parameter
    def __init__(self, targ_dir: str, transform=None) -> None:
        
        # 3. Create class attributes
        # Get all image paths
        self.paths = list(pathlib.Path(targ_dir).glob("*/*.jpg")) # note: you'd have to update this if you've got .png's or .jpeg's
        # Setup transforms
        self.transform = transform
        # Create classes and class_to_idx attributes
        self.classes, self.class_to_idx = self._find_classes(targ_dir)

    # 4. Make function to load images
    def load_image(self, index: int) -> Image.Image:
        "Opens an image via a path and returns it."
        image_path = self.paths[index]
        return Image.open(image_path) 
    
    # 5. Overwrite the __len__() method (optional but recommended for subclasses of torch.utils.data.Dataset)
    def __len__(self) -> int:
        "Returns the total number of samples."
        return len(self.paths)
    
    # 6. Overwrite the __getitem__() method (required for subclasses of torch.utils.data.Dataset)
    def __getitem__(self, index: int) -> Tuple[torch.Tensor, int]:
        "Returns one sample of data, data and label (X, y)."
        img = self.load_image(index)
        class_name  = self.paths[index].parent.name # expects path in data_folder/class_name/image.jpeg
        class_idx = self.class_to_idx[class_name]

        # Transform if necessary
        if self.transform:
            return self.transform(img), class_idx # return data, label (X, y)
        else:
            return img, class_idx # return data, label (X, y)

    # Make function to find classes in target directory
    def _find_classes(self, directory: str) -> Tuple[List[str], Dict[str, int]]:
        """Finds the class folder names in a target directory.
    
        Assumes target directory is in standard image classification format.

        Args:
            directory (str): target directory to load classnames from.

        Returns:
            Tuple[List[str], Dict[str, int]]: (list_of_class_names, dict(class_name: idx...))
    
        Example:
            find_classes("food_images/train")
            >>> (["class_1", "class_2"], {"class_1": 0, ...})
        """
        # 1. Get the class names by scanning the target directory
        classes = sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())
    
        # 2. Raise an error if class names not found
        if not classes:
            raise FileNotFoundError(f"Couldn't find any classes in {directory}.")
        
        # 3. Create a dictionary of index labels (computers prefer numerical rather than string labels)
        class_to_idx = {cls_name: i for i, cls_name in enumerate(classes)}
        return classes, class_to_idx
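
Using it is then much like the built-in ImageFolder (train_dir, test_dir and data_transform are assumed from the earlier sections):

# Create the custom datasets in place of torchvision's ImageFolder
train_data_custom = ImageFolderCustom(targ_dir=train_dir, transform=data_transform)
test_data_custom = ImageFolderCustom(targ_dir=test_dir, transform=data_transform)

print(len(train_data_custom), train_data_custom.classes)  # e.g. 225 ['pizza', 'steak', 'sushi']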

Data Augmentation

We can manipulate the data by rotating, zooming, flipping etc. This is done to allow the model to see the same data from different perspectives. This was shown to improve results considerably.

from torchvision import transforms

train_transforms = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.TrivialAugmentWide(num_magnitude_bins=31), # how intense 
    transforms.ToTensor() # use ToTensor() last to get everything between 0 & 1
])

torchinfo

Useful tool for a mere beginner like me. Here it is showing a TinyVGG.
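
The summary output itself is missing here, but producing it is a one-liner; a sketch assuming a TinyVGG-style model taking the 64x64 RGB images from the custom dataset section:

from torchinfo import summary

# Prints a layer-by-layer table of output shapes and parameter counts
summary(model, input_size=(1, 3, 64, 64))  # [batch_size, colour_channels, height, width]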

Plotting the Loss

I would imagine that, aside from loading data accurately, knowing what good looks like is pretty, pretty key. So here is how Daniel plotted the loss. First, the shape of the data:

Epoch: 1 | train_loss: 1.1078 | train_acc: 0.2578 | test_loss: 1.1360 | test_acc: 0.2604
Epoch: 2 | train_loss: 1.0847 | train_acc: 0.4258 | test_loss: 1.1620 | test_acc: 0.1979
Epoch: 3 | train_loss: 1.1157 | train_acc: 0.2930 | test_loss: 1.1697 | test_acc: 0.1979 
Epoch: 4 | train_loss: 1.0956 | train_acc: 0.4141 | test_loss: 1.1384 | test_acc: 0.1979
Epoch: 5 | train_loss: 1.0985 | train_acc: 0.2930 | test_loss: 1.1426 | test_acc: 0.1979

Now the code to plot it.

from typing import Dict, List

import matplotlib.pyplot as plt

def plot_loss_curves(results: Dict[str, List[float]]):
    """Plots training curves of a results dictionary.

    Args:
        results (dict): dictionary containing list of values, e.g.
            {"train_loss": [...],
             "train_acc": [...],
             "test_loss": [...],
             "test_acc": [...]}
    """
    
    # Get the loss values of the results dictionary (training and test)
    loss = results['train_loss']
    test_loss = results['test_loss']

    # Get the accuracy values of the results dictionary (training and test)
    accuracy = results['train_acc']
    test_accuracy = results['test_acc']

    # Figure out how many epochs there were
    epochs = range(len(results['train_loss']))

    # Setup a plot 
    plt.figure(figsize=(15, 7))

    # Plot loss
    plt.subplot(1, 2, 1)
    plt.plot(epochs, loss, label='train_loss')
    plt.plot(epochs, test_loss, label='test_loss')
    plt.title('Loss')
    plt.xlabel('Epochs')
    plt.legend()

    # Plot accuracy
    plt.subplot(1, 2, 2)
    plt.plot(epochs, accuracy, label='train_accuracy')
    plt.plot(epochs, test_accuracy, label='test_accuracy')
    plt.title('Accuracy')
    plt.xlabel('Epochs')
    plt.legend();

This produced

Understanding Loss Curves in Machine Learning

Loss curves help visualize how well a model is learning over time. They typically show two lines:

  • Training loss – how well the model fits the data it was trained on.
  • Test loss – how well the model performs on unseen data.

Below are three common patterns:

Underfitting

  • Description: Both training and test loss stay high and close together.
  • What it means: The model isn’t learning enough from the data. It’s too simple or not trained long enough.
  • Visual cue: Flat or slowly decreasing lines with little separation.
  • Fixes: Try a more complex model, train longer, or improve feature quality.

Overfitting

  • Description: Training loss drops sharply, but test loss stays high or increases.
  • What it means: The model is memorizing the training data but failing to generalize.
  • Visual cue: Wide gap between training and test loss.
  • Fixes: Use regularization, simplify the model, or add more training data.

Just Right (Well-Fitted)

  • Description: Both training and test loss decrease and stay close together.
  • What it means: The model is learning effectively and generalizing well.
  • Visual cue: Parallel curves with low loss and minimal gap.
  • Goal: This is the ideal balance between bias and variance.

Fixing Underfitting in PyTorch

Underfitting occurs when a model fails to learn the underlying patterns in the data. It typically results in high training and test loss.

Cause | Symptom | Remedy
Model too simple | High training and test loss | Use a deeper or wider neural network
Insufficient training | Loss plateaus early | Train for more epochs or use better learning rate scheduling
Poor feature representation | Low accuracy across datasets | Improve input features or use pretrained embeddings
Over-regularization | Loss remains high despite training | Reduce dropout, weight decay, or other regularization
Low learning rate | Slow convergence | Increase learning rate or use adaptive optimizers like Adam

Fixing Overfitting in PyTorch

Overfitting happens when a model learns the training data too well but fails to generalize to new data. It shows low training loss and high test loss.

Cause | Symptom | Remedy
Model too complex | Training loss much lower than test loss | Simplify the architecture or reduce parameters
Too few training samples | Poor generalization | Augment data or gather more samples
No regularization | Sharp drop in training loss | Add dropout, weight decay, or early stopping
High learning rate | Erratic loss behavior | Lower learning rate or use learning rate decay
Overtraining | Test loss increases after a point | Use early stopping or monitor validation metrics

Summary

  • Underfitting = model not learning enough → make it smarter or train longer.
  • Overfitting = model learning too much from training → make it generalize better.

Using Your Own Custom Image

The problems you might get when using a custom image are:

  • Wrong datatypes - our model expects `torch.float32` where our original custom image was `uint8`.
  • Wrong device - our model was on the target `device` (in our case, the GPU) whereas our target data hadn't been moved to the target `device` yet.
  • Wrong shapes - our model expected an input image of shape `[N, C, H, W]` or `[batch_size, color_channels, height, width]` whereas our custom image tensor was of shape `[color_channels, height, width]`.

Here is the all-in-one function to predict an image. It should make a good base for future PyTorch work.

import torch
import torchvision
import matplotlib.pyplot as plt
from typing import List

def pred_and_plot_image(model: torch.nn.Module, 
                        image_path: str, 
                        class_names: List[str] = None, 
                        transform=None,
                        device: torch.device = device):
    """Makes a prediction on a target image and plots the image with its prediction."""
    
    # 1. Load in image and convert the tensor values to float32
    target_image = torchvision.io.read_image(str(image_path)).type(torch.float32)
    
    # 2. Divide the image pixel values by 255 to get them between [0, 1]
    target_image = target_image / 255. 
    
    # 3. Transform if necessary
    if transform:
        target_image = transform(target_image)
    
    # 4. Make sure the model is on the target device
    model.to(device)
    
    # 5. Turn on model evaluation mode and inference mode
    model.eval()
    with torch.inference_mode():
        # Add an extra dimension to the image
        target_image = target_image.unsqueeze(dim=0)
    
        # Make a prediction on image with an extra dimension and send it to the target device
        target_image_pred = model(target_image.to(device))
        
    # 6. Convert logits -> prediction probabilities (using torch.softmax() for multi-class classification)
    target_image_pred_probs = torch.softmax(target_image_pred, dim=1)

    # 7. Convert prediction probabilities -> prediction labels
    target_image_pred_label = torch.argmax(target_image_pred_probs, dim=1)
    
    # 8. Plot the image alongside the prediction and prediction probability
    plt.imshow(target_image.squeeze().permute(1, 2, 0)) # make sure it's the right size for matplotlib
    if class_names:
        title = f"Pred: {class_names[target_image_pred_label.cpu()]} | Prob: {target_image_pred_probs.max().cpu():.3f}"
    else: 
        title = f"Pred: {target_image_pred_label} | Prob: {target_image_pred_probs.max().cpu():.3f}"
    plt.title(title)
    plt.axis(False);

Hidden Markov Model

This never came up in the PyTorch course but did in the TensorFlow one, so I want to provide a PyTorch HMM equivalent.

import torch
import torch.distributions as dist

# Initial state distribution
initial_probs = torch.tensor([0.2, 0.8])
initial_dist = dist.Categorical(probs=initial_probs)

# Transition matrix
transition_probs = torch.tensor([[0.5, 0.5],
                                 [0.2, 0.8]])

# Emission distributions
observation_dists = [
    dist.Normal(loc=0.0, scale=5.0),
    dist.Normal(loc=15.0, scale=10.0)
]

# Prediction loop
num_steps = 7
states = []
predictions = []

# Sample initial state and emission
state = initial_dist.sample()
states.append(state.item())
predictions.append(observation_dists[state].sample().item())

# Sample transitions and emissions
for _ in range(1, num_steps):
    state = dist.Categorical(probs=transition_probs[state]).sample()
    states.append(state.item())
    predictions.append(observation_dists[state].sample().item())

# Output
print("Hidden States:", states)
print("Predicted Observations:", predictions)