Not Hot Dog

Reading time: 10 minutes

This post introduces transfer learning, which is the use of a pre-trained model instead of training a model from scratch. In particular, we'll be training an object recogniser using ResNet, a Convolutional Neural Network pre-trained on the ImageNet database. ImageNet is a hand-annotated image dataset consisting of 14 million images in 20,000 categories. Drawing inspiration from HBO's Silicon Valley, our model will recognise whether an image is "Hot Dog" or "Not Hot Dog".

In this post, you'll learn how to:

  • load a pre-trained model;
  • freeze earlier layers; and
  • change later layers to obtain the output we want.

By the end, you'll learn to predict whether these images are "Hot Dog" or "Not Hot Dog":

Import Packages

As in previous posts, we'll be using the PyTorch package from Facebook.

In [1]:
from torchvision import datasets, transforms, models
from torchvision.utils import make_grid

from torch.utils.data.dataset import random_split
from torch.utils.data import DataLoader

from torch import nn
from torch import optim

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn.functional as F

Transfer Learning

Loading Data

A good source of images is Google Image Search. Search for the term you're interested in (e.g. "Hot Dog") and, once the images are displayed on the results page, open the browser's JavaScript console (the keyboard shortcut is usually F12). Lesson 2 of Fast.ai's Practical Deep Learning for Coders provides the following JavaScript snippet, which you can paste into the console to download a text file containing the URLs of all the images on the page. Google changes the markup of its results page from time to time, so the CSS selectors in the snippet may need updating.

urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));

When the browser asks you to save the file, use the name urls.txt. Then, use wget to download all the images in that file:

$ wget --wait=2 --random-wait --input-file=urls.txt
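If you'd rather stay in Python than use wget, here's a minimal sketch using only the standard library. It assumes urls.txt holds one URL per line; the helper names (read_urls, local_filename, download_all), the numbered file names and the .jpg fallback extension are all hypothetical choices for illustration. The random pause mimics wget's --random-wait, which waits between 0.5 and 1.5 times the --wait value.

```python
import os
import random
import time
import urllib.request
from urllib.parse import urlparse

# extensions that torchvision's ImageFolder accepts (a subset of its IMG_EXTENSIONS)
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def read_urls(path):
    """Return the non-empty lines of a urls.txt file, stripped of whitespace."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def local_filename(index, url):
    """Numbered filename keeping the URL's image extension; falls back to .jpg (an assumption)."""
    ext = os.path.splitext(urlparse(url).path)[1].lower()
    if ext not in IMAGE_EXTENSIONS:
        ext = ".jpg"
    return f"{index:05d}{ext}"


def download_all(urls, out_dir, wait=2.0):
    """Download each URL into out_dir, sleeping 0.5x to 1.5x `wait` seconds between requests."""
    os.makedirs(out_dir, exist_ok=True)
    for index, url in enumerate(urls):
        dest = os.path.join(out_dir, local_filename(index, url))
        try:
            urllib.request.urlretrieve(url, dest)
        except (OSError, ValueError) as exc:  # dead links, timeouts, malformed URLs
            print(f"skipped {url}: {exc}")
        time.sleep(random.uniform(0.5 * wait, 1.5 * wait))
```

For example, download_all(read_urls("urls.txt"), "data/hotdog/train/hot_dog") would fill a class folder ready for ImageFolder (the folder name here is hypothetical). The .jpg fallback only matters for ImageFolder's extension filter; PIL identifies the actual format from the file contents.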

In our case, however, Dan Becker, a Data Scientist at Kaggle, has already created a "Hot Dog" / "Not Hot Dog" dataset and has even divided the photos into train and test folders.

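We load the train and test folders into dataloaders. The training images are randomly rotated, cropped and flipped (data augmentation), while the test images are only resized and centre-cropped; both sets are converted to tensors and normalised with the ImageNet channel means and standard deviations that the pre-trained ResNet expects: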

In [4]:
train_transform = transforms.Compose([
    transforms.RandomRotation(30),
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

test_transform = transforms.Compose([
    transforms.Resize(255),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

train_dataset = datasets.ImageFolder('data/hotdog/train', transform=train_transform)
test_dataset = datasets.ImageFolder('data/hotdog/test', transform=test_transform)

train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=64, shuffle=True)
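To make the last two transforms concrete: ToTensor scales 8-bit pixel values from [0, 255] into [0.0, 1.0], and Normalize then computes (x - mean) / std separately for each colour channel. The plain-Python sketch below applies that arithmetic to a single RGB pixel; the display code later in the post reverses it with std * images + mean followed by np.clip. The function names are hypothetical.

```python
# ImageNet channel statistics passed to transforms.Normalize in the post
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize(pixel):
    """Per-channel (x - mean) / std, as transforms.Normalize does for every pixel."""
    return tuple((x - m) / s for x, m, s in zip(pixel, IMAGENET_MEAN, IMAGENET_STD))


def denormalize(pixel):
    """Inverse of normalize (std * x + mean), clipped to [0, 1] for display."""
    return tuple(min(max(x * s + m, 0.0), 1.0)
                 for x, m, s in zip(pixel, IMAGENET_MEAN, IMAGENET_STD))


# a pure white pixel after ToTensor
white = (1.0, 1.0, 1.0)
print(normalize(white))  # roughly (2.25, 2.43, 2.64)
```

A pixel equal to the ImageNet mean maps to zero in every channel, which is why normalised images look centred around 0.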

Choosing between CPU and GPU:

In [5]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

Transfer learning is the transfer of knowledge learnt on one problem to a different but related problem. In our case, we will use a pre-trained ResNet 18 model as a feature extractor. ResNet 18 is an 18-layer Convolutional Neural Network designed to classify images into 1,000 categories such as "accordion", "barber chair" and "croquet ball". The pre-trained weights we use come from training on the 1,000-category subset of ImageNet used in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2012). As you move deeper through the layers of a neural network, the features it learns become more complex: early layers respond to simple patterns such as edges and colour blobs, while later layers piece these together to recognise specific objects.

Shown on the left are the kernels, and on the right are sets of sample matching images.


To keep the knowledge in the pre-trained ResNet 18 model, we will freeze the weights in all layers and replace only the final layer. Instead of a fully-connected layer that connects the 512 neurons of the penultimate layer to 1,000 object categories (512,000 connections), we use a fully-connected layer with just 2 outputs (1,024 connections), representing "Hot Dog" and "Not Hot Dog". Because the replacement layer is created after freezing, it is the only part of the network that gets trained.

In [6]:
# load ResNet 18 with ImageNet weights
# (torchvision >= 0.13 prefers weights=models.ResNet18_Weights.DEFAULT over pretrained=True)
net = models.resnet18(pretrained=True)

# freeze parameters in all layers
for param in net.parameters():
    param.requires_grad = False

# replace the final layer (named "fc") with a fully-connected layer with 2 outputs;
# parameters of newly created modules have requires_grad=True, so only this layer will train
num_features = net.fc.in_features  # 512 for ResNet 18
net.fc = nn.Linear(num_features, 2)

net = net.to(device)
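To check the arithmetic in the paragraph above, here's a small sketch that counts the trainable parameters of a fully-connected layer. The "connections" are the weights; nn.Linear also adds one bias per output, so the real counts are slightly higher. The helper name is hypothetical.

```python
def linear_param_count(in_features, out_features, bias=True):
    """Parameters in a fully-connected layer: one weight per connection plus one bias per output."""
    weights = in_features * out_features
    biases = out_features if bias else 0
    return weights + biases


# ResNet 18's penultimate layer produces 512 features
original_fc = linear_param_count(512, 1000)  # ImageNet head
hotdog_fc = linear_param_count(512, 2)       # "Hot Dog" / "Not Hot Dog" head

print(original_fc, hotdog_fc)  # 513000 1026
```

On the real network, sum(p.numel() for p in net.parameters() if p.requires_grad) should report the same 1,026 trainable parameters, confirming that everything except the new fc layer is frozen.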

Setting the criterion and optimizer:

In [7]:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
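Here's a plain-Python sketch of what these two objects compute. For a single example with logits z and target class t, nn.CrossEntropyLoss is log(sum_j exp(z_j)) - z_t (log-softmax followed by negative log-likelihood); by default PyTorch averages this over the batch. optim.SGD with momentum keeps a velocity per parameter and, in PyTorch's formulation, updates v = momentum * v + g and then p = p - lr * v. The function names below are hypothetical and the sketch ignores dampening, weight decay and Nesterov momentum. Because the frozen parameters never receive gradients (their .grad stays None), optim.SGD skips them, so passing net.parameters() works; net.fc.parameters() would be the more explicit equivalent.

```python
import math


def cross_entropy(logits, target):
    """Loss for one example: log-sum-exp(logits) - logits[target]."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def sgd_momentum_step(params, grads, velocity, lr=0.001, momentum=0.9):
    """One SGD-with-momentum update: v = momentum * v + g; p = p - lr * v."""
    new_velocity = [momentum * v + g for v, g in zip(velocity, grads)]
    new_params = [p - lr * v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


# two equal logits: the network is 50/50, so the loss is log(2)
print(cross_entropy([0.0, 0.0], target=0))  # 0.6931471805599453

# two steps with the same gradient: the velocity builds up, so the second step is larger
params, velocity = [1.0], [0.0]
for _ in range(2):
    params, velocity = sgd_momentum_step(params, [2.0], velocity)
print(params, velocity)
```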

The usual training phase:

In [ ]:
net.train()  # set network to training phase

epochs = 25
print_every = 1  # report after every batch; raise this for larger datasets

# for each pass of the training dataset
for epoch in range(epochs):
    train_loss, train_correct, train_total = 0, 0, 0

    # for each batch of training examples
    for batch_index, (inputs, labels) in enumerate(train_dataloader):
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()  # zero the parameter gradients
        outputs = net(inputs)  # forward pass
        loss = criterion(outputs, labels)  # compare output with ground truth
        loss.backward()  # backpropagation
        optimizer.step()  # update network weights

        # record statistics
        _, preds = torch.max(outputs.data, 1)
        train_loss += loss.item()
        train_correct += (preds == labels).sum().item()
        train_total += len(labels)

        # print statistics every `print_every` batches, then reset them
        if (batch_index + 1) % print_every == 0:
            print(f'Epoch {epoch + 1}, ' +
                  f'Batch {batch_index + 1}, ' +
                  f'Train Loss: {(train_loss/print_every):.5f}, ' +
                  f'Train Accuracy: {(train_correct/train_total):.5f}')

            train_loss, train_correct, train_total = 0, 0, 0

Similarly, for the usual evaluation phase:

In [35]:
net.eval()  # set network to evaluation phase

test_loss = 0
test_correct = 0
test_total = len(test_dataloader.dataset)

with torch.no_grad():  # disable gradient tracking to save memory and time
    
    # for each batch of testing examples
    for batch_index, (inputs, labels) in enumerate(test_dataloader):
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = net(inputs)  # forward pass
        
        # record loss
        loss = criterion(outputs, labels)
        test_loss += loss.item()
        
        # select largest output as prediction
        _, preds = torch.max(outputs.data, 1)
        
        # compare prediction with ground truth and mark as correct if equal
        test_correct += (preds == labels).sum().item()

print(f'Test Loss: {(test_loss/len(test_dataloader)):.5f}, ' +
      f'Test Accuracy: {(test_correct/test_total):.5f} ' +
      f'({test_correct}/{test_total})')
Test Loss: 0.29229, Test Accuracy: 0.87200 (436/500)

We see that we achieved 87.2% accuracy on the test set.

Now, let's predict the images mentioned at the start:

In [196]:
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# take one batch of (shuffled) test images
inputs, _ = next(iter(test_dataloader))

# arrange the first 6 images in a grid, 2 per row, and undo the normalisation for display
images = make_grid(inputs[:6], nrow=2)
images = images.numpy().transpose((1, 2, 0))  # (C, H, W) -> (H, W, C) for matplotlib
images = np.clip(std * images + mean, 0, 1)
plt.figure(figsize=(8, 12))
plt.axis('off')
plt.imshow(images)
plt.show()

To generate the predictions, we:

  1. move the inputs to the device (GPU or CPU) where the Neural Network resides;
  2. run the inputs through the network to get outputs;
  3. for each image, select the class with the largest output as the prediction; and
  4. map the predicted class index to a human-readable label.
In [202]:
inputs = inputs.to(device)

with torch.no_grad():  # inference only, no gradients needed
    outputs = net(inputs)

# select largest output as prediction
_, preds = torch.max(outputs, 1)

labels_dict = {0: 'Hot Dog', 1: 'Not Hot Dog'}
pred_labels = [labels_dict[pred] for pred in preds[:6].tolist()]

for i, pred_label in enumerate(pred_labels, 1):
    print(f'{i}. {pred_label}')
1. Hot Dog
2. Not Hot Dog
3. Not Hot Dog
4. Not Hot Dog
5. Hot Dog
6. Hot Dog
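The hard-coded labels_dict works because ImageFolder numbers the classes by sorting the class folder names alphabetically, so a "hot dog" folder sorts before a "not hot dog" folder. The sketch below mimics that rule in plain Python; the folder names hot_dog and not_hot_dog are assumptions about the dataset's layout, and the helper names are hypothetical.

```python
def class_to_idx(folder_names):
    """Mimic ImageFolder: sort the class folder names, then number them from 0."""
    classes = sorted(folder_names)
    return {name: i for i, name in enumerate(classes)}


def idx_to_label(folder_names):
    """Invert the mapping so predicted indices can be turned back into folder names."""
    return {i: name for name, i in class_to_idx(folder_names).items()}


# hypothetical folder names, in whatever order the filesystem lists them
folders = ["not_hot_dog", "hot_dog"]
print(class_to_idx(folders))  # {'hot_dog': 0, 'not_hot_dog': 1}
```

Rather than hard-coding the mapping, you can build it from the dataset itself with dict(enumerate(train_dataset.classes)), since ImageFolder stores the sorted class names in its classes attribute. Note that sorting is case-sensitive, so capitalised folder names sort before lowercase ones.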

Bonus: Deploying the Machine Learning Model using Flask

Saving/Loading the Neural Network

There are two ways to save/load the Neural Network:

In [9]:
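PATH = 'hotdog.pth'

# save the entire module (architecture + weights) with pickle
torch.save(net, PATH)

# load (on PyTorch 2.6+ you may need torch.load(PATH, weights_only=False))
net = torch.load(PATH)
net.eval()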

and

In [ ]:
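# save only the parameters (the state_dict)
torch.save(net.state_dict(), PATH)

# load: re-create the same architecture first, then copy the parameters into it
net = models.resnet18()
net.fc = nn.Linear(net.fc.in_features, 2)
net.load_state_dict(torch.load(PATH))
net.eval()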

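The first way is simple and succinct and saves the entire module. However, because pickle stores only a reference to the model's classes rather than the classes themselves, the saved file is bound to the specific classes and the exact directory structure used when it was saved, so if any of that changes, loading will break.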

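The second way is more verbose, since you must re-create the model architecture before loading, but it is more flexible. It saves only the state_dict, a Python dictionary (an OrderedDict) that maps each parameter and buffer name, such as fc.weight, to its tensor.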

Running Predictions using Flask

You can deploy a model using Flask to make your prediction model available to the wider world without users needing to install PyTorch or train their own model. We will build a microservice that accepts an image URL in a GET request and returns whether that image is "Hot Dog" or "Not Hot Dog". More precisely, we'll be able to execute the following cURL requests to retrieve a prediction:

$ curl 'http://127.0.0.1:5000/?url=https://actamachina.com/assets/images/06-hot-dog.jpg'
"Hot Dog"
$ curl 'http://127.0.0.1:5000/?url=https://actamachina.com/assets/images/06-not-hot-dog.jpg'
"Not Hot Dog"

Contents of predict.py:

In [11]:
from flask import Flask, request, jsonify
from io import BytesIO
from PIL import Image
from torchvision import transforms

import requests
import torch


app = Flask(__name__)

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# load the network once at start-up rather than on every request
# (on PyTorch 2.6+, loading a whole pickled module may need weights_only=False)
net = torch.load('hotdog.pth', map_location=device)
net.eval()

test_transform = transforms.Compose([
    transforms.Resize(255),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])


@app.route('/')
def predict():
    # Flask: read the image URL from the query string and fetch the raw image
    url = request.args['url']
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    raw_img = Image.open(BytesIO(response.content)).convert('RGB')  # drop alpha/greyscale

    # PyTorch: transform the image, add a batch dimension and run the network
    img = test_transform(raw_img).unsqueeze(0).to(device)  # shape (1, 3, 224, 224)

    with torch.no_grad():
        outputs = net(img)
    pred = outputs.argmax(dim=1).item()

    return jsonify('Hot Dog' if pred == 0 else 'Not Hot Dog')

Most of the code in the PyTorch section should look familiar to you. The Flask section retrieves the image URL from the GET request and fetches the raw image.

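To run Flask and start listening for requests at http://127.0.0.1:5000, run:

$ export FLASK_APP=predict.py
$ export FLASK_ENV=development
$ flask run

FLASK_ENV was deprecated in Flask 2.2 and removed in Flask 2.3; on recent versions, run flask --app predict run --debug instead.

Once you have tested your Flask code, turn the debugger off before deploying (set FLASK_ENV to production on older versions, or drop --debug on newer ones); this also disables the reloader. Flask's built-in server is intended for development only, and the Flask documentation's deployment guide covers production WSGI servers such as Gunicorn and Waitress as well as hosted and self-hosted options.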
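With the service running, you can also query it from Python instead of cURL. Here's a minimal standard-library sketch; the SERVICE address assumes the local development server above, and the helper names are hypothetical. Percent-encoding the image URL matters once it contains its own ?, & or = characters, which would otherwise be read as extra query parameters for our service.

```python
import json
import urllib.request
from urllib.parse import urlencode

# assumption: the Flask app from this post is running locally on the default port
SERVICE = "http://127.0.0.1:5000/"


def build_request_url(service, image_url):
    """Percent-encode the image URL so it arrives intact as the single `url` query parameter."""
    return service + "?" + urlencode({"url": image_url})


def predict_remote(image_url, service=SERVICE):
    """Call the prediction service and decode its JSON string response."""
    with urllib.request.urlopen(build_request_url(service, image_url), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling predict_remote("https://actamachina.com/assets/images/06-hot-dog.jpg") should return the same string as the first cURL request, since Flask decodes the percent-encoded parameter before request.args sees it.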

Summary and Next Steps

In this post, you've learnt how to:

  • load a pre-trained model;
  • freeze earlier layers; and
  • change later layers to obtain the output we want.

In the next post, we'll be looking at image style transfer, i.e. how to take an image and transform it into a specific artistic style.