Not Hot Dog
Object recognition using transfer learning.
Reading time: 10 minutes
This post introduces transfer learning, which is the use of a pre-trained model instead of training a model from scratch. In particular, we'll be training an object recogniser using ResNet, a Convolutional Neural Network pre-trained on the ImageNet database. ImageNet is a hand-annotated image dataset consisting of 14 million images in 20,000 categories. Drawing inspiration from HBO's Silicon Valley, our model will recognise whether an image is "Hot Dog" or "Not Hot Dog".
In this post, you'll learn how to:
- load a pre-trained model;
- freeze earlier layers; and
- change later layers to obtain the output we want.
By the end, you'll learn to predict whether these images are "Hot Dog" or "Not Hot Dog":
Import Packages
As in previous posts, we'll be using the PyTorch package from Facebook.
from torchvision import datasets, transforms, models
from torchvision.utils import make_grid
from torch.utils.data.dataset import random_split
from torch.utils.data import DataLoader
from torch import nn
from torch import optim
import torch
import torch.nn.functional as F
import numpy as np  # used to un-normalise images for display
import matplotlib.pyplot as plt  # used to display images
Transfer Learning
Loading Data
A good source of images is Google Image Search. If you are interested in "Hot Dog" photos, search for the term in Google Image Search. Once the images are displayed on the results page, open the JavaScript console in your browser (the keyboard shortcut is usually F12). Lesson 2 of Fast.ai's Practical Deep Learning for Coders provides the following JavaScript snippet, which can be pasted into the console to download a text file containing the URLs of all the images on the page. The snippet relies on the CSS class names Google used for its results page at the time, so it may need updating if that markup changes.
urls = Array.from(document.querySelectorAll('.rg_di .rg_meta')).map(el=>JSON.parse(el.textContent).ou);
window.open('data:text/csv;charset=utf-8,' + escape(urls.join('\n')));
When the browser asks you to save the file, name it urls.txt. Then, use wget to download all the images in that file:
$ wget --wait=2 --random-wait --input-file=urls.txt
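If wget isn't available, the download step can be approximated in Python. The sketch below is a stdlib-only stand-in, not part of the original workflow: `parse_urls`, `polite_delay` and `download_all` are hypothetical helper names, the output folder `downloads` is an assumption, and every file is saved with a `.jpg` extension for simplicity. `polite_delay` mirrors wget's `--random-wait`, which varies each pause between 0.5 and 1.5 times the `--wait` value.

```python
import random
import time
import urllib.request
from pathlib import Path


def parse_urls(text):
    """Return the non-empty lines of a urls.txt file, stripped of whitespace."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def polite_delay(wait=2.0, rng=random):
    """Like wget's --random-wait: pause between 0.5 * wait and 1.5 * wait seconds."""
    return rng.uniform(0.5 * wait, 1.5 * wait)


def download_all(urls, out_dir="downloads", wait=2.0):
    """Download each URL into out_dir (hypothetical layout), pausing between requests."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, url in enumerate(urls):
        target = out / f"{i:05d}.jpg"  # simplification: assumes every image is a JPEG
        try:
            urllib.request.urlretrieve(url, target)
        except (OSError, ValueError) as exc:  # skip dead or malformed links
            print(f"skipped {url}: {exc}")
        time.sleep(polite_delay(wait))
```

Usage: `download_all(parse_urls(Path("urls.txt").read_text()))`. The random pause keeps the request pattern gentle on the image hosts, which is the same reason the wget command uses `--random-wait`.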
In our case, however, Dan Becker, a Data Scientist at Kaggle, has already created a "Hot Dog" / "Not Hot Dog" dataset and has even divided the photos into train and test folders.
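We load the train and test folders into dataloaders. The training images are randomly augmented (rotated, cropped and flipped) so that the model sees more variety, while the test images are only resized and centre-cropped; both are normalised with the ImageNet channel means and standard deviations that the pre-trained model expects: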
train_transform = transforms.Compose([
    transforms.RandomRotation(30),  # rotate by a random angle between -30 and 30 degrees
    transforms.RandomResizedCrop(224),  # crop a random region and resize it to 224 x 224
    transforms.RandomHorizontalFlip(),  # mirror the image with probability 0.5
    transforms.ToTensor(),  # PIL image -> float tensor in [0, 1] with shape (C, H, W)
    # normalise each channel with the ImageNet means and standard deviations
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
test_transform = transforms.Compose([
    transforms.Resize(255),  # resize the shorter side to 255 pixels
    transforms.CenterCrop(224),  # keep the central 224 x 224 region
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])
train_dataset = datasets.ImageFolder('data/hotdog/train', transform=train_transform)
test_dataset = datasets.ImageFolder('data/hotdog/test', transform=test_transform)
train_dataloader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=64, shuffle=True)
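ImageFolder treats each sub-folder as one class and numbers the classes in sorted order of their folder names, which is what later lets us map outputs 0 and 1 to human-readable labels. The sketch below re-creates that rule in plain Python on a throwaway directory; `folder_class_to_idx` is a hypothetical helper, and the folder names `hot_dog` and `not_hot_dog` are assumptions about the dataset's layout. On the real data, `train_dataset.class_to_idx` reports the actual mapping.

```python
import os
import tempfile


def folder_class_to_idx(root):
    """Sketch of ImageFolder's labelling rule: every sub-folder of root is a
    class, and classes are numbered in sorted order of their folder names."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


# Throwaway tree mimicking data/hotdog/train (folder names are assumptions).
with tempfile.TemporaryDirectory() as root:
    for name in ["not_hot_dog", "hot_dog"]:  # creation order doesn't matter
        os.makedirs(os.path.join(root, name))
    mapping = folder_class_to_idx(root)

print(mapping)  # {'hot_dog': 0, 'not_hot_dog': 1}
```

Because the numbering is alphabetical, renaming a class folder can silently swap the labels, so it is worth checking `class_to_idx` before hard-coding a label dictionary.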
Choosing between CPU and GPU:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
Transfer learning is the transfer of knowledge learnt on one problem to a different but related problem. In our case, we will use a pre-trained ResNet 18 model as a feature extractor. ResNet 18 is an 18-layer Convolutional Neural Network designed to categorise objects into 1,000 categories such as "accordion", "barber chair", "croquet ball", etc. The pre-trained model we use was trained on the 1,000-category subset of ImageNet used for the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), roughly 1.2 million training images, rather than on the full 20,000-category database. As you move through the layers of a neural network, the features it learns become more complex: early layers respond to simple patterns such as edges and colours, and later layers piece these features together to recognise specific objects.
Figure: kernels (left) and sets of sample images that match them (right).
To keep the knowledge in the pre-trained ResNet 18 model, we will freeze the weights in all the layers except the final layer. Then, for the final layer, instead of a fully-connected layer that connects 512 neurons (in the penultimate layer) to 1,000 object categories (making a total of 512,000 connections), we are going to use a fully-connected layer that only outputs to 2 object categories (with 1,024 connections), representing "Hot Dog" and "Not Hot Dog".
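The connection counts above include weights only. A fully-connected layer in PyTorch (`nn.Linear`) also learns one bias per output by default, so its number of trainable parameters is in_features * out_features + out_features. A quick check of both heads, using a hypothetical helper `linear_param_count`:

```python
def linear_param_count(in_features, out_features, bias=True):
    """Number of learnable parameters in a fully-connected (nn.Linear) layer."""
    weights = in_features * out_features  # one weight per connection
    biases = out_features if bias else 0  # one bias per output neuron
    return weights + biases


original_fc = linear_param_count(512, 1000)  # ImageNet head: 512,000 weights + 1,000 biases
new_fc = linear_param_count(512, 2)  # Hot Dog head: 1,024 weights + 2 biases
print(original_fc, new_fc)  # 513000 1026
```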
# on torchvision >= 0.13 the preferred form is models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
net = models.resnet18(pretrained=True)
# freeze parameters in all layers
for param in net.parameters():
    param.requires_grad = False
# switch the final layer (named "fc") to a fully-connected layer with 2 outputs
# note: parameters of newly created modules have requires_grad=True by default, so only fc is trained
num_features = net.fc.in_features
net.fc = nn.Linear(num_features, 2)
net = net.to(device)
Setting the criterion and optimizer:
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
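`nn.CrossEntropyLoss` expects the network's raw outputs (logits), not probabilities: it applies a log-softmax internally and then takes the negative log-probability of the correct class, averaging over the batch by default. The plain-Python sketch below computes the loss for a single example; `cross_entropy` is a hypothetical helper, not a PyTorch function. Passing `net.parameters()` to SGD is harmless here, because the frozen parameters never receive gradients and SGD skips parameters whose gradient is `None`.

```python
import math


def cross_entropy(logits, target):
    """Single-example cross-entropy, computed the way nn.CrossEntropyLoss does:
    log-softmax over the raw logits, then the negative log-probability of target."""
    m = max(logits)  # subtract the max logit for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


# Two logits: index 0 = "Hot Dog", index 1 = "Not Hot Dog"
print(round(cross_entropy([2.0, 0.0], 0), 4))  # 0.1269 (confident and correct)
print(round(cross_entropy([2.0, 0.0], 1), 4))  # 2.1269 (confident and wrong)
```

The loss grows quickly when the network is confidently wrong, which is what pushes the new head towards the correct class during training.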
The usual training phase:
net.train()  # set network to training mode
epochs = 25
print_every = 1  # print statistics after every batch; increase for larger datasets
# for each pass of the training dataset
for epoch in range(epochs):
    train_loss, train_correct, train_total = 0, 0, 0
    # for each batch of training examples
    for batch_index, (inputs, labels) in enumerate(train_dataloader):
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()  # zero the parameter gradients
        outputs = net(inputs)  # forward pass
        loss = criterion(outputs, labels)  # compare output with ground truth
        loss.backward()  # backpropagation
        optimizer.step()  # update network weights
        # record statistics
        _, preds = torch.max(outputs, 1)
        train_loss += loss.item()
        train_correct += (preds == labels).sum().item()
        train_total += len(labels)
        # print statistics every `print_every` batches
        if (batch_index + 1) % print_every == 0:
            print(f'Epoch {epoch + 1}, '
                  f'Batch {batch_index + 1}, '
                  f'Train Loss: {train_loss / print_every:.5f}, '
                  f'Train Accuracy: {train_correct / train_total:.5f}')
            train_loss, train_correct, train_total = 0, 0, 0
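Each call to `optimizer.step()` in the loop applies the SGD-with-momentum update. PyTorch's formulation (with its defaults of `dampening=0`, no Nesterov momentum and no weight decay) keeps a velocity, velocity = momentum * velocity + gradient, and then moves each parameter by -lr * velocity. A single-parameter sketch in plain Python with made-up gradients; `sgd_momentum_steps` is a hypothetical helper:

```python
def sgd_momentum_steps(param, grads, lr=0.001, momentum=0.9):
    """Apply successive SGD-with-momentum updates to one scalar parameter.

    Mirrors PyTorch's rule for dampening=0, nesterov=False, weight_decay=0:
        velocity = momentum * velocity + grad
        param    = param - lr * velocity
    """
    velocity = 0.0
    trajectory = []
    for grad in grads:
        velocity = momentum * velocity + grad
        param = param - lr * velocity
        trajectory.append(param)
    return trajectory


# the same gradient three times: each step is larger than the last
print(sgd_momentum_steps(1.0, [1.0, 1.0, 1.0]))
```

Momentum lets consistent gradients build up speed, which helps when a small learning rate such as 0.001 is used.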
Similarly, for the usual evaluation phase:
net.eval()  # set network to evaluation mode
test_loss = 0
test_correct = 0
test_total = len(test_dataloader.dataset)
with torch.no_grad():  # disable gradient tracking, which saves memory and time
    # for each batch of testing examples
    for batch_index, (inputs, labels) in enumerate(test_dataloader):
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = net(inputs)  # forward pass
        # record loss
        loss = criterion(outputs, labels)
        test_loss += loss.item()
        # select largest output as prediction
        _, preds = torch.max(outputs, 1)
        # compare prediction with ground truth and mark as correct if equal
        test_correct += (preds == labels).sum().item()
print(f'Test Loss: {test_loss / len(test_dataloader):.5f}, '
      f'Test Accuracy: {test_correct / test_total:.5f} '
      f'({test_correct}/{test_total})')
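One subtlety in the reported test loss: the loop divides the summed batch losses by the number of batches, but `nn.CrossEntropyLoss` already returns the mean loss within each batch, so a smaller final batch counts as much as a full one. To report the average loss per test image instead, accumulate `loss.item() * len(labels)` and divide by `test_total`. The difference, with hypothetical batch losses and hypothetical helper names:

```python
def mean_of_batch_means(batch_losses):
    """What the evaluation loop reports: the average of per-batch mean losses."""
    return sum(batch_losses) / len(batch_losses)


def per_example_mean(batch_losses, batch_sizes):
    """Average loss per test image: weight each batch's mean loss by its size."""
    total = sum(loss * size for loss, size in zip(batch_losses, batch_sizes))
    return total / sum(batch_sizes)


# hypothetical numbers: a full batch of 64 and a smaller final batch of 16
losses, sizes = [0.5, 1.0], [64, 16]
print(mean_of_batch_means(losses))  # 0.75
print(per_example_mean(losses, sizes))  # 0.6
```

The accuracy line is unaffected, since it already counts correct predictions over all `test_total` images.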
We see that we achieved 87.2% accuracy on the test set.
Now, let's predict the images mentioned at the start:
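# ImageNet channel means and standard deviations, used to undo Normalize for display
mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])
# take the first batch of test images
inputs, _ = next(iter(test_dataloader))
images = make_grid(inputs[:6], nrow=2)  # arrange 6 images in a grid, 2 per row
images = images.numpy().transpose((1, 2, 0))  # (C, H, W) -> (H, W, C) for matplotlib
images = np.clip(std * images + mean, 0, 1)  # un-normalise and keep values in [0, 1]
plt.figure(figsize=(8, 12))
plt.axis('off')
plt.imshow(images)
plt.show()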
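Why the display code multiplies by `std` and adds `mean`: `transforms.Normalize` maps each channel value x to (x - mean) / std, so std * x + mean undoes it. The transpose comes first because PyTorch stores images as channels x height x width, whereas matplotlib expects height x width x channels, and NumPy aligns a 3-element array with the last axis when broadcasting. A small NumPy check on a hypothetical 2 x 2 RGB image:

```python
import numpy as np

mean = np.array([0.485, 0.456, 0.406])
std = np.array([0.229, 0.224, 0.225])

# hypothetical 2 x 2 RGB image with values in [0, 1], laid out (H, W, C)
image = np.array([[[0.0, 0.5, 1.0], [0.2, 0.4, 0.6]],
                  [[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]])

normalised = (image - mean) / std  # what transforms.Normalize does, channel by channel
restored = std * normalised + mean  # what the display code does to undo it
print(np.allclose(restored, image))  # True
```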
To generate the predictions, we:
- move the inputs to the device (GPU or CPU) where the Neural Network resides;
- run the inputs through the network to get outputs;
- for each image, select the class with the largest output as the prediction; and
- map the predicted label to a human-readable label.
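inputs = inputs.to(device)
with torch.no_grad():  # inference only, so no gradients are needed
    outputs = net(inputs)
# select largest output as prediction
_, preds = torch.max(outputs, 1)
# ImageFolder numbers classes in sorted order of folder names; check train_dataset.class_to_idx
labels_dict = {0: 'Hot Dog', 1: 'Not Hot Dog'}
pred_labels = [labels_dict[pred] for pred in preds[:6].cpu().tolist()]
for i, pred_label in enumerate(pred_labels, 1):
    print(f'{i}. {pred_label}')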
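The network's outputs are unnormalised scores (logits), so the largest one gives the prediction but not a confidence. Applying a softmax turns them into probabilities that sum to 1; in PyTorch that is `torch.softmax(outputs, dim=1)`. The plain-Python sketch below, with hypothetical helper names and made-up logits, shows the idea for one image:

```python
import math

LABELS = {0: 'Hot Dog', 1: 'Not Hot Dog'}


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe(logits):
    """Return the predicted label and its probability for one image's outputs."""
    probs = softmax(logits)
    pred = max(range(len(probs)), key=lambda i: probs[i])  # index of the largest probability
    return LABELS[pred], probs[pred]


label, prob = describe([2.0, 0.5])
print(f'{label} ({prob:.1%})')  # Hot Dog (81.8%)
```

Softmax never changes which class wins, so the predictions printed above stay the same; it only adds a probability that can be reported alongside each label.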
Bonus: Deploying the Machine Learning Model using Flask
Saving/Loading the Neural Network
There are two ways to save/load the Neural Network:
# save
torch.save(net, PATH)
# load
net = torch.load(PATH)
net.eval()
and
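# save
torch.save(net.state_dict(), PATH)
# load: rebuild the same architecture first, then copy in the saved parameters
net = models.resnet18()
net.fc = nn.Linear(net.fc.in_features, 2)
net.load_state_dict(torch.load(PATH, map_location=device))
net = net.to(device)
net.eval()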
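The first way is simple and succinct and saves the entire module. However, it relies on Python's pickle, which stores a path to the model's class rather than the class itself, so the saved file is bound to the specific classes and the exact directory structure used when the model was saved; if any of that changes, loading will break. Recent PyTorch releases (2.6 onwards) also default torch.load to weights_only=True, so loading a whole pickled module requires passing weights_only=False.
The second way is more verbose, but more flexible. It saves only the state_dict, which is a Python dictionary mapping each parameter (and persistent buffer, such as batch-norm running statistics) to its tensor, so you need to rebuild the model architecture before loading the parameters into it.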
Running Predictions using Flask
You can deploy a model using Flask to make your predictions available to the wider world without requiring users to install PyTorch or train their own model. We will build a microservice that accepts an image URL in a GET request and returns whether that image is "Hot Dog" or "Not Hot Dog". More precisely, we'll be able to execute the following cURL requests to retrieve a prediction:
$ curl http://127.0.0.1:5000/?url=https://actamachina.com/assets/images/06-hot-dog.jpg
"Hot Dog"
$ curl http://127.0.0.1:5000/?url=https://actamachina.com/assets/images/06-not-hot-dog.jpg
"Not Hot Dog"
Contents of predict.py:
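from io import BytesIO

from flask import Flask, request, jsonify
from PIL import Image
from torchvision import transforms
import requests
import torch

app = Flask(__name__)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

test_transform = transforms.Compose([
    transforms.Resize(255),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406],
                         [0.229, 0.224, 0.225]),
])

# load the network once at start-up rather than on every request;
# assumes hotdog.pth was saved with torch.save(net, 'hotdog.pth')
# (on PyTorch >= 2.6, add weights_only=False to this call)
net = torch.load('hotdog.pth', map_location=device)
net.eval()


@app.route('/')
def predict():
    # Flask: open the image URL as a raw image
    url = request.args['url']
    response = requests.get(url, timeout=10)  # give up on slow image hosts
    response.raise_for_status()
    # convert to RGB so greyscale or transparent (RGBA) images also have 3 channels
    raw_img = Image.open(BytesIO(response.content)).convert('RGB')

    # PyTorch: pass the raw image to the network and get the predicted label
    img = test_transform(raw_img).unsqueeze(0).to(device)  # add batch dim: (1, 3, 224, 224)
    with torch.no_grad():
        outputs = net(img)
    _, preds = torch.max(outputs, 1)
    pred = preds.item()
    return jsonify('Hot Dog' if pred == 0 else 'Not Hot Dog')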
Most of the code in the PyTorch section should look familiar to you. The Flask section retrieves the image URL from the GET request and fetches the raw image.
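The cURL examples earlier work because those image URLs contain no special characters. An image URL with its own query string, for example one containing `&`, should be percent-encoded before being passed as the `url` parameter; otherwise Flask splits it at the `&` and `request.args['url']` receives a truncated URL. A stdlib sketch, where `predict_request_url` is a hypothetical helper and the image URL is made up. With the requests library, passing `params={'url': image_url}` does this encoding for you, and curl can do the same with `-G --data-urlencode`.

```python
from urllib.parse import urlencode

SERVICE = "http://127.0.0.1:5000/"  # the local Flask development server


def predict_request_url(image_url, service=SERVICE):
    """Build the GET URL for the prediction service, percent-encoding the image URL
    so characters such as '&' inside it are not mistaken for query separators."""
    return service + "?" + urlencode({"url": image_url})


print(predict_request_url("https://example.com/dog.jpg?size=large&v=2"))
# http://127.0.0.1:5000/?url=https%3A%2F%2Fexample.com%2Fdog.jpg%3Fsize%3Dlarge%26v%3D2
```

Flask decodes the parameter again before `request.args['url']` hands it to the handler, so no change to predict.py is needed.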
To run Flask and start listening for requests at http://127.0.0.1:5000, run:
$ export FLASK_APP=predict.py
$ export FLASK_ENV=development
$ flask run
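When you have tested your Flask code and are ready to deploy, change FLASK_ENV to production. This turns off the debugger and reloader by default. Note that FLASK_ENV was deprecated in Flask 2.2 and removed in Flask 2.3; on newer versions, enable development mode with flask --app predict --debug run instead, and leave out --debug in production. The built-in server is not designed for production traffic, so the Flask documentation walks you through deployment options such as Google App Engine, Heroku or self-hosted WSGI servers.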
Summary and Next Steps
In this post, you've learnt how to:
- load a pre-trained model;
- freeze earlier layers; and
- change later layers to obtain the output we want.
In the next post, we'll be looking at image style transfer, i.e. how to take an image and transform it into a specific artistic style.
Acknowledgements
This post has been inspired by posts from: