Handwritten Digit Recognition with Python & CNN

by TechVidvan Team

Hello friends, ‘Digits’ are a part of our everyday life, be it License plate on our cars or bike, the price of a product, speed limit on a road, or details associated with a bank account. In the case of a text which is unclear, it is easier to guess the digits in comparison to the alphabets

Machine Learning and Deep Learning are reducing human efforts in almost every field. Moreover, a solution achieved using ML and DL can power various applications at the same time, thereby reducing human effort and increasing the flexibility to use the solution. One such solution is a handwritten digit recognition system that can be used in postal mail sorting, bank check processing, form data entry, etc.

Convolution Neural Network

A Convolutional Neural Network or CNN is a Deep Learning Algorithm which is very effective in handling image classification tasks. It is able to capture the Temporal and Spatial dependencies in an image with the help of filters or kernels.

The kernel is just like a small window sliding over the large window in order to extract the spatial features and in the end, we get feature maps.

MNIST Dataset

Source: MNIST

We are going to use the famous MNIST dataset for training our CNN model. The MNIST dataset was compiled with images of digits from various scanned documents and then normalized in size. Each image is of a dimension, 28×28 i.e total 784 pixel values.

You do not need to download the dataset from any external source as we will import it from keras.datasets

Layout of the basic idea

Firstly, we will train a CNN (Convolutional Neural Network) on MNIST dataset, which contains a total of 70,000 images of handwritten digits from 0-9 formatted as 28×28-pixel monochrome images.
For this, we will first split the dataset into train and test data with size 60,000 and 10,000 respectively.
Then, we will preprocess the input data by reshaping the image and scaling the pixel values between 0 and 1.
After that, we will design the neural network and train the model.
After the model is trained, we will save it for future use.
Next, we are going to use a webcam as an input to feed an image of a digit to our trained model.
Our model will process the image to identify the digit and return a series of 10 numbers corresponding to the ten digits with an activation on the index of the proposed digit.

Download Handwritten Digit Recognition Code

Please download project source code: Handwritten Digit Recognition in Python

File Structuring

1. Train.py

We utilize the MNIST dataset to train our CNN model and then save the model in the current working directory.

2. RecognizeDigit.py

We load the saved model and use appropriate functions to capture video via webcam and pass it as an input to our model. Our model produces a prediction which is displayed to the user.

Libraries Required

Make sure that the following libraries are installed on your working machine before proceeding further

Keras
Tensorflow
OpenCV
Sklearn
Numpy

Training the Model (Train.py)

Before we begin training, I would suggest you to train the model on Google colab as it offers training the model on GPU if your computer does not have one. It speeds up the training process by manifold and helps you achieve the final results much quicker.

Simply open a Google Colab Notebook > Edit > Notebook Settings > Hardware Accelerator > GPU > Save and Done..!!

1. Import the necessary libraries and modules

import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K

2. Splitting the MNIST dataset into Train and Test

(x_train, y_train), (x_test, y_test) = mnist.load_data()

3. Preprocessing the input data

num_of_trainImgs = x_train.shape[0] #60000 here
num_of_testImgs = x_test.shape[0] #10000 here
img_width = 28
img_height = 28
 
x_train = x_train.reshape(x_train.shape[0], img_height, img_width, 1)
x_test = x_test.reshape(x_test.shape[0], img_height, img_width, 1)
input_shape = (img_height, img_width, 1)
 
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255

4. Converting the class vectors to binary class

num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

5. Defining the model architecture

model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))

6. Compiling the model

model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])

7. Fitting the model on training data

model.fit(x_train, y_train,
          batch_size=128,
          epochs=12,
          verbose=1,
          validation_data=(x_test, y_test))

Output :

8. Evaluating the model on test data

score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

Output :

You might have observed that with the training, our test loss decreased significantly as we ran our model for 30 epochs and accuracy improved to over 89%. I may not sound like a good figure but let’s test out our model on the real-world input.

9. Saving the Model

model.save('trained_model.h5')

Note : If you trained your model on Google Colab, then make sure you download the model in the project directory.

Digit Recognition

1. Importing the necessary libraries

import numpy as np
import cv2
from skimage import img_as_ubyte    
from skimage.color import rgb2gray
from keras.models import load_model

2. Setting up the videoCapture

width = 640
height = 480
cameraNo = 0
 
cap = cv2.VideoCapture(cameraNo)
cap.set(3,width)
cap.set(4,height)

3. Loading our pretrained model

model = load_model('trained_model.h5')

Note : Steps D to N will be in the infinite while loop

4. Reading the Image

while True:
  success, im_orig = cap.read()

5. Converting the image to grayscale

img_gray = rgb2gray(img_original)

6. Converting the result to uint8 range

img_gray_u8 = img_as_ubyte(img_gray)

7. Thresholding

(thresh, im_binary) = cv2.threshold(img_gray_u8, 128, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)

8. Resizing the image

img_resized = cv2.resize(im_binary,(28,28))

9. Inverting the image colours

im_gray_invert = 255 - img_resized
  cv2.imshow("invert image", im_gray_invert)

10. Reshaping the image for final transmission

im_final = im_gray_invert.reshape(1,28,28,1)

11. Transmitting the image to our model

ans = model.predict(im_final)

12. Extracting the result from the array returned and printing the predicted value

ans = np.argmax(ans,axis=1)[0]
  print(ans)

13. Putting the predicted value as a text on webcam feed

cv2.putText(img_original,'Predicted Digit : '+str(ans),
                    (50,50),cv2.FONT_HERSHEY_COMPLEX,
                    1,(0,0,255),1)
  cv2.imshow("Original Image",img_original)

14. Handling the exit

if cv2.waitKey(1) and 0xFF == ord('q'):
    break

15. Releasing the camera control and destroying all the windows

cap.release()
cv2.destroyAllWindows()

Plotting the Collage of Images of Digits from Dataset

Just in case, if you are curious and do not know how I made the above collage of images from the train dataset, let me show

(x_train, y_train),(x_test, y_test) = mnist.load_data()
 
import matplotlib.pyplot as plt
fig, axes = plt.subplots(10, 10, figsize=(8, 8), subplot_kw={'xticks':[], 'yticks':[]}, gridspec_kw=dict(hspace=0.1, wspace=0.1))
for i, ax in enumerate(axes.flat):
    ax.imshow(x_train[i], cmap='binary', interpolation='nearest')
    ax.text(0.05, 0.05, str(y_train[i]),transform=ax.transAxes, color='green')
plt.show()

Explanation:

Just after we load our data via mnist.load_data(), we need to import matplotlib. The image we see is the collection of various subplots hence we define a 10×10 subplot, meaning there are 100 images to be accommodated in the plot. You can see we have disabled both the xticks and yticks. In order to relate the image to its target we value, we also put a small text in the bottom left corner of the image. Gridspec_kw basically helps specify the gaps in between the plots, both horizontally and vertically. In the end, we display the plot using plot.show() method.

Summary

Hooray..!! You have successfully made a handwritten digit recognition system. Honestly, the intention was to make it work on real-life data, apart from the test dataset. Hence, you built something different from the usual tutorials. You can extend this project by adding the functionality of multi-digit recognition or you can completely create a new project from scratch. In this new project, you can ask the user to draw the digits with gestures and then detect them. Happy coding and all the best for great projects ahead.