CNN

Notebook

Example Notebook: Kerasy.examples.MNIST.ipynb

The CNN originated from the Neocognitron, which was devised on the basis of neurophysiological knowledge of the visual cortex of the brain, and it has a structure specialized for image processing.

The Neocognitron consists of:

  • Convolutional Layer : corresponding to simple cells (S-cells), which extract features.
  • Pooling Layer : corresponding to complex cells (C-cells), which tolerate small positional shifts.

A CNN can be trained well using backpropagation. The algorithm is derived mathematically below.

Convolutional Layer

(Figures: mono-channel convolution, Convolution-Mono-cha.png; multi-channel convolution, Convolution-Multi-cha.png)

forward

(Step-by-step figures of the forward pass: forward1.png–forward6.png)
$$ \begin{cases} \begin{aligned} a_{i,j,c'}^{k+1} &= \sum_c\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}w_{m,n,c,c'}^{k+1}z_{i+m,j+n,c}^{k} + b_{c'}^{k+1}\\ z_{i,j,c'}^{k} &= h^{k}\left(a_{i,j,c'}^{k}\right) \end{aligned} \end{cases} $$
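To make the indexing concrete, here is a minimal NumPy sketch of this forward pass for a valid (no padding, stride 1) multi-channel convolution. The helper name conv2d_forward and the (H, W, C) array layout are illustrative assumptions, not Kerasy internals.

import numpy as np

def conv2d_forward(z, w, b):
    # z: (H, W, C) input, w: (M, N, C, C') kernels, b: (C',) biases.
    # Returns pre-activations a with shape (H-M+1, W-N+1, C').
    H, W, C = z.shape
    M, N, _, C_out = w.shape
    a = np.empty((H - M + 1, W - N + 1, C_out))
    for i in range(H - M + 1):
        for j in range(W - N + 1):
            # a[i,j,c'] = sum_{c,m,n} w[m,n,c,c'] * z[i+m,j+n,c] + b[c']
            patch = z[i:i+M, j:j+N, :]  # (M, N, C)
            a[i, j, :] = np.tensordot(patch, w, axes=([0, 1, 2], [0, 1, 2])) + b
    return a

Applying the activation h elementwise to a then yields the z passed to the next layer.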

backprop

(Animation of the backward pass: backprop.gif; individual frames: backprop1.png–backprop16.png)
  • $w_{m,n,c,c'}^k, b_{c'}^k$ $$ \begin{aligned} \frac{\partial E}{\partial w_{m,n,c,c'}^{k+1}} &= \sum_{i}\sum_{j}\frac{\partial E}{\partial a_{i,j,c'}^{k+1}}\frac{\partial a_{i,j,c'}^{k+1}}{\partial w_{m,n,c,c'}^{k+1}}\\ &= \sum_{i}\sum_{j}\frac{\partial E}{\partial a_{i,j,c'}^{k+1}}z_{i+m,j+n,c}^{k}\\ &= \sum_{i}\sum_{j}\delta_{i,j,c'}^{k+1}\cdot z_{i+m,j+n,c}^{k}\\ \frac{\partial E}{\partial b_{c'}^{k+1}} &= \sum_{i}\sum_{j}\delta_{i,j,c'}^{k+1} \end{aligned} $$
  • $\delta_{i,j,c}^k$ $$ \begin{aligned} \delta_{i,j,c}^{k} &= \frac{\partial E}{\partial a_{i,j,c}^{k}} \\ &= \sum_{c'}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1}\left(\frac{\partial E}{\partial a_{i-m,j-n,c'}^{k+1}}\right)\left(\frac{\partial a_{i-m,j-n,c'}^{k+1}}{\partial a_{i,j,c}^k}\right)\\ &= \sum_{c'}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} \left(\delta_{i-m,j-n,c'}^{k+1}\right)\left(w_{m,n,c,c'}^{k+1}h'\left(a_{i,j,c}^k\right)\right) \\ &= h'\left(a_{i,j,c}^k\right)\sum_{c'}\sum_{m=0}^{M-1}\sum_{n=0}^{N-1} \delta_{i-m,j-n,c'}^{k+1}\cdot w_{m,n,c,c'}^{k+1} \end{aligned} $$
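The gradients above translate almost line for line into NumPy. The sketch below (a hypothetical helper, not Kerasy's internal code) computes the weight gradient, the bias gradient, and the upstream delta, given delta_next = ∂E/∂a^{k+1}, the layer input z, and the elementwise derivative h'(a^k).

import numpy as np

def conv2d_backward(delta_next, z, w, h_prime_a):
    # delta_next: (H', W', C') = dE/da^{k+1};  z: (H, W, C) layer input;
    # w: (M, N, C, C') kernels;  h_prime_a: (H, W, C) = h'(a^k) elementwise.
    Hp, Wp, C_out = delta_next.shape
    M, N, C, _ = w.shape
    H, W = z.shape[:2]

    # dE/dw[m,n,c,c'] = sum_{i,j} delta_next[i,j,c'] * z[i+m,j+n,c]
    dw = np.zeros_like(w)
    for m in range(M):
        for n in range(N):
            patch = z[m:m+Hp, n:n+Wp, :]  # (H', W', C)
            dw[m, n] = np.tensordot(patch, delta_next, axes=([0, 1], [0, 1]))

    # dE/db[c'] = sum_{i,j} delta_next[i,j,c']
    db = delta_next.sum(axis=(0, 1))

    # delta[i,j,c] = h'(a[i,j,c]) * sum_{c',m,n} delta_next[i-m,j-n,c'] * w[m,n,c,c'].
    # Out-of-range indices contribute zero, which zero-padding implements;
    # reading delta_next at [i-m, j-n] against w[m, n] amounts to a "full"
    # correlation with the 180-degree-flipped kernel.
    padded = np.pad(delta_next, ((M - 1, M - 1), (N - 1, N - 1), (0, 0)))
    w_flip = w[::-1, ::-1]
    delta = np.zeros((H, W, C))
    for i in range(H):
        for j in range(W):
            patch = padded[i:i+M, j:j+N, :]  # (M, N, C')
            delta[i, j] = np.tensordot(patch, w_flip, axes=([0, 1, 2], [0, 1, 3]))
    return dw, db, delta * h_prime_a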

Pooling Layer

forward

(Figure: Max-Pooling-forward.png)

backprop

(Figure: Max-Pooling-backprop.png)
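The two figures reduce to a simple rule: the forward pass keeps the maximum of each window, and backprop routes each incoming gradient entirely to the position that attained that maximum (all other positions receive zero). Below is a minimal NumPy sketch for a 2x2, stride-2, single-channel case; the helper names are illustrative assumptions, not Kerasy internals.

import numpy as np

def maxpool2d_forward(z, size=2):
    # z: (H, W) with H and W divisible by size.
    # Returns the pooled output and the argmax mask needed for backprop.
    H, W = z.shape
    windows = z.reshape(H // size, size, W // size, size).transpose(0, 2, 1, 3)
    out = windows.max(axis=(2, 3))
    mask = (windows == out[..., None, None])  # True where the max was taken
    return out, mask

def maxpool2d_backward(d_out, mask, size=2):
    # Route each output gradient to the max position(s) within its window.
    d_windows = mask * d_out[..., None, None]
    Ho, Wo = d_out.shape
    return d_windows.transpose(0, 2, 1, 3).reshape(Ho * size, Wo * size)

(If a window contains tied maxima, this sketch sends the gradient to every tied position; implementations typically pick a single index.)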

Example: MNIST

The MNIST database (Modified National Institute of Standards and Technology database) is a large database of handwritten digits (60,000 training images and 10,000 test images) that is commonly used for training various image-processing systems.

(Sample MNIST images of the digits 0–9.)
In [1]:
import numpy as np

from kerasy.datasets import mnist
from kerasy.models import Sequential
from kerasy.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D, Input
from kerasy.utils import CategoricalEncoder
In [2]:
# Datasets Parameters.
num_classes = 10
n_samples = 1_000

# Training Parameters.
batch_size = 16
epochs = 20
keep_prob1 = 0.75
keep_prob2 = 0.5
In [3]:
# input image dimensions
img_rows, img_cols = 28, 28
In [4]:
# the data, split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
In [5]:
x_train = np.expand_dims(x_train, axis=-1)
x_test  = np.expand_dims(x_test,  axis=-1)
input_shape = (img_rows, img_cols, 1)
In [6]:
x_train = x_train[:n_samples]
y_train = y_train[:n_samples]
x_test = x_test[:n_samples]
y_test = y_test[:n_samples]
In [7]:
x_train = x_train.astype('float64')
x_test = x_test.astype('float64')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
x_train shape: (1000, 28, 28, 1)
1000 train samples
1000 test samples
In [8]:
# convert class vectors to binary class matrices
encoder = CategoricalEncoder()
y_train = encoder.to_onehot(y_train, num_classes)
y_test  = encoder.to_onehot(y_test, num_classes)
Dictionaly for Encoder is already made.
In [9]:
model = Sequential()
model.add(Input(input_shape=input_shape))
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(keep_prob=keep_prob1))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(keep_prob=keep_prob2))
model.add(Dense(num_classes, activation='softmax'))
In [10]:
model.compile(
    optimizer='adagrad',
    loss='categorical_crossentropy',
    metrics=['categorical_accuracy']
)
/Users/iwasakishuto/Github/portfolio/Kerasy/kerasy/engine/sequential.py:67: UserWarning: Kerasy Warnings
------------------------------------------------------------
When calculating the CategoricalCrossentropy loss and the derivative of the Softmax layer, the gradient disappears when backpropagating the actual value, so the SoftmaxCategoricalCrossentropy is implemented instead.
------------------------------------------------------------
  "so the \033[34mSoftmaxCategoricalCrossentropy\033[0m is implemented instead.\n" + '-'*60)
In [11]:
model.summary()
-----------------------------------------------------------------
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (Input)              (None, 28, 28, 1)         0
-----------------------------------------------------------------
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320
-----------------------------------------------------------------
conv2d_2 (Conv2D)            (None, 24, 24, 64)        18496
-----------------------------------------------------------------
maxpooling2d_1 (MaxPooling2D (None, 12, 12, 64)        0
-----------------------------------------------------------------
dropout_1 (Dropout)          (None, 12, 12, 64)        0
-----------------------------------------------------------------
flatten_1 (Flatten)          (None, 9216)              0
-----------------------------------------------------------------
dense_1 (Dense)              (None, 128)               1179776
-----------------------------------------------------------------
dropout_2 (Dropout)          (None, 128)               0
-----------------------------------------------------------------
dense_2 (Dense)              (None, 10)                1290
=================================================================
Total params: 1,199,882
Trainable params: 1,199,882
Non-trainable params: 0
-----------------------------------------------------------------
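The parameter counts can be verified by hand: a Conv2D layer has (kernel height × kernel width × input channels) × filters weights plus one bias per filter, and a Dense layer has inputs × units weights plus one bias per unit:

  • conv2d_1 : (3·3·1)·32 + 32 = 320
  • conv2d_2 : (3·3·32)·64 + 64 = 18,496
  • dense_1 : 9216·128 + 128 = 1,179,776
  • dense_2 : 128·10 + 10 = 1,290

These sum to the 1,199,882 total shown above.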
In [12]:
model.fit(
    x_train, y_train,
    batch_size=batch_size,
    epochs=epochs,
    verbose=1,
    validation_data=(x_test, y_test)
)
Epoch 01/20 | 63/63[####################]100.00% - 1442.691[s]   categorical_crossentropy: 1381.319, categorical_accuracy: 54.4%, val_categorical_crossentropy: 740.757, val_categorical_accuracy: 78.9%
Epoch 02/20 | 63/63[####################]100.00% - 1439.734[s]   categorical_crossentropy: 573.693, categorical_accuracy: 83.7%, val_categorical_crossentropy: 584.554, val_categorical_accuracy: 86.2%
Epoch 03/20 | 63/63[####################]100.00% - 1318.618[s]   categorical_crossentropy: 405.320, categorical_accuracy: 88.2%, val_categorical_crossentropy: 418.508, val_categorical_accuracy: 90.4%
Epoch 04/20 | 63/63[####################]100.00% - 1010.807[s]   categorical_crossentropy: 318.465, categorical_accuracy: 90.0%, val_categorical_crossentropy: 440.983, val_categorical_accuracy: 90.1%
Epoch 05/20 | 63/63[####################]100.00% - 1020.449[s]   categorical_crossentropy: 261.625, categorical_accuracy: 92.6%, val_categorical_crossentropy: 496.641, val_categorical_accuracy: 89.6%
Epoch 06/20 | 63/63[####################]100.00% - 1018.260[s]   categorical_crossentropy: 215.082, categorical_accuracy: 92.9%, val_categorical_crossentropy: 419.564, val_categorical_accuracy: 92.7%
Epoch 07/20 | 63/63[####################]100.00% - 1013.837[s]   categorical_crossentropy: 168.225, categorical_accuracy: 95.0%, val_categorical_crossentropy: 461.159, val_categorical_accuracy: 91.4%
Epoch 08/20 | 63/63[####################]100.00% - 1013.345[s]   categorical_crossentropy: 164.291, categorical_accuracy: 95.1%, val_categorical_crossentropy: 479.793, val_categorical_accuracy: 91.0%
Epoch 09/20 | 63/63[####################]100.00% - 1021.595[s]   categorical_crossentropy: 134.688, categorical_accuracy: 96.0%, val_categorical_crossentropy: 450.356, val_categorical_accuracy: 91.4%
Epoch 10/20 | 63/63[####################]100.00% - 1012.004[s]   categorical_crossentropy: 132.845, categorical_accuracy: 96.3%, val_categorical_crossentropy: 449.674, val_categorical_accuracy: 92.1%
Epoch 11/20 | 63/63[####################]100.00% - 1015.242[s]   categorical_crossentropy: 113.595, categorical_accuracy: 96.2%, val_categorical_crossentropy: 480.528, val_categorical_accuracy: 92.1%
Epoch 12/20 | 63/63[####################]100.00% - 1015.592[s]   categorical_crossentropy: 93.535, categorical_accuracy: 96.8%, val_categorical_crossentropy: 497.775, val_categorical_accuracy: 91.7%
Epoch 13/20 | 63/63[####################]100.00% - 1021.730[s]   categorical_crossentropy: 95.980, categorical_accuracy: 96.9%, val_categorical_crossentropy: 477.428, val_categorical_accuracy: 91.8%
Epoch 14/20 | 63/63[####################]100.00% - 1020.249[s]   categorical_crossentropy: 68.847, categorical_accuracy: 98.3%, val_categorical_crossentropy: 520.389, val_categorical_accuracy: 91.4%
Epoch 15/20 | 63/63[####################]100.00% - 1020.915[s]   categorical_crossentropy: 84.820, categorical_accuracy: 96.9%, val_categorical_crossentropy: 517.873, val_categorical_accuracy: 91.9%
Epoch 16/20 | 63/63[####################]100.00% - 1019.250[s]   categorical_crossentropy: 70.326, categorical_accuracy: 97.7%, val_categorical_crossentropy: 509.199, val_categorical_accuracy: 92.7%
Epoch 17/20 | 63/63[####################]100.00% - 1012.677[s]   categorical_crossentropy: 71.140, categorical_accuracy: 97.4%, val_categorical_crossentropy: 475.805, val_categorical_accuracy: 92.6%
Epoch 18/20 | 63/63[####################]100.00% - 1013.460[s]   categorical_crossentropy: 71.360, categorical_accuracy: 98.0%, val_categorical_crossentropy: 548.380, val_categorical_accuracy: 92.2%
Epoch 19/20 | 63/63[####################]100.00% - 1043.271[s]   categorical_crossentropy: 65.436, categorical_accuracy: 97.8%, val_categorical_crossentropy: 480.036, val_categorical_accuracy: 92.5%
Epoch 20/20 | 63/63[####################]100.00% - 1163.611[s]   categorical_crossentropy: 47.725, categorical_accuracy: 98.5%, val_categorical_crossentropy: 519.467, val_categorical_accuracy: 92.3%
In [13]:
model.save_weights("MNIST_example_notebook_adagrad.pickle")