Machine Learning - Final Week
Models of Neural Networks
Convolutional Neural Networks
Convolution is one of the most famous methods for processing images.
It's a way of filtering a grid of numbers and reducing it. Deep learning researchers spent years figuring out how to bring convolution into deep learning, and in 1998 Dr. Yann LeCun published a groundbreaking paper combining convolution with neural networks.
He named this revolutionary network the convolutional neural network (CNN). After CNN, deep learning reached new heights and entered a golden era. CNNs were later used to recognize faces, objects, backgrounds, and more, and they remain the most widespread method for recognizing images today.
Filter, Strides, and Padding
Let's learn about the most important component of a CNN: the convolution layers!
If there is a 5x5 input and we use a 3x3 FILTER, we get a 3x3 feature map as the result: we slide the filter by one unit at a time, as you can see, and that amount of movement is the STRIDE.
However, this shrinks the output feature map from 5x5 down to 3x3. So we add a margin, the so-called PADDING, so that the input map and the output map have the same size.
The dotted lines represent the margins that allow the output to keep the same size.
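To make this concrete, here is a minimal NumPy sketch (not from the lesson's code) of a 2D convolution with an adjustable stride and padding; with a 5x5 input and a 3x3 filter you can check the 3x3 and 5x5 output sizes described above.

import numpy as np

def conv2d(image, kernel, stride=1, padding=0):
    image = np.pad(image, padding)  # zero margin: the PADDING described above
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)  # weighted sum under the filter
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3)) / 9.0  # a simple averaging filter

print(conv2d(x, k).shape)             # (3, 3): no padding shrinks the map
print(conv2d(x, k, padding=1).shape)  # (5, 5): padding=1 keeps the size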
We used only one filter in the example above, but we can also apply multiple filters at once. Graphically, the result becomes a 3-dimensional, or even more complex, map.
In the image above, the input is 10x10x3 and each filter is 4x4x3; however, this time there are two filters. The resulting map is then a 3-dimensional map of 10x10x2. Of course, padding was applied to keep the 10x10 in the 10x10x2 map.
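As a quick sanity check of those shapes, here is a short Keras snippet (a sketch, not part of the lesson's code): a 10x10x3 input passed through two 4x4 filters with 'same' padding yields a 10x10x2 map.

import numpy as np
from tensorflow.keras.layers import Conv2D

x = np.random.rand(1, 10, 10, 3).astype('float32')  # a batch of one 10x10x3 image
conv = Conv2D(filters=2, kernel_size=4, strides=1, padding='same')
print(conv(x).shape)  # (1, 10, 10, 2): one 10x10 map per filter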
Structure of CNN
The convolutional neural network (CNN) uses both convolution layers and dense layers.
As shown in the image above, the REPETITION OF CONVOLUTION LAYERS + ACTIVATION FUNCTIONS + POOLING LAYERS pulls out the crucial information and makes the map smaller and smaller, which makes the network easier to train. The pooling layers play a significant role, EXTRACTING IMPORTANT PARTS FROM A SPECIFIC MAP AND STORING THEM.
Max Pooling
The image below is an example of MAX POOLING with a 2x2 pool size and a stride of 2: as the filter moves across the map, it extracts the MAXIMUM VALUE within each 2x2 window.
Average Pooling
The image below is an example of AVERAGE POOLING: 2x2 average pooling extracts the average value within each 2x2 window.
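Here is a minimal side-by-side sketch of the two pooling types, assuming a 4x4 map, a 2x2 pool size, and a stride of 2 as in the examples above.

import numpy as np
from tensorflow.keras.layers import MaxPooling2D, AveragePooling2D

x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 2],
              [7, 2, 8, 3],
              [1, 0, 4, 9]], dtype='float32').reshape(1, 4, 4, 1)

max_out = MaxPooling2D(pool_size=2, strides=2)(x)      # keeps the largest value per window
avg_out = AveragePooling2D(pool_size=2, strides=2)(x)  # keeps the mean per window

print(max_out.numpy().reshape(2, 2))  # [[6. 5.] [7. 9.]]
print(avg_out.numpy().reshape(2, 2))  # [[3.5 2.5] [2.5 6. ]]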
Comparison between max pooling and average pooling
Back to this image again: after the second pooling layer, the data has to be passed into the fully connected layers. But the fully connected layers take one-dimensional data, while the output of the pooling layer is two-dimensional, so it can't be connected directly.
So we use a FLATTEN LAYER to reduce the dimensions of the data; the image below is a simple illustration of how it works.
After such processing, we can feed the one-dimensional data through the dense (fully connected) layers, and again the repetition of DENSE LAYERS + ACTIVATION FUNCTIONS reduces the number of nodes; at the end, the softmax function produces the result in the output layer.
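A tiny sketch of the Flatten step (illustrative sizes, not the lesson's exact network): a pooled feature map is unrolled into a single row before the dense layers.

import numpy as np
from tensorflow.keras.layers import Flatten

x = np.random.rand(1, 3, 3, 32).astype('float32')  # e.g. a pooled feature map
print(Flatten()(x).shape)  # (1, 288): 3 * 3 * 32 values in one dimension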
Usage of CNN
Computer Vision
In order for a computer to detect certain objects and identify what they are, it needs to be able to do two things in general: OBJECT DETECTION and SEGMENTATION.
Object detection
It literally means detecting a certain object and extracting only the detected region from the overall view. The iPhone's face detection, where the camera finds your face automatically, is a great example of object detection.
YOLO (You Only Look Once)
YOLO is one of the most common models used in object detection.
Right now there is v5 (version 5); its strengths are that it is very fast and, compared to other real-time detectors, highly accurate. Known for its speed and precision, it is one of the most renowned computer vision algorithms.
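If you want to try it, YOLOv5 can be loaded through torch.hub; this is a hedged sketch that assumes PyTorch and the ultralytics/yolov5 dependencies are installed (the exact interface may differ between releases, and 'zidane.jpg' stands in for any image of your own).

import torch

# Downloads the small pretrained model from the ultralytics/yolov5 repo
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)
results = model('zidane.jpg')  # any local image path or URL
results.print()  # class, confidence, and bounding box for each detection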
Segmentation
Segmentation means separating a certain object's pixels from all the other pixels. The more precise this division needs to be, the more complex the method required, which can also increase the time segmentation takes.
You can specify the classes of segmentation into animals, or into cats and dogs. If you go deeper and divide further, it can distinguish between particular breeds of dogs and cats.
A famous example is blurring out the background: a person is segmented and everything else's pixels are blurred.
It is also used in medical fields to detect infected or damaged parts of the brain and segment them for further treatment.
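As an illustration (not part of this lesson's code), torchvision ships a pretrained DeepLabV3 segmentation model; this hedged sketch assigns a class to every pixel of a hypothetical person.jpg. Argument names can vary between torchvision versions.

import torch
from torchvision import models, transforms
from PIL import Image

model = models.segmentation.deeplabv3_resnet50(pretrained=True).eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open('person.jpg')       # hypothetical input image
batch = preprocess(img).unsqueeze(0)
with torch.no_grad():
    out = model(batch)['out'][0]     # per-pixel class scores
mask = out.argmax(0)                 # the class index for every pixel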
Various examples of CNN
Autopilot
Pose Detection
Super Resolution
Style Transfer
Colorization
Different types of CNN
There are many types of CNN that contributed to the foundation of deep learning. For this section, we'll just go over a few.
You might think that higher accuracy is always better, but the more accurate models typically require powerful hardware and other resources in order to operate.
AlexNet (2012)
AlexNet is one of the pioneers of CNN. It won the ILSVRC (ImageNet Large-Scale Visual Recognition Challenge) in 2012, beating the second-place team by over 10 percentage points in error rate. It was the first CNN with enough impact to make people aware of image processing by machines. It used dropout and image augmentation, effectively training the model to tell images apart.
VGGNet (2014)
There is no particular novelty in this model, but it's known for its depth. It's also the very model engineers often use to test their datasets before constructing their own models.
GoogLeNet (Inception, 2014; refined into Inception V3 in 2015)
Both VGGNet and GoogLeNet are offsprings of AlexNet. GoogLeNet is known for its deep architecture, yet despite that depth it does not carry many parameters. Its Inception modules let the network look at an image at several scales at once, somewhat like the way humans perceive things.
ResNet (2015)
It is one of the most useful and advanced models. It is often brought in when making a plain network deeper stops helping and the errors level off or even worsen. Its residual (skip) connections let backpropagation flow smoothly through very deep networks, leveling out such error problems.
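The core of ResNet is easy to sketch with the same Keras functional API used later in this lesson: a residual block adds its input back onto the output of a couple of convolutions (the sizes here are illustrative).

from tensorflow.keras.layers import Input, Conv2D, Add, Activation
from tensorflow.keras.models import Model

inputs = Input(shape=(28, 28, 32))
x = Conv2D(32, 3, padding='same', activation='relu')(inputs)
x = Conv2D(32, 3, padding='same')(x)
x = Add()([x, inputs])          # the skip connection: add the input back in
outputs = Activation('relu')(x)
block = Model(inputs, outputs)  # one residual block; ResNet stacks many of these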
Transfer Learning
Transfer learning resembles the way humans learn. For instance, if I already know English, then when learning French I apply the knowledge I already have, practically maximizing the result. Transfer learning does the same thing: it reuses prior models in order to maximize productivity.
Transfer learning will be the next driver of ML commercial success after supervised learning.
- Andrew Ng, Baidu Research
An interesting fact about transfer learning is that quite different models can still help with learning a novel dataset. For example, a model that learned to recognize animal faces can be useful in human face detection as well.
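In Keras, transfer learning often looks like the following sketch: load a network pretrained on ImageNet, freeze it, and train only a new head. MobileNetV2 and the 10-class head here are just illustrative choices, not this lesson's model.

from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

base = MobileNetV2(input_shape=(96, 96, 3), include_top=False, weights='imagenet')
base.trainable = False  # freeze the pretrained features

x = GlobalAveragePooling2D()(base.output)
outputs = Dense(10, activation='softmax')(x)  # new head for the new task
model = Model(base.input, outputs)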
Recurrent Neural Networks (RNN)
RNNs are especially flexible in terms of inputs and outputs. The input may be as long as 100 characters or as short as 1 character, and it will still work.
Below is an example of how an RNN outputs "hello" with "hell" as its input. In this way, RNNs are used in many fields to read text and output letters or summaries.
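Here is a minimal, self-contained sketch of that idea (a toy example, not this lesson's code): train a character-level RNN so that each input character of "hell" predicts the next character, giving the target sequence "ello".

import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, SimpleRNN, Dense

chars = ['h', 'e', 'l', 'o']
onehot = lambda s: np.eye(4)[[chars.index(c) for c in s]]

x = onehot('hell')[np.newaxis]  # shape (1, 4, 4): one sequence of 4 steps
y = onehot('ello')[np.newaxis]  # the same sequence shifted by one character

inputs = Input(shape=(4, 4))
hidden = SimpleRNN(16, return_sequences=True)(inputs)  # one output per step
outputs = Dense(4, activation='softmax')(hidden)
model = Model(inputs, outputs)
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.fit(x, y, epochs=200, verbose=0)

pred = model.predict(x)[0].argmax(axis=1)
print(''.join(chars[i] for i in pred))  # ideally prints 'ello'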
Generative Adversarial Networks (GAN)
It means running two adversarial models at the same time, and it is a very active field in machine learning.
So, as in this case, pairing a generator that creates a fake dollar bill closely resembling the real one with a discriminator that judges which bill is authentic boosts the overall quality of the model.
Usually, though not in the case above, the focus is on how to train the generator better.
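A very condensed Keras sketch of that setup (illustrative layer sizes, assuming 28x28 images flattened to 784 values): a generator maps noise to a fake sample, a discriminator scores real versus fake, and a combined model trains the generator to fool the frozen discriminator.

from tensorflow.keras.models import Sequential, Model
from tensorflow.keras.layers import Dense, Input

generator = Sequential([
    Dense(128, activation='relu', input_shape=(64,)),  # 64-dim noise vector in
    Dense(784, activation='sigmoid'),                  # fake flattened image out
])

discriminator = Sequential([
    Dense(128, activation='relu', input_shape=(784,)),
    Dense(1, activation='sigmoid'),                    # real (1) or fake (0)
])
discriminator.compile(loss='binary_crossentropy', optimizer='adam')

# Combined model: noise -> generator -> discriminator. The discriminator is
# frozen here so this model only updates the generator's weights.
discriminator.trainable = False
noise = Input(shape=(64,))
gan = Model(noise, discriminator(generator(noise)))
gan.compile(loss='binary_crossentropy', optimizer='adam')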
GAN example
The more the discriminator is confused between the real data and the fake data the generator created, the better the generator. The more reliably the discriminator can tell which is real and which is generated, the better the discriminator.
You can observe here how the generator creates more and more convincing images of an animal as the epochs increase.
CycleGAN
StarGAN
CartoonGAN
DeepFake
BeautyGAN
Toonify Yourself
CNN code
First, you need to change the runtime to GPU.
import os
os.environ['KAGGLE_USERNAME'] = 'your_kaggle_username' # your Kaggle username
os.environ['KAGGLE_KEY'] = 'your_kaggle_key' # your Kaggle API key (never publish the real one)
!kaggle datasets download -d datamunge/sign-language-mnist
!unzip sign-language-mnist.zip
Loading packages
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Conv2D, MaxPooling2D, Flatten, Dropout # input layer, dense layer, conv2d layer, and more
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import OneHotEncoder
Loading dataset
train_df = pd.read_csv('sign_mnist_train.csv')
train_df.head()
test_df = pd.read_csv('sign_mnist_test.csv')
test_df.head()
Label distribution
plt.figure(figsize=(16, 10))
sns.countplot(x=train_df['label'])
plt.show()
Preprocessing
train_df = train_df.astype(np.float32)
x_train = train_df.drop(columns=['label'], axis=1).values
x_train = x_train.reshape((-1, 28, 28, 1))
y_train = train_df[['label']].values
test_df = test_df.astype(np.float32)
x_test = test_df.drop(columns=['label'], axis=1).values
x_test = x_test.reshape((-1, 28, 28, 1))
y_test = test_df[['label']].values
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
Previewing data
index = 1
plt.title(str(y_train[index]))
plt.imshow(x_train[index].reshape((28, 28)), cmap='gray')
plt.show()
One-hot encoding
encoder = OneHotEncoder()
y_train = encoder.fit_transform(y_train).toarray()
y_test = encoder.transform(y_test).toarray() # reuse the encoder fitted on the training labels
print(y_train.shape)
Normalization
train_image_datagen = ImageDataGenerator(
rescale=1./255, # normalization
)
train_datagen = train_image_datagen.flow(
x=x_train,
y=y_train,
batch_size=256,
shuffle=True
)
test_image_datagen = ImageDataGenerator(
rescale=1./255
)
test_datagen = test_image_datagen.flow(
x=x_test,
y=y_test,
batch_size=256,
shuffle=False
)
index = 1
preview_img = train_datagen.__getitem__(0)[0][index]
preview_label = train_datagen.__getitem__(0)[1][index]
plt.imshow(preview_img.reshape((28, 28)))
plt.title(str(preview_label))
plt.show()
Constructing Network
inputs = Input(shape=(28, 28, 1))
hidden = Conv2D(filters=32, kernel_size=3, strides=1, padding='same', activation='relu')(inputs)
hidden = MaxPooling2D(pool_size=2, strides=2)(hidden)
hidden = Conv2D(filters=64, kernel_size=3, strides=1, padding='same', activation='relu')(hidden)
hidden = MaxPooling2D(pool_size=2, strides=2)(hidden)
hidden = Conv2D(filters=32, kernel_size=3, strides=1, padding='same', activation='relu')(hidden)
hidden = MaxPooling2D(pool_size=2, strides=2)(hidden)
hidden = Flatten()(hidden)
hidden = Dense(512, activation='relu')(hidden)
hidden = Dropout(rate=0.3)(hidden) # dropout to reduce overfitting
output = Dense(24, activation='softmax')(hidden) # 24 sign-language classes
model = Model(inputs=inputs, outputs=output)
model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['acc'])
model.summary()
Training
history = model.fit(
train_datagen,
validation_data=test_datagen,
epochs=20
)
Graphing results
fig, axes = plt.subplots(1, 2, figsize=(20, 6))
axes[0].plot(history.history['loss'], label='train')
axes[0].plot(history.history['val_loss'], label='validation')
axes[0].set_title('loss')
axes[0].legend()
axes[1].plot(history.history['acc'], label='train')
axes[1].plot(history.history['val_acc'], label='validation')
axes[1].set_title('accuracy')
axes[1].legend()
Data augmentation
train_image_datagen = ImageDataGenerator(
    rescale=1./255, # normalization
    rotation_range=10, # randomly rotates the image by up to 10 degrees
    zoom_range=0.1, # randomly zooms in or out by up to 10%
    width_shift_range=0.1, # randomly shifts the image horizontally
    height_shift_range=0.1, # randomly shifts the image vertically
)
train_datagen = train_image_datagen.flow(
x=x_train,
y=y_train,
batch_size=256,
shuffle=True
)
test_image_datagen = ImageDataGenerator(
rescale=1./255
)
test_datagen = test_image_datagen.flow(
x=x_test,
y=y_test,
batch_size=256,
shuffle=False
)
index = 1
preview_img = train_datagen.__getitem__(0)[0][index]
preview_label = train_datagen.__getitem__(0)[1][index]
plt.imshow(preview_img.reshape((28, 28)))
plt.title(str(preview_label))
plt.show()
Results
Constructing the network again
Training the model again
Graphing the results again
You can easily tell that it got much better!