## What is Deep Learning?

Deep learning is a branch of machine learning. Machine learning covers a range of algorithms that learn from data, often a few thousand samples, in order to predict future events. Deep learning applies neural networks in extended or variant forms, and it has the capacity to handle millions of data points.

The most fundamental strength of deep learning is arguably its ability to pick the best features. In effect, deep learning summarizes the data and computes its result from that compressed representation. This is what artificial intelligence really needs, especially when we have a huge database and heavy computation.

Deep learning uses sequential layers, an idea inspired by neural networks. Each layer applies a nonlinear function that is responsible for feature selection, and the output of each layer is used as the input of the next layers. Deep learning applications include computer vision (such as face or object recognition), speech recognition, natural language processing (NLP) and cyber threat detection.
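
To make the idea of sequential layers concrete, here is a minimal NumPy sketch. It is only an illustration: the layer sizes, the random weights and the choice of ReLU as the nonlinear function are assumptions made for this example, not something prescribed by the article.

```python
import numpy as np

def relu(z):
    """Nonlinear function applied by each layer."""
    return np.maximum(z, 0.0)

def forward(x, layers):
    """Pass x through the layers in order; each output feeds the next layer."""
    activation = x
    for weights, bias in layers:
        activation = relu(activation @ weights + bias)
    return activation

rng = np.random.default_rng(0)
# Hypothetical layer sizes: 8 inputs -> 5 -> 3 -> 2 outputs.
sizes = [8, 5, 3, 2]
layers = [(rng.normal(size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(4, 8))   # a batch of 4 samples with 8 raw features each
output = forward(x, layers)
print(output.shape)           # (4, 2)
```

Each pass through the loop turns the previous representation into a smaller one, which is the "summarizing" behaviour described above.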

## Deep Learning vs Machine Learning

The major difference between machine learning and deep learning is that ML needs **manual human intervention to select features**, while DL does this through the **intuitive knowledge** embedded in its architecture. This difference has a dramatic influence on performance, in both precision and speed. Because **manual feature detection is always prone to human error**, DL can be the better option for gigantic data computation.

What DL and ML have in common is that both work in supervised and unsupervised settings. DL is based on neural networks, changing its shape and operation in variants such as CNN and RNN, whereas ML has a variety of algorithms based on statistics and mathematics. That does not mean DL is limited to neural networks: DL can also use various ML algorithms to increase performance by building hybrid functions. For instance, DL can use a Support Vector Machine (SVM) in place of the softmax function. [1]

## Feature Engineering Importance

In artificial intelligence we try to make the machine an independent tool that thinks for itself and needs less programmer intervention. The defining characteristic of an automated machine is the way it thinks: the closer its way of thinking is to the human brain, the better its chance of winning the race for the best machine. So let's see what the key attribute in making an accurate decision is. Remember our childhood, when we saw objects but had no idea about their properties, such as name, exact size, weight and so on. Still, we could categorize them quickly by noticing one important thing. For example, looking at an animal, we knew it was a "dog" as soon as we heard it barking, and we knew it was a "cat" when we heard it meowing. Here the animal's sound is more influential than its size, because experience tells us that when two animals are of similar size, our brain turns to the most distinguishing feature, which is sound. On the other hand, when we see the tallest animal in the zoo, we ignore all other features and say “Yes, it is a giraffe”.

This is a miracle of the brain: it can infer the situation and, under different conditions of the same problem such as “animal detection”, pick one feature as the final key for its decision, and the result is both accurate and quick. Another story that makes the importance of feature engineering clear is the “Twenty Questions” game. If you have not played it yet, please look it up.

The player wins if he can ask proper questions, building and improving each next question on the answers so far. The questions are sequential, and the next question depends 100% on the previous answer. Previous answers filter and clarify the possibilities so that the player can reach the goal. Each question is like a hidden layer in a neural network: it is connected to the next layers, and its output is used as their input. The first question is always “Is it alive?”, and with it we remove half of the possibilities. This omitting and dropping leads us to ask a better question in a new category; obviously we cannot ask the next question without the previous answer, which clarified and filtered the options in our brain. Something similar happens in a deep learning convolutional neural network.

## Deep Learning and the Human Brain

Deep learning imitates the human brain and comes close to it in precision and speed. The Convolutional Neural Network (CNN) is inspired by the visual cortex of the brain. As you can see in the picture below, the visual cortex layer covers the entire visual field. Its sensitive cells play the role of the kernel or filter matrix, which we will look at later in this article. God created these cells to extract the important data coming from the eyes.

Assume students have an exam and are preparing for it. They start to read the book, picking out the important parts and writing them in notes or highlighting them. Either way they reduce the volume of the book, summarizing 100 pages into two pages that are easy to use as a reference and to review. A similar scenario happens in a DL CNN, where we use a smaller matrix to filter and remove data.

## Requirements

I strongly recommend reading the first two articles below carefully, because their concepts are needed here. I assume that you are familiar with linear regression and neural networks.

## How Does a Deep Learning Convolutional Neural Network Work?

Deep learning is a neural network with more than two hidden layers. If you are new to neural networks, please study an introduction to them first. More layers mean more parameters, which can cause overfitting. Overfitting happens when the model is built so completely around the training data set that it matches those samples exactly, as if there were always one fixed answer inside the model, and it then fails on data it has not seen. A good model is general; it does not coincide perfectly with its training data.

We cannot build a complete model, and even if we could, it would be wrong to do so. Let's see what happens when we want to assign a “Y” inside our model. We must avoid being too idealistic and make the model general rather than specific. To reach this point we can apply cross validation, which is a model evaluation method. The best way is K-fold cross validation, which divides the training set into k parts; in each iteration one part serves as the test set and the remaining k-1 parts as the training set, so the chance of an exact match decreases. In convolutional neural networks there are also specific alternatives to K-fold cross validation for avoiding overfitting, such as **dropout** and **regularization**.
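
The K-fold splitting described above can be sketched with plain NumPy. This is a minimal illustration, not the author's code: the helper name `k_fold_indices`, the sample count and the seed are made up for the example.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)
    folds = np.array_split(shuffled, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx

splits = list(k_fold_indices(n_samples=10, k=5))
for train_idx, test_idx in splits:
    print("train:", np.sort(train_idx), "test:", np.sort(test_idx))
```

Every sample is used for testing exactly once and for training k-1 times, so no single split decides how the model is judged.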

Fully connected in DL means that each neuron in one hidden layer is connected to all neurons in the next layer. When dropout is applied, some of the neurons are turned off during training, and once training is finished all neurons are turned on again at prediction time. In this way DL tries to remove redundant data and weaken its role, while strengthening the role of important features. The picture below shows this: the left picture has high resolution, and as it passes through the layers the DL CNN keeps the important pixels and makes the picture smaller.
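
The on/off behaviour of dropout can be sketched as follows. This is a minimal NumPy illustration of the common "inverted dropout" variant, assumed here for clarity; it is not the internal implementation of any particular library, and the function name and `keep_prob` value are chosen only for the example.

```python
import numpy as np

def dropout(activations, keep_prob, training, rng):
    """Inverted dropout: randomly zero neurons while training, pass through otherwise."""
    if not training:
        return activations                   # prediction time: all neurons are on
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob    # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
hidden = np.ones((2, 10))
train_out = dropout(hidden, keep_prob=0.8, training=True, rng=rng)
predict_out = dropout(hidden, keep_prob=0.8, training=False, rng=rng)
print(train_out)     # a mix of 0.0 (dropped) and 1.25 (kept and rescaled)
print(predict_out)   # unchanged: all ones
```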

We can transform the data into smaller data, which is easier to rely on when making a decision, by sliding a smaller matrix all over the original matrix. We do some mathematical calculation as the filter matrix moves across the original matrix. For example, in the picture below 12 data points are reduced to just 3 data points by moving one matrix 3 times over the original matrix. The computation can take the maximum or the average of the data.
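
The reduction from 12 data points to 3 can be sketched like this. The 12 values and the window size of 4 are made up for illustration, since the original picture is not reproduced here, and the helper assumes the number of values is a multiple of the window size.

```python
import numpy as np

def pool_1d(values, window, reduce=np.max):
    """Split values into non-overlapping windows and reduce each one to a single number."""
    values = np.asarray(values, dtype=float)
    return reduce(values.reshape(-1, window), axis=1)

data = np.array([1, 3, 2, 9, 4, 4, 0, 8, 5, 7, 6, 2])   # 12 made-up data points
max_pooled = pool_1d(data, window=4, reduce=np.max)
avg_pooled = pool_1d(data, window=4, reduce=np.mean)
print(max_pooled)   # [9. 8. 7.]
print(avg_pooled)   # [3.75 4.   5.  ]
```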

### One-Dimensional CNN

Real-world data is rarely a one-dimensional matrix, but to present the method I prefer to start with a 1D matrix. I want to reduce the dimension of the blue matrix with the aid of the red matrix. The blue matrix is the real data set and the red one is the filter matrix. I want to transform the blue matrix from 5 elements to 3 elements. I push the red matrix from left to right, one element per step. Wherever the two overlap, I multiply the matching elements, and if more than one pair of elements match, I sum the products. Note that the red matrix was [2 -1 1] and after flipping it (the kernel) becomes [1 -1 2].

To reduce the matrix, I look for the valid results, which occur when every element of the red (filter) matrix is covered by the blue one. For the input [0 1 2 3] used in the code below, I just pick up [3 5].

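    import numpy as np

    x = np.array([0, 1, 2, 3])   # input ("blue") matrix
    w = np.array([2, -1, 1])     # filter ("red") matrix; convolution flips it to [1, -1, 2]

    result = np.convolve(x, w)                 # full convolution
    result_valid = np.convolve(x, w, "valid")  # only where the filter fully overlaps the input

    print(result)        # [ 0  2  3  5 -1  3]
    print(result_valid)  # [3 5]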
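
To see the flip, slide, multiply and sum steps spelled out, here is a hand-written version of the valid convolution. It is a sketch for illustration only (the function name is made up); in practice `np.convolve` does the same job.

```python
import numpy as np

def convolve_valid_1d(x, w):
    """Valid-mode convolution: flip the filter, slide it one step at a time, multiply and sum."""
    x = np.asarray(x)
    flipped = np.asarray(w)[::-1]
    n_out = len(x) - len(flipped) + 1
    return np.array([np.sum(x[i:i + len(flipped)] * flipped) for i in range(n_out)])

x = [0, 1, 2, 3]
w = [2, -1, 1]
manual = convolve_valid_1d(x, w)
print(manual)   # [3 5]
```

The first output is 0*1 + 1*(-1) + 2*2 = 3 and the second is 1*1 + 2*(-1) + 3*2 = 5, matching the valid result from `np.convolve`.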

### Two-Dimensional CNN

The story is similar for two-dimensional matrices. The kernel matrix [[-1, 0], [2, 1]] becomes [[1, 2], [0, -1]] after flipping. Because the filter matrix stays inside the original training matrix in every step of the pictures below, all of the computed elements are valid.

    from scipy import signal as sg

    image = [[2, 1, 3],
             [5, -2, 1],
             [0, 2, -4]]
    kernel = [[-1, 0],
              [2, 1]]

    print(sg.convolve(image, kernel))           # full convolution: a 4x4 result
    print(sg.convolve(image, kernel, "valid"))  # valid part only: rows [6, 6] and [-1, 4]
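
The same steps can be written by hand for the 2D case. This is an illustrative NumPy sketch with a made-up function name, not a replacement for `scipy.signal.convolve`: the kernel is flipped on both axes and then slid over every position where it fits entirely inside the matrix.

```python
import numpy as np

def convolve_valid_2d(image, kernel):
    """Valid-mode 2D convolution: flip the kernel on both axes, then slide, multiply and sum."""
    image = np.asarray(image)
    flipped = np.asarray(kernel)[::-1, ::-1]
    kh, kw = flipped.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * flipped)
    return out

image = [[2, 1, 3], [5, -2, 1], [0, 2, -4]]
kernel = [[-1, 0], [2, 1]]
flipped_kernel = np.asarray(kernel)[::-1, ::-1]
result_2d = convolve_valid_2d(image, kernel)
print(flipped_kernel)   # rows [1, 2] and [0, -1]
print(result_2d)        # rows [6, 6] and [-1, 4]
```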

## Deep Learning Code Sample by Digit Recognition

I want to introduce you to the best competition community, Kaggle, which is famous among data scientists. It hosts many competitions that are worth entering to practice your abilities in machine learning and deep learning, and there are awards for whoever solves the recent challenges. There are kernels written by other authors that you can also contribute to, and they are good sources for learning artificial intelligence in R and Python. Moreover, you can use its data sets as a reference and test your code with prepared data.

Here I want to practice a convolutional neural network on the digit recognition competition.

#### Download training and test data set

Please go to the competition page to get the training and test data sets. You must sign up on the Kaggle site and then join this competition.

    """
    Created on Sun Nov 19 05:59:50 2017
    @author: Mahsa
    """
    import numpy as np
    from numpy.random import permutation
    import pandas as pd
    import tflearn
    from tflearn.layers.core import input_data, dropout, fully_connected, flatten
    from tflearn.layers.conv import conv_2d, max_pool_2d
    from tflearn.layers.normalization import local_response_normalization
    from tflearn.layers.estimator import regression
    # sklearn.cross_validation was removed in scikit-learn 0.20; model_selection replaces it
    from sklearn.model_selection import train_test_split

    train_Path = r'D:\digit\train.csv'
    test_Path = r'D:\digit\test.csv'


    def split_matrices_into_random_train_test_subsets(train_Path):
        """Load the training CSV, shuffle the rows, and split off 33% for validation."""
        train = pd.read_csv(train_Path)
        train = np.array(train)
        train = permutation(train)
        X = train[:, 1:785].astype(np.float32)   # 784 pixel columns
        y = train[:, 0].astype(np.float32)       # first column is the digit label
        return train_test_split(X, y, test_size=0.33, random_state=42)


    def reshape_data(Data, Labels):
        """Reshape flat pixels to 28x28x1 images and one-hot encode the labels."""
        Data = Data.reshape(-1, 28, 28, 1).astype(np.float32)
        Labels = (np.arange(10) == Labels[:, None]).astype(np.float32)
        return Data, Labels


    X_train, X_test, y_train, y_test = split_matrices_into_random_train_test_subsets(train_Path)
    X_train, y_train = reshape_data(X_train, y_train)
    X_test, y_test = reshape_data(X_test, y_test)

    # Unlabeled Kaggle test set, used only for the final submission
    test_x = np.array(pd.read_csv(test_Path)).astype(np.float32)
    test_x = test_x.reshape(-1, 28, 28, 1)


    def Convolutional_neural_network():
        """Three convolution layers followed by two fully connected layers with dropout."""
        network = input_data(shape=[None, 28, 28, 1], name='input_layer')
        network = conv_2d(network, nb_filter=6, filter_size=6, strides=1, activation='relu', regularizer='L2')
        network = local_response_normalization(network)
        network = conv_2d(network, nb_filter=12, filter_size=5, strides=2, activation='relu', regularizer='L2')
        network = local_response_normalization(network)
        network = conv_2d(network, nb_filter=24, filter_size=4, strides=2, activation='relu', regularizer='L2')
        network = local_response_normalization(network)
        network = fully_connected(network, 128, activation='tanh')
        network = dropout(network, 0.8)
        network = fully_connected(network, 256, activation='tanh')
        network = dropout(network, 0.8)
        network = fully_connected(network, 10, activation='softmax')
        sgd = tflearn.SGD(learning_rate=0.1, lr_decay=0.096, decay_step=100)
        top_k = tflearn.metrics.top_k(3)
        network = regression(network, optimizer=sgd, metric=top_k, loss='categorical_crossentropy')
        return tflearn.DNN(network, tensorboard_dir='tf_CNN_board', tensorboard_verbose=3)


    model = Convolutional_neural_network()
    model.fit(X_train, y_train, batch_size=128, validation_set=(X_test, y_test), n_epoch=1, show_metric=True)

    # Predict class probabilities and keep the most likely digit for each image
    P = model.predict(test_x)
    index = list(range(1, len(P) + 1))
    result = [int(np.argmax(p)) for p in P]   # np.int was removed from NumPy; use the built-in int

    res = pd.DataFrame({'ImageId': index, 'Label': result})
    res.to_csv("sample_submission.csv", index=False)
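
Two small steps in the code above are easy to miss: the one-hot encoding of the labels and the reshaping of flat pixel rows into images. The sketch below isolates them with made-up labels and blank images, so the shapes can be checked without downloading the data set.

```python
import numpy as np

labels = np.array([3, 0, 9])                     # made-up digit labels
one_hot = (np.arange(10) == labels[:, None]).astype(np.float32)
print(one_hot.shape)                             # (3, 10)
print(one_hot[0])                                # 1.0 at index 3, 0.0 elsewhere

pixels = np.zeros((3, 784), dtype=np.float32)    # three flattened 28x28 images
images = pixels.reshape(-1, 28, 28, 1)           # (batch, height, width, channels)
print(images.shape)                              # (3, 28, 28, 1)

recovered = np.argmax(one_hot, axis=1)           # same step that turns predictions into labels
print(recovered)                                 # [3 0 9]
```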

## Increasing Deep Learning Performance with Hardware (GPU)

One important thing that game developers, graphic designers and data scientists have in common is matrices. Every data point, whether in images, video or other complex data, is a value in a matrix element. Whatever we do involves mathematical operations that transform matrices.

For ordinary processing the Central Processing Unit (CPU) is a good answer, but for advanced mathematical and statistical operations on huge data the CPU cannot cope, and we have to use the Graphics Processing Unit (GPU), which was designed for demanding mathematical functions. Deep learning includes functions that need complex computation, such as convolution, activation functions like sigmoid and softmax, and the Fourier transform. These are processed on the GPU, while the remaining 95%, which is mostly I/O procedures, runs on the CPU.

#### GPU Activation

- Open Start and launch the **Windows Command Prompt (cmd)**.
- Type **dxdiag**.
- In the window that opens, look at the **Display** tab.
- If the name is **NVIDIA**, or that of another GPU vendor (NVIDIA GPU - AMD GPU - Intel Xeon Phi), there is a GPU card on the board.
- Create the .theanorc configuration file at C:\users\"yourname"\.theanorc
- Set { device = **gpu** or **cuda0**, floatX = **float32** } in the **[global]** section, and preallocate = 1 in the **[gpuarray]** section.
- If you want to know more about it, please look here.
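
As an illustration of what the resulting .theanorc could contain, the sketch below builds the settings listed above with Python's standard `configparser` and prints them instead of writing to disk. The choice of `cuda0` over `gpu` is an assumption for the example; use whichever device name matches your Theano backend.

```python
import configparser
import io

config = configparser.ConfigParser()
config.optionxform = str   # keep option names such as floatX case-sensitive
config["global"] = {"device": "cuda0", "floatX": "float32"}
config["gpuarray"] = {"preallocate": "1"}

buffer = io.StringIO()
config.write(buffer)
theanorc_text = buffer.getvalue()
print(theanorc_text)
```

The printed text can be saved as the .theanorc file in your user directory.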

##### GPU Test Code

    import os
    import shutil

    # Copy a prepared .theanorc into the home directory (paths are specific to the author's machine)
    destfile = "/home/ubuntu/.theanorc"
    open(destfile, 'a').close()
    shutil.copyfile("/mnt/.theanorc", destfile)

    from theano import function, config, shared, sandbox
    import theano.tensor as T
    import numpy
    import time

    vlen = 10 * 30 * 768
    iters = 1000

    rng = numpy.random.RandomState(22)
    x = shared(numpy.asarray(rng.rand(vlen), config.floatX))
    f = function([], T.exp(x))
    print(f.maker.fgraph.toposort())

    t0 = time.time()
    for i in range(iters):   # xrange exists only in Python 2
        r = f()
    t1 = time.time()

    print("Looping %d times took %f seconds" % (iters, t1 - t0))
    print("Result is %s" % (r))

    # A plain (non-GPU) Elemwise op in the compiled graph means the work ran on the CPU
    if numpy.any([isinstance(x.op, T.Elemwise) and ('Gpu' not in type(x.op).__name__)
                  for x in f.maker.fgraph.toposort()]):
        print('Used the cpu')
    else:
        print('Used the gpu')

## Increasing Deep Learning Performance with Software Libraries

To enhance CNN performance, and because it is not possible to load gigantic data of more than a terabyte into a CPU or even a GPU at once, we must use some strategy to break the data down into chunks for processing. I have used Dask to prevent out-of-memory crashes. It is responsible for scheduling the chunked computation.

    import numpy as np
    import dask.array as da

    # X, Y, X_test and Y_test are the data and label arrays prepared beforehand.
    # chunks=1000 limits every axis to 1000 elements per chunk, whatever the number of dimensions.
    X = da.from_array(np.asarray(X), chunks=1000)
    Y = da.from_array(np.asarray(Y), chunks=1000)
    X_test = da.from_array(np.asarray(X_test), chunks=1000)
    Y_test = da.from_array(np.asarray(Y_test), chunks=1000)
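
The idea behind chunking can be shown without Dask. The sketch below uses plain NumPy, not Dask, and a small made-up array as a stand-in for a large data set: it processes one block of rows at a time, so only that block needs to be in working memory. Dask automates this pattern and adds scheduling on top.

```python
import numpy as np

def iter_chunks(array, chunk_rows):
    """Yield consecutive row blocks so only one block is processed at a time."""
    for start in range(0, len(array), chunk_rows):
        yield array[start:start + chunk_rows]

data = np.arange(10_000, dtype=np.float64).reshape(2_500, 4)   # stand-in for a large data set
running_sum = 0.0
n_chunks = 0
for chunk in iter_chunks(data, chunk_rows=1_000):
    running_sum += chunk.sum()   # any per-chunk computation goes here
    n_chunks += 1

print(n_chunks, running_sum)     # 3 49995000.0
```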

## References

[1] http://deeplearning.net/wp-content/uploads/2013/03/dlsvm.pdf

[2] https://leonardoaraujosantos.gitbooks.io

[3] https://github.com/Hassankashi?tab=repositories

[4] http://timdettmers.com/2015/07/27/brain-vs-deep-learning-singularity/

[5] https://blog.dominodatalab.com/gpu-computing-and-deep-learning/

[6] http://deeplearning.net/software_links/

[7] https://www.codeproject.com/Articles/1158306/Theano-Machine-Learning-on-a-GPU-on-Windows

[8] https://www.analyticsvidhya.com/blog/2015/02/avoid-over-fitting-regularization/

[9] https://github.com/tflearn/tflearn/tree/master/examples

## Feedback

Feel free to leave any feedback on this article; it is a pleasure to see your opinions and **votes** on this code. If you have any questions, please do not hesitate to ask me here.