## Series Introduction

This is the second article in a series on building a .NET library. Here is the link to Part 1:

Build Simple AI .NET Library - Part 1 - Basics First

My objective is to create a simple AI library that covers a couple of advanced AI topics, such as genetic algorithms and other evolutionary algorithms, artificial neural networks (ANN) and fuzzy logic. The only challenge in completing this series will be finding enough time to work on the code and the articles.

Having the code itself might not be the main target; understanding these algorithms is. I hope it will be useful to someone someday.

The series will be published in several parts; I am not sure how many yet. In any case, each part will focus on a single key topic and try to cover it thoroughly.

Please feel free to comment, ask for clarification, or suggest better approaches.

## Article Introduction - Part 2 "Machine Learning Introduction"

I thought it would be more beneficial to dedicate a separate article to the basic definitions used in machine learning, as we will rely on them in the following articles.

Most of this article is dedicated to the concept of ML and its terminology. There are tons of online resources that can be consulted for further information.

## What AI Really Means

Before discussing ML, let's first define what AI really means and take that definition as our starting point.

There are many definitions of AI, depending on the application or the problem being solved. The simplest definition that fits them all could be: "AI is the ability to create machines that can make decisions without being explicitly programmed to do so." In other words: "Building machines (or applications) that have some kind of intelligence of their own, in the form of making decisions."

Here is the Wiki article about that.

The need for AI arose when humans started to face problems so complicated that they are very hard to program in advance. For example, consider self-driving vehicles. This is a very complicated task if the only approach is to program each and every scenario the vehicle may face. Practically, this would be almost impossible.

Hence, we clearly need special approaches (algorithms) that can give some kind of intelligence to the vehicle's main processor.

Consider raising a baby. At the beginning you explicitly guide it and provide instructions; at some stage, however, thanks to human intelligence, the baby starts to learn from experience, whether good or bad. Learning is therefore an essential component of intelligence, which is why the term ML is always associated with AI.

## What Machine Learning Means

Simply put, ML is the field of AI that studies one problem: "How can machines or applications learn automatically from experience?"

ML is a set of techniques, algorithms and/or tools used to build the learning process and, eventually, the AI portion of any machine.

Back to the baby analogy: one way to teach a baby is by giving examples or putting him into an experience. The same applies in ML, where it is called "`Supervised Learning`", which means that we first train the machine by giving it training sets.

Each training set is composed of the same set of inputs along with the correct answer (called the label). Using different algorithms, the machine can iterate through all training sets, learn from them and build its AI; it is then ready to predict (provide an estimated output) for new inputs of the same form, or to make a decision.

But this is not the only type of ML. There are two other common types:

- `Unsupervised learning`: there is no labelled training set available. This is used for a specific type of problem that will be mentioned later.

- `Reinforcement learning`: there is no training set either; instead, the machine receives feedback based on the accomplished target. For example, for an AI application playing chess, the feedback could be winning or losing the game.

Let's go through the details of each type and how it could be used

## Supervised Learning

Again, this is when we do have a training set available to train our machine (this is where the term "supervised" comes from). This may be the time for a real example, such as the classic one of building an application that estimates the price of an apartment based on its area.

This example is used in many AI references to explain the concept, and I will use the same.

To build such an application, we need a set of pairs relating flat area to price. Of course, in a real scenario the price depends on multiple factors, such as location and number of rooms; for simplicity, however, we will treat the other factors as constant, with no impact on price.

This is supervised learning because we start with available data, and our target is to create an application that learns from this data and gets even more intelligent with further experience (further pairs of area vs. price).

Eventually, this application expects an input (flat area, a single input) and predicts an output (price), which is a continuous number. This kind of problem is called "`Regression`" in AI.

You may think of regression as an optimization problem: finding the best-fit function that maps inputs to outputs. In mathematics the closely related term is "curve fitting". "Interpolation" is similar, except that an interpolating function passes exactly through the given points, whereas a regression function only has to be as close to them as possible.

Of course, this mapping can take linear or non-linear forms (depending on the complexity of the problem). Linear regression is the simplest; it represents the mapping function as a straight line. In AI terms, the mapping function is called the hypothesis or h function, in the form `h(x) = a + b * x`

where:

- `x` is the input(s)
- `a` is the intercept of the line (the value of `h(x)` at `x = 0`) and `b` is its slope
- `h(x)` is the hypothesis function of the inputs, or simply the estimated output

Let's name the correct answer of any given training set `y`.

The error is then simply the difference between the estimated answer (hypothesis) and the correct answer: `e = h(x) - y`
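To make the two formulas concrete, here is a minimal VB.NET sketch. The coefficients `a = 10` and `b = 2` and the three area/price pairs are made up for illustration and are not the article's data set; the module name `HypothesisDemo` is just a placeholder.

```vbnet
Imports System

Module HypothesisDemo
    ' Hypothetical coefficients, chosen only for illustration.
    Private Const a As Double = 10.0   ' intercept
    Private Const b As Double = 2.0    ' slope

    ' h(x) = a + b * x
    Function Hypothesis(x As Double) As Double
        Return a + b * x
    End Function

    Sub Main()
        ' Made-up training pairs (x = area, y = price).
        Dim xs() As Double = {50, 80, 120}
        Dim ys() As Double = {115, 165, 250}

        For i As Integer = 0 To xs.Length - 1
            Dim h As Double = Hypothesis(xs(i))
            Dim e As Double = h - ys(i)   ' e = h(x) - y
            Console.WriteLine($"x={xs(i)}  h(x)={h}  y={ys(i)}  e={e}")
        Next
    End Sub
End Module
```

With these numbers the errors come out as -5, 5 and 0, so an error can be negative, positive or zero depending on which side of the line the point falls.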

Logically, this error should be minimized as much as possible to fulfil the best-fit requirement. Error minimization is a whole area of study with many algorithms, which I will come back to later in this article.

Now back to supervised learning. We have seen that regression is one application of supervised learning, but it is not the only one. Remember, regression works with continuous (real) outputs. What about cases with discrete outputs, for example when we have a training set of several inputs and the output simply assigns the inputs to one of a set of pre-defined groups? This is called `Classification`, which is the second major use of supervised learning.

## Unsupervised Learning

On the other hand, unsupervised learning has no pre-labelled training set, just a set of inputs without labels (remember, the label is the correct answer). This kind of ML is used for a specific set of problems where we cannot really label the inputs or pre-program the answers. Let's take an example from social media: the recommendation list on YouTube or recommended friends on Facebook.

Unsupervised ML could be behind this kind of AI, where each user's watch history or friend list is reviewed and grouped with similar ones. These are called `Clustering` problems in AI. You can imagine that there is no way to program these recommendations for each user in advance.

## Reinforcement Learning

The third type, usually called reinforcement learning, is in a sense a special version of supervised learning in which the machine learns from the outcome, or simply from its own experience. It learns from the feedback on its output (such as winning or losing a game, or reaching a destination).

As an example, consider building an application to play chess. On one hand, there is no way to program all possible moves at any given time. On the other hand, if we manage to build the application so that it plays each game and learns from winning or losing it, then this is reinforcement learning.

## What is Next

The above covers the fundamentals of ML. On top of these, tons of algorithms and techniques are built to implement each type and solve different problems accordingly.

It is important to understand these fundamentals, as they will help in understanding any further advanced AI topics.

The way forward is to explore different algorithms and approaches, hopefully to understand AI further.

So this is the time to build our first AI algorithm with code.

## Linear Regression Example

Let's create an application to solve a simple linear regression problem. Regardless of the technical background of the application or the exact problem we are trying to solve, the algorithm is always the same. Hence, we will not worry much about what this data represents and simply assume we got it somehow. Here is the data in hand:

Where `X` is the input (it could be any real variable) and `Y` is the correct target or label. Plotting this data set results in:

The objective of the application is to predict the output when, let's say, `x` is 70, based on the given training set.

Based on the given data set, it looks like linear regression would be an acceptable solution for this problem. Hence the application will mainly solve the hypothesis function `h(x) = a + b * x`, in other words find the values of `a` and `b` that yield the best-fit line.

Let's build the algorithm for that. First, we start with any random values of `a` and `b`.

Now we need to minimize the error to the lowest possible value. But first, let's get rid of negative errors: errors can go in both directions, positive or negative, and would cancel each other out when added up. One way to do so is to square each error before summing them.

Some errors are larger than others, depending on the difference between the correct answer and `h(x)`. To account for all of them regardless of the size of the training set, let's take the average (mean) of the squared errors, which equals the sum of squared errors divided by the size of the training set.

In statistics, this measure is called the "`Mean Squared Error`" or `MSE`, and it tells you how close a regression line is to the training set.

Some references may refer to MSE as the cost function, which is a broader term.
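Written out, and assuming the error `e = h(x) - y` defined earlier (the index `i` over the training pairs is notation assumed here for convenience), the mean squared error would be:

$$
\mathrm{MSE}(a, b) = \frac{1}{m} \sum_{i=1}^{m} \left( h(x_i) - y_i \right)^2 = \frac{1}{m} \sum_{i=1}^{m} \left( a + b\,x_i - y_i \right)^2
$$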

Where `m` is the size of the training set (the number of training pairs).
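As a minimal sketch of the MSE calculation in VB.NET, assuming plain `Double` arrays instead of the library's `Matrix1D` type (the function name `MeanSquaredError` and the sample numbers are made up for illustration):

```vbnet
Imports System

Module MseDemo
    ' Mean squared error: average of the squared differences h(x_i) - y_i.
    Function MeanSquaredError(predictions() As Double, labels() As Double) As Double
        If predictions.Length <> labels.Length Then
            Throw New ArgumentException("Both arrays must have the same length.")
        End If

        Dim m As Integer = predictions.Length
        Dim total As Double = 0
        For i As Integer = 0 To m - 1
            Dim e As Double = predictions(i) - labels(i)
            total += e * e   ' squaring removes the sign
        Next
        Return total / m
    End Function

    Sub Main()
        ' Made-up values: the raw errors are -5, 5 and 0.
        Dim h() As Double = {110, 170, 250}
        Dim y() As Double = {115, 165, 250}
        Console.WriteLine(MeanSquaredError(h, y))
    End Sub
End Module
```

Here the raw errors add up to zero even though the line misses two of the three points, while the MSE is 50 / 3, roughly 16.67. That is why the errors are squared before averaging.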

Since the MSE is a second-order polynomial (assuming one variable), its graph in 2D coordinates looks like this:

From the above graph, this function clearly has only one minimum (we cannot tell yet where it is). In other words, the graph tells us that if we start with any random values for `a` and `b`, we will have an error somewhere on the curve, and from there we need to move towards the minimum.

In the graph below, the blue point represents the minimum error value. For any random `a` and `b` values we may end up on the right side (the red circle) or the left side (the green circle).

The question is: how do we determine which side of the curve we are on, right or left?

One way to determine the direction is to draw the straight line that is tangent to the curve at the starting point and then calculate the slope of this line.

For example, the red line in the above graph has a positive slope (moving to the right, the value on the line increases), whereas the green line has a negative slope.

Let's revisit our pseudo algorithm
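A rough sketch of that idea in VB.NET, assuming a made-up one-variable cost `(w - 3)^2` whose slope is `2 * (w - 3)`, and moving by a fixed fraction of the slope on every step (`stepSize`, a hypothetical value of 0.1). It is only meant to show the direction logic, not the library's actual code:

```vbnet
Imports System

Module SlopeDescentDemo
    ' Slope of the tangent line to the hypothetical cost (w - 3)^2 at w.
    ' The minimum of that cost is at w = 3.
    Function Slope(w As Double) As Double
        Return 2 * (w - 3)
    End Function

    Function Descend(startW As Double, stepSize As Double, steps As Integer) As Double
        Dim w As Double = startW
        For i As Integer = 1 To steps
            ' Positive slope: we are right of the minimum, so w decreases.
            ' Negative slope: we are left of the minimum, so w increases.
            w = w - stepSize * Slope(w)
        Next
        Return w
    End Function

    Sub Main()
        Console.WriteLine(Descend(10, 0.1, 100))   ' starts on the right side
        Console.WriteLine(Descend(-4, 0.1, 100))   ' starts on the left side
    End Sub
End Module
```

Both starting points, one to the right of the minimum and one to the left, should end up very close to `w = 3`.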

The above is a very common and handy algorithm called **Gradient Descent**. However, the version above is valid mainly for linear regression; for non-linear functions, other considerations have to be taken into account.

So, how can we calculate the slope of this tangent line?

The slope of the line tangent to a function (curve) at a certain point is exactly the definition of the derivative of the function. In our case, as we have two variables (`a` and `b`), this is called a partial derivative.

## Gradient Descent

So far we have the following:
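Collected in one place, and assuming the index `i` runs over the `m` training pairs (notation introduced here for convenience), the pieces from the previous sections are:

$$
h(x) = a + b\,x, \qquad e_i = h(x_i) - y_i, \qquad \mathrm{MSE}(a, b) = \frac{1}{m} \sum_{i=1}^{m} e_i^2
$$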

Let's start by solving for `a`:

To make a long story short, here is the final partial derivative for `a`:

and here is the one for `b`:
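Assuming the cost is the mean squared error with the `1/m` factor described earlier, differentiating each squared term with the chain rule (the inner derivative of `a + b x_i - y_i` with respect to `a` is 1) would give:

$$
\frac{\partial\,\mathrm{MSE}}{\partial a} = \frac{2}{m} \sum_{i=1}^{m} \left( a + b\,x_i - y_i \right)
$$

Some references define the cost with an extra factor of `1/2` so that the 2 cancels out.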
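Assuming the cost is the mean squared error with the `1/m` factor, the inner derivative of `a + b x_i - y_i` with respect to `b` is `x_i`, so each term of the sum picks up a factor `x_i`:

$$
\frac{\partial\,\mathrm{MSE}}{\partial b} = \frac{2}{m} \sum_{i=1}^{m} \left( a + b\,x_i - y_i \right) x_i
$$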

Then

To further control the step taken, let's add a small number representing the step size, taken as a fraction of the slope. This is called the "**Learning Rate**". An improper value for the learning rate can affect the final behavior of our algorithm (too large and the steps overshoot the minimum, too small and convergence becomes very slow), so we will examine different values in our code.

Anyway, let's denote it as `r`:
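Reading the step off the `Train` code later in this article (`a = a - r * (1 / m) * Err.Sum`), and assuming any constant factor from the derivative is folded into `r`, the update for `a` would be:

$$
a := a - r \cdot \frac{1}{m} \sum_{i=1}^{m} \left( h(x_i) - y_i \right)
$$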

Similarly for `b`:

The last two equations above represent the final result of applying Gradient Descent to linear regression.
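Again reading it off the `Train` code (`Err = Err.Product(_Inputs)` followed by `b = b - r * (1 / m) * Err.Sum`), with any constant factor assumed to be folded into `r`, the update for `b` would be:

$$
b := b - r \cdot \frac{1}{m} \sum_{i=1}^{m} \left( h(x_i) - y_i \right) x_i
$$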

## Using the code

To demonstrate the above, I have created a simple program.

The main function is `Train`; it accepts two 1D matrices representing the training set:

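    Public Sub Train(_Inputs As Matrix1D, _Labels As Matrix1D)
        Dim m As Integer = _Inputs.Size   ' size of the training set
        Dim Err As Matrix1D
        Dim Counter As Integer = 0
        Dim Best_a, Best_b As Single

        If _Inputs.Size <> _Labels.Size Then
            Throw New Exception("Both Inputs and Labels Matrices sizes shall match.")
        End If

        Randomize()

        ' A fixed number of iterations is used as the stopping rule
        Do While Counter < 100
            ' Evaluate the hypothesis h(x) for every input
            Dim h_Matrix As New Matrix1D(m)
            Err = New Matrix1D(m)
            For I As Integer = 0 To m - 1
                h_Matrix.SetValue(I, Hypothesis(_Inputs.GetValue(I)))
            Next

            ' e = h(x) - y for every training pair
            Err = h_Matrix.Sub(_Labels)

            ' Remember the a and b that gave the lowest MSE so far
            If CalcCostFunction(Err) < min_MSE OrElse Counter = 0 Then
                min_MSE = CalcCostFunction(Err)
                Best_a = a
                Best_b = b
            End If

            ' Gradient descent step; both updates use errors from the old a and b
            a = a - r * (1 / m) * Err.Sum
            Err = Err.Product(_Inputs)   ' e * x (assuming Product multiplies element by element)
            b = b - r * (1 / m) * Err.Sum
            Counter += 1
        Loop

        ' Keep the best values found
        a = Best_a
        b = Best_b
    End Sub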

It implements the pseudo algorithm of gradient descent.

There are different ways to terminate the iteration. I have selected one technique: setting a maximum counter of 100 while recording the minimum MSE seen so far, along with the `Best_a` and `Best_b` variables.

Lastly, I need to mention that the sample software references **CommonLib**, which is attached at the beginning of the article (the attached version is the most up-to-date one).
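For readers who want to try the idea without CommonLib, here is a self-contained sketch of the same update loop using plain `Double` arrays. It is not the library's code: the module name, the fixed starting point `a = b = 0`, the learning rate of 0.05, the 5000 iterations and the five data points are all assumptions made for illustration, and the best-MSE bookkeeping from `Train` is left out to keep it short.

```vbnet
Imports System

Module LinearRegressionDemo
    ' Fits h(x) = a + b * x by gradient descent and returns {a, b}.
    ' r is the learning rate; iterations caps the loop.
    Function Train(inputs() As Double, labels() As Double,
                   r As Double, iterations As Integer) As Double()
        If inputs.Length <> labels.Length Then
            Throw New ArgumentException("Inputs and labels must have the same size.")
        End If

        Dim m As Integer = inputs.Length
        Dim a As Double = 0, b As Double = 0   ' fixed start instead of random values

        For counter As Integer = 1 To iterations
            Dim sumErr As Double = 0    ' sum of e_i
            Dim sumErrX As Double = 0   ' sum of e_i * x_i
            For i As Integer = 0 To m - 1
                Dim e As Double = (a + b * inputs(i)) - labels(i)
                sumErr += e
                sumErrX += e * inputs(i)
            Next
            ' Both updates use errors computed from the old a and b.
            a -= r * sumErr / m
            b -= r * sumErrX / m
        Next
        Return New Double() {a, b}
    End Function

    Sub Main()
        ' Made-up points lying exactly on y = 1 + 2x.
        Dim xs() As Double = {0, 1, 2, 3, 4}
        Dim ys() As Double = {1, 3, 5, 7, 9}
        Dim fit() As Double = Train(xs, ys, 0.05, 5000)
        Console.WriteLine($"a = {fit(0):F3}, b = {fit(1):F3}")
        Console.WriteLine($"h(70) = {fit(0) + fit(1) * 70:F2}")
    End Sub
End Module
```

Because the sample points sit exactly on a line, the fitted values should settle very near `a = 1` and `b = 2`, which puts the prediction for `x = 70` near 141. With noisy data or a larger learning rate the loop may need more iterations or may diverge, which is where tracking the best MSE, as `Train` does, becomes useful.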

## Recap

We have defined three types of ML:

- `Supervised learning`: with an available training set, mainly used for regression and classification problems
- `Unsupervised learning`: no training set is available; typically used for clustering problems
- `Reinforcement learning`: no training set, but the machine learns from the outcome or feedback of every experience

Then we detailed one key algorithm, Gradient Descent, as typically applied to linear regression, along with one sample program.

I am not sure if the above was clear enough; please let me know.

## Next Article

Most likely, it will be about the second use of supervised learning, which is classification, typically using the Perceptron.