Posted 13 May 2019

Step-by-Step Guide to Implement Machine Learning IV - Logistic Regression

Easy to implement machine learning

This article is an entry in our Machine Learning and Artificial Intelligence Challenge. Articles in this sub-section are not required to be full articles so care should be taken when voting.

Introduction

Logistic regression is a classical method in statistical learning. It models the conditional probability P(Y|X) and predicts the label with the larger probability. Specifically, the binomial logistic regression model is:

P\left(Y=1|x\right)=\frac{exp\left(w\cdot x+b\right)}{ 1 + exp\left(w\cdot x+b\right)}

P\left(Y=0|x\right)=\frac{1}{ 1 + exp\left(w\cdot x+b\right)}

where w and b are the weight vector and the bias, respectively. For convenience, we absorb the bias into the weight vector and append a constant 1 to the input vector, namely:

\theta = \left(w^{(1)},w^{(2)},...,w^{(n)},b\right)\\ x =  \left(x^{(1)},x^{(2)},...,x^{(n)},1\right)\\

Then, the binomial logistic regression model is:

P\left(Y=1|x\right)=\frac{exp\left(\theta^{T} x\right)}{ 1 + exp\left(\theta^{T} x\right)}

P\left(Y=0|x\right)=\frac{1}{ 1 + exp\left(\theta^{T} x\right)}
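The two formulas above can be evaluated directly. The following is a minimal sketch, assuming NumPy; the function name `predict_proba` and the numeric values of `theta` and `x` are hypothetical and chosen only for illustration.

```python
import numpy as np

def predict_proba(theta, x):
    """Return (P(Y=0|x), P(Y=1|x)) for one sample in the extended notation."""
    x_ext = np.append(x, 1.0)        # x = (x^(1), ..., x^(n), 1)
    z = float(np.dot(theta, x_ext))  # theta^T x = w . x + b
    p1 = np.exp(z) / (1.0 + np.exp(z))
    p0 = 1.0 / (1.0 + np.exp(z))
    return p0, p1

# Hypothetical parameters (w^(1), w^(2), b) and a hypothetical sample
theta = np.array([0.5, -0.25, 0.1])
x = np.array([2.0, 4.0])

p0, p1 = predict_proba(theta, x)
print(p0, p1)  # the two probabilities sum to 1
```

Here theta^T x = 0.1, so P(Y=1|x) is slightly above 0.5 and the sample would be assigned to class 1.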

Logistic Regression Model

The logistic regression model consists of three parts: parameter estimation, the optimization algorithm, and classification.

Parameters Estimation

In Step-by-Step Guide To Implement Machine Learning III - Naive Bayes, we used maximum likelihood estimation to estimate the parameters of the Bayesian model. Similarly, we use maximum likelihood estimation to estimate the parameters of the logistic regression model. Denote

P\left(Y=1|x\right)=\pi_{\theta}\left(x\right)

P\left(Y=0|x\right)=1-\pi_{\theta}\left(x\right)

where:

\pi_{\theta}\left(x\right) =g\left(\theta^{T}x\right)=\frac{1}{1+e^{-\theta^{T}x}}

g is also called the sigmoid function. Given N training samples, the likelihood function is:

\prod_{i=1}^{N}\left[\pi_{\theta}\left(x^{(i)}\right)\right]^{y^{(i)}}\left[1-\pi_{\theta}\left(x^{(i)}\right)\right]^{1-y^{(i)}}

For convenience, we take the logarithm of the likelihood function, namely:

L\left(\theta\right)=\sum_{i=1}^{N}\left[ y^{(i)}\log\pi_{\theta}\left(x^{(i)}\right)+\left(1-y^{(i)}\right)\log\left(1-\pi_{\theta}\left(x^{(i)}\right)\right)\right]\\

Then, the problem is transformed into maximizing the log-likelihood function.
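As an illustration, the log-likelihood L(θ) can be computed in a few lines. This is a sketch under stated assumptions: the helper names `sigmoid` and `log_likelihood` and the toy data are hypothetical, and the last column of `X` is the constant 1 of the extended input vector.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_likelihood(theta, X, y):
    """L(theta) = sum_i [ y_i log pi(x_i) + (1 - y_i) log(1 - pi(x_i)) ]."""
    pi = sigmoid(X @ theta)
    return float(np.sum(y * np.log(pi) + (1 - y) * np.log(1 - pi)))

# Hypothetical toy data: one feature plus the constant-1 column
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

ll_zero = log_likelihood(np.zeros(2), X, y)              # every pi is 0.5
ll_better = log_likelihood(np.array([2.0, -3.0]), X, y)  # separates the classes
print(ll_zero, ll_better)
```

A parameter vector that fits the labels better yields a larger (less negative) log-likelihood, which is what the optimization step searches for.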

Optimization Algorithm

Setting the derivative of the log-likelihood function to zero does not yield an analytic solution. To maximize the log-likelihood, we therefore apply the gradient ascent method, namely:

\theta:=\theta+\alpha \nabla_{\theta}L\left(\theta\right)

where \alpha is the learning rate. For a single sample (x, y), the partial derivative of the log-likelihood function with respect to \theta_{j} is:

\begin{align*}  \frac{\partial }{\partial \theta_{j}}L\left(\theta\right) & = \left(y{\frac{1}{g\left(\theta^Tx\right)}}-\left(1-y\right)\frac{1}{1-g\left(\theta^Tx\right)}\right)\frac{\partial}{\partial\theta_{j}}g\left(\theta^Tx\right)\\ &=\left(y{\frac{1}{g\left(\theta^Tx\right)}}-\left(1-y\right)\frac{1}{1-g\left(\theta^Tx\right)}\right)g\left(\theta^Tx\right)\left(1-g\left(\theta^{T}x\right)\right)\frac{\partial}{\partial\theta_{j}}\theta^Tx\\ &=\left(y\left(1-g\left(\theta^{T}x\right)\right)-\left(1-y\right)g\left(\theta^{T}x\right)\right)x_{j}\\ &=\left(y-\pi_{\theta}\left(x\right)\right)x_{j}  \end{align*}
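One way to gain confidence in this derivation is to compare the closed-form result (y - π_θ(x))x_j with a finite-difference approximation. The sketch below assumes NumPy; all function names and the sample values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sample_log_likelihood(theta, x, y):
    p = sigmoid(np.dot(theta, x))
    return y * np.log(p) + (1 - y) * np.log(1 - p)

def analytic_gradient(theta, x, y):
    # Result of the derivation: (y - pi_theta(x)) * x_j for every j
    return (y - sigmoid(np.dot(theta, x))) * x

def numeric_gradient(theta, x, y, eps=1e-6):
    # Central finite differences, one coordinate at a time
    grad = np.zeros_like(theta)
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        grad[j] = (sample_log_likelihood(theta + step, x, y)
                   - sample_log_likelihood(theta - step, x, y)) / (2 * eps)
    return grad

theta = np.array([0.3, -0.7, 0.2])   # hypothetical parameters
x = np.array([1.5, -2.0, 1.0])       # hypothetical extended sample
y = 1.0

max_diff = np.max(np.abs(analytic_gradient(theta, x, y)
                         - numeric_gradient(theta, x, y)))
print(max_diff)  # expected to be very small
```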

Summing this derivative over all training samples and substituting it into the update rule, we get:

\theta_{j} := \theta_{j}+\alpha\sum_{i=1}^{N}\left(y^{(i)}-\pi_{\theta}\left(x^{(i)}\right)\right)x^{(i)}_{j}
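The same update can be written for all components of θ at once. The matrix notation here is an assumption introduced for illustration and is not used elsewhere in the article: X is the matrix whose rows are the extended samples, y is the column vector of labels, and π_θ(X) applies π_θ to each row. The term X^T(y - π_θ(X)) corresponds to the expression `np.dot(train_data.T, errors)` in the code.

```latex
\theta := \theta + \alpha X^{T}\left(y - \pi_{\theta}\left(X\right)\right)
```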

Thus, we obtain the optimized parameters by iterating the above update. The code of the gradient ascent method is shown below:

Python
if method == "GA":
    # Gradient ascent on the log-likelihood, starting from random weights
    weights = np.random.normal(0, 1, [feature_dim, 1])
    for i in range(iterations):
        # Predicted probabilities P(Y=1|x) for all training samples
        pred = self.sigmoid(np.dot(train_data, weights))
        errors = train_label - pred
        # update the weights
        weights = weights + alpha * np.dot(train_data.T, errors)
    self.weights = weights
    return self
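The fragment above is part of a class method. For experimentation, a self-contained version could look like the sketch below. Several things are assumptions and not taken from the article's code: the function name `fit_gradient_ascent`, the default values of `alpha` and `iterations`, the synthetic dataset, and the division of the gradient by the number of samples (the article sums the gradient; averaging only rescales the step size).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_gradient_ascent(train_data, train_label, alpha=0.1, iterations=2000, seed=0):
    """Gradient ascent on the log-likelihood; train_data includes the 1s column."""
    rng = np.random.default_rng(seed)
    n_samples, feature_dim = train_data.shape
    weights = rng.normal(0, 1, (feature_dim, 1))
    for _ in range(iterations):
        pred = sigmoid(train_data @ weights)
        errors = train_label - pred
        # Averaged gradient (assumption: the article uses the summed gradient)
        weights = weights + alpha * (train_data.T @ errors) / n_samples
    return weights

# Synthetic, linearly separable data (hypothetical)
rng = np.random.default_rng(42)
features = rng.normal(0, 1, (200, 2))
labels = (features[:, 0] + features[:, 1] > 0).astype(float).reshape(-1, 1)
train_data = np.hstack([features, np.ones((200, 1))])  # append constant 1

weights = fit_gradient_ascent(train_data, labels)
accuracy = np.mean((sigmoid(train_data @ weights) > 0.5) == (labels == 1))
print(weights.ravel(), accuracy)
```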

Classify

In the logistic regression model, the sigmoid function is applied to calculate the probability, which is expressed as:

sigmoid\left(x\right)=\frac{1}{1+e^{-x}}

When the result is larger than 0.5, the sample is assigned to class 1; otherwise, it is assigned to class 0.

Python
def sigmoid(self, x, derivative=False):
    output = 1/(1 + np.exp(-x))
    if derivative:
        output = output * (1 - output)
    return output
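The thresholding rule itself is not shown in the code above. A minimal sketch of it might look as follows; the function name `predict`, the `threshold` parameter, and the example weights and samples are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def predict(data, weights, threshold=0.5):
    """Assign class 1 where the predicted probability exceeds the threshold."""
    prob = sigmoid(np.dot(data, weights))
    return (prob > threshold).astype(int)

# Hypothetical trained weights (two features plus bias) and two samples
weights = np.array([[1.0], [-1.0], [0.0]])
data = np.array([[2.0, 0.5, 1.0],
                 [0.2, 1.5, 1.0]])

labels = predict(data, weights)
print(labels.ravel())  # [1 0]
```

The first sample has θ^T x = 1.5, which gives a probability above 0.5, and the second has θ^T x = -1.3, which gives a probability below 0.5.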

Conclusion and Analysis

To get the parameters of the logistic regression model, we can also minimize a loss function, for example the negative log-likelihood. Finally, let's compare our logistic regression with scikit-learn's. The detection performance is displayed below:

[Image 15: detection performance of our logistic regression and scikit-learn's]

The detection performance of both is similar.
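The remark about minimizing a loss can be made concrete. If the loss is taken to be the negative log-likelihood (an assumption, since the article does not name a specific loss), one gradient descent step on the loss is identical to one gradient ascent step on the log-likelihood. The function names and toy data below are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ascent_step(theta, X, y, alpha):
    # Gradient of the log-likelihood L(theta)
    grad_likelihood = X.T @ (y - sigmoid(X @ theta))
    return theta + alpha * grad_likelihood

def descent_step(theta, X, y, alpha):
    # Gradient of the loss -L(theta)
    grad_loss = X.T @ (sigmoid(X @ theta) - y)
    return theta - alpha * grad_loss

# Hypothetical toy data: one feature plus the constant-1 column
X = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
theta0 = np.array([0.1, -0.2])

same = np.allclose(ascent_step(theta0, X, y, 0.05),
                   descent_step(theta0, X, y, 0.05))
print(same)  # True
```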

The related code and dataset in this article can be found in MachineLearning.

History

  • 13th May, 2019: Initial version

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Ryukkkk
Engineer
Germany
Ryuk is interested in Machine Learning/Signal Processing/VoIP.
