Get Even More Visitors To Your Blog, Upgrade To A Business Listing >>

Mathematical Representation of a Logistic Regression Model

Logistic Regression Model is a popular Machine Learning Algorithm used for Classification problems. Thus, Logistic Regression can be used to predict the risk of developing a disease based on some symptoms of a patient. Or, an email can be classified as SPAM or NOT SPAM by Logistic Regression Algorithm. It can also be used in predicting the likelihood of a Home Owner in defaulting the Mortgage.

In all these examples, we are predicting an outcome "y" which has two possible values {0,1}. 

Thus, y ∈ {0,1}, 
where 0: Negative Class
             1: Positive Class
The above Binary Classification examples can be represented optionally as below:
0 (Negative)
1 (Positive)
Email Not Spam
Email is Spam
The HomeOwner will NOT default on Mortgage
The HomeOwner will default on Mortgage
The Patient will NOT develop the disease
The Patient will develop the disease

For the Binary Classification problems, the hypothesis has to satisfy the below property:
0 ≤ hθ(x) ≤1
Now, for Logistic Regression Model, the Hypothesis can be represented as below:
hθ(x) = g(θTx)
where g(z) = 1/(1 + e-z), the Sigmoid Function or Logistic Function
Finally,  Replacing z with the above value,

Now, Let's suppose we have a Training Set with m examples. We have an (n+1) features Vector and the outcome y has two possible values {0,1}. Our objective is to choose an optimal value for 𝛉 to get a minimum cost. Now, a vectorized representation of the cost function for logistic regression model is as below:
J(θ) = 1/m * (-yT * log(h) - (1-y)* log(1-h))
where h =1/(1 + e-z)

Well honestly, I do not understand yet how this Cost Function is calculated for Logistic Regression Model. But apparently, this cost function can be derived from Statistics using the "Principle of Maximum Likelihood Estimation" (MLE).

Anyway, it is enough of formulae as of now. Let's try to solve the Logistic Regression problem in R from the Andrew Ng course for Machine Learning.

So, here goes the second exercise on Logistic Regression in R. Our objective is to build a logistic regression model to predict whether a student gets admitted into a university based on their results on two exams. We will divide this exercise in the below steps:

  1. Load Data
  2. Plot the Data
  3. Create a Sigmoid/Logistic Function
  4. Calculate the Cost
  5. Create a Gradient  Descent Function
Let's start with the exercise!

1. Load Data: 

For this exercise, the Test Data is provided in a Text file with 3 columns and 100 records. Variables V1 and V2 refer to the scores on 2 Exams and Variable V3 refers to the Admission decision.
Let's take a look at the Data:

2. Plot the data:
It is always advisable to visualize the data before starting to work on a particular problem. So, let's first take a look at the data points:

Below is the plot:


3. Create Sigmoid/Logistic Function:
Let's create the Logistic Function in R so that it can be used in the following step to calculate the Cost. 

For large positive values of z, the "Sigmoidcalc" Function should be close to 1, while for large negative values, the "Sigmoidcalc" should be close to 0. Evaluating Sigmoidcalc(0) should give us exactly 0.5. Let's test the Function we just created.

4. Calculate the Cost:
Now, we will calculate the Cost Function J(θ) with our current Dataset. Let's create the function first in R.
Now, we will test our "ComputeCost" Function. As per our current Dataset, "y" has values either 0 or 1. Our objective is to minimize J(θ). 
Thus, the Cost is calculated about 0.693 using the initial parameters of theta(θ) i.e. all zeroes.

5. Calculate Gradient Descent: Next, we will create a Gradient Descent Function to minimize the value of the cost function J(𝛉). We will start with some value of 𝛉 and keep on changing the values until we get the Minimum value of J(𝛉) i.e. best fit for the line that passes through the data points.General Form of Gradient Descent equation is as below:

Repeat{θj:=θjαmi=1m(hθ(x(i))y(i))x(i)j


This post first appeared on What The Data Says, please read the originial post: here

Share the post

Mathematical Representation of a Logistic Regression Model

×

Subscribe to What The Data Says

Get updates delivered right to your inbox!

Thank you for your subscription

×