
Compute Gradient Descent of a Multivariate Linear Regression Model in R

A Multivariate Linear Regression Model is a linear approach for modelling the relationship between a dependent variable (say Y) and multiple independent variables or features (say X1, X2, X3, etc.).
I had started with a simple example of a univariate linear regression model, where I was trying to predict the price of a house (Y) based on the area of the property (X). Mathematically, a univariate linear regression model is represented as below:
Y = h(𝛉) = 𝛉0 + 𝛉1X
Y = output variable/target variable (SalePrice of the house)
X = input variable (LotArea of the house)
𝛉0= the Intercept
𝛉1= the slope
Let's extend the linear model for multiple features. Thus, a multivariate linear regression model can be represented as below:
Y = h(𝛉) = 𝛉0 + 𝛉1X1 + 𝛉2X2 + 𝛉3X3 + 𝛉4X4 + ... + 𝛉nXn
where n = total number of features
X1, X2, ..., Xn are the different features
Y is the value to be predicted.
For simplifying the notations, let's consider we have another feature/variable X0, and this X0 = 1 for all the records in the dataset. Thus, the above equation can be written as below, since X0 = 1:
Y = h(𝛉) = 𝛉0X0 + 𝛉1X1 + 𝛉2X2 + 𝛉3X3 + 𝛉4X4 + ... + 𝛉nXn
Thus, all the features can be represented by an (n+1)-dimensional Feature Vector X (a column vector):
X = [X0, X1, X2, ..., Xn]ᵀ
And all the parameters can be represented by an (n+1)-dimensional Parameter Vector 𝚹:
𝚹 = [𝛉0, 𝛉1, 𝛉2, ..., 𝛉n]ᵀ
Now, we can use the concept of the Transpose of a Vector, and thus the Transpose of the above Vector 𝚹 is the row vector:
𝛉ᵀ = [𝛉0, 𝛉1, 𝛉2, ..., 𝛉n]
Finally, a Multivariate Linear Regression Model can be expressed like:
Y = h(𝛉) = 𝛉ᵀX
where X is the Feature Vector and 𝛉ᵀ is the Transpose of the Parameter Vector.
As we know, in a linear regression model, we need to find the line (a hyperplane, once we have multiple features) that best fits our current dataset. To get the best fit, we need to choose the best values for 𝛉0, 𝛉1, 𝛉2, ..., 𝛉n. We can measure the accuracy of our prediction by using a cost function J(𝛉0, 𝛉1, 𝛉2, ..., 𝛉n).
J(θ0, θ1, ..., θn) = 1/2m ∑ (hθ(x(i)) − y(i))²
where:
(θ0, θ1, ..., θn) is the (n+1)-dimensional Parameter Vector
m = total number of examples in the dataset
hθ(x(i)) = the predicted value for the ith example, which can be represented as ŷ(i)
So, the above Cost Function can be represented simply as below:
J(𝚹) = 1/2m ∑ (ŷ(i) − y(i))²
where 𝚹 = the (n+1)-dimensional vector of parameter values

Similar to the Gradient Descent for a Univariate Linear Regression Model, the Gradient Descent for a Multivariate Linear Regression Model can be represented by the below equation:

repeat until convergence
{
θj := θj − α · 1/m ∑ (hθ(x(i)) − y(i)) · xj(i)
(simultaneously update θj for every j = 0, 1, 2, ..., n)
}
Let's work through an example. As before, I will use a dataset from Andrew Ng's Machine Learning course. We will implement a linear regression model with multiple variables to predict the prices of houses.
As earlier, we will divide the whole process into below steps:
  1. Load Data
  2. Feature Scaling
  3. Create a Cost Function
  4. Create a Gradient Descent Function and Plot the Gradient Descent Results
  5. Check the results
Let's start with the exercise!
1. Load Data: 

For this exercise, the Test Data is provided in a Text file with 3 columns and it contains a training set of housing prices in Portland, Oregon. 
Let's take a look at the data.
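The original post shows this step as a screenshot; here is a minimal sketch in R, assuming the file is the course's comma-separated ex1data2.txt with no header row:

# Load the training set: house size, number of bedrooms, price
data <- read.table("ex1data2.txt", sep = ",", header = FALSE)
colnames(data) <- c("size", "bedrooms", "price")

head(data)   # peek at the first few records
nrow(data)   # number of examples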
There are 47 records with 3 variables. The first variable is the size of the house (in square feet), the second one is the number of bedrooms, and the third variable is the price of the house.
2. Feature Scaling
Our ultimate aim is to find accurate values for θj, and θj can be derived by using the concept of Gradient Descent. Gradient Descent may take some time to converge, depending on the dataset. One way of speeding up the process is "Feature Scaling": we can speed up gradient descent by having each of our input values in roughly the same range. This is because θ will descend quickly on small ranges and slowly on large ranges. Ideally, the input values should range between:
−1 ≤ x(i) ≤ 1 or
−0.5 ≤ x(i) ≤ 0.5
Looking at the values in our current dataset, the house sizes are about 1000 times the number of bedrooms. That's why we will perform feature scaling and thus make gradient descent converge much more quickly. For scaling, we will use the below steps:
• Subtract the mean value of each feature from the dataset. 
• After subtracting the mean, additionally, scale (divide) the feature values by their respective “standard deviations.”  

Thus, the formula for this Feature Scaling is as below:
xi := (xi − μi) / si
where μi is the average of all the values for feature i and si is the Standard Deviation.

First, let's create a function for feature scaling using the above equation.
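The original function appears only as an image; a minimal sketch of such a function (the name featureScale is my own) is below:

# Scale a feature: subtract its mean, then divide by its standard deviation
featureScale <- function(x) {
  (x - mean(x)) / sd(x)
}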

Then, we will call the above-created function to scale our first two variables.
Let's take a look at the scaled data:
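Applying the featureScale() sketch above to the first two variables might look like this (the target variable, price, is left unscaled):

# Scale the two input features in place
data$size     <- featureScale(data$size)
data$bedrooms <- featureScale(data$bedrooms)

head(data)   # view the scaled data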

Although the values are not in the range of -1 and +1, the data is scaled to some extent.


3. Create a Cost Function:

We will create a Cost Function and will check later, in the Gradient Descent function, if it is converging.
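The post's code is shown as a screenshot; a sketch implementing J(𝚹) = 1/2m ∑ (ŷ(i) − y(i))² is below, assuming X is the m × (n+1) feature matrix (with a first column of ones for X0) and y is the vector of prices:

# Cost function: J(theta) = 1/(2m) * sum((X %*% theta - y)^2)
computeCost <- function(X, y, theta) {
  m <- length(y)
  residuals <- X %*% theta - y   # h_theta(x(i)) - y(i) for every example
  sum(residuals^2) / (2 * m)
}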


4. Create a Gradient Descent Function for multiple features:

This is the step where we will create a Gradient Descent Function to get the optimum values for 𝛉. We will call the cost function created in the above step and check whether the cost is decreasing as we approach the optimum parameter values.
Let's create a Gradient Descent Function in R for multiple features. 
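The original code is again an image; here is one way to sketch it, using the vectorized form of the update rule above (the function and argument names are my own):

# Batch gradient descent: repeatedly apply the simultaneous update
# theta_j := theta_j - alpha * 1/m * sum((h_theta(x(i)) - y(i)) * x_j(i))
gradientDescent <- function(X, y, theta, alpha, iters) {
  m <- length(y)
  costHistory <- numeric(iters)                 # J(theta) after each iteration
  for (k in 1:iters) {
    gradient <- t(X) %*% (X %*% theta - y) / m  # updates all theta_j at once
    theta <- theta - alpha * gradient
    costHistory[k] <- computeCost(X, y, theta)
  }
  list(theta = theta, costHistory = costHistory)
}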

Now, we will set alpha = 0.001
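The run itself is shown as screenshots in the original; a sketch of the full run (the 1000-iteration count and the zero-initialized 𝛉 are my own choices):

X <- cbind(1, data$size, data$bedrooms)   # prepend the intercept column X0 = 1
y <- data$price
theta <- rep(0, 3)                        # start all parameters at zero

alpha <- 0.001
result <- gradientDescent(X, y, theta, alpha, iters = 1000)
result$theta                              # optimum parameter values

# Plot the cost at each iteration to check that gradient descent is converging
plot(result$costHistory, type = "l",
     xlab = "Iteration", ylab = "Cost J(theta)")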

