February 22nd 2018

A multivariate linear regression model is a linear approach for modelling the relationship between a dependent variable (say Y) and multiple independent variables, or features (say X₁, X₂, X₃, etc.).

I started with a simple example of a univariate linear regression model, in which I tried to predict the price of a house (Y) from the area of the property (X). Mathematically, a univariate linear regression model is represented as below:

Y = h(θ) = θ₀ + θ₁X

Y = output variable/target variable (SalePrice of the house)

X = input variable (LotArea of the house)

θ₀ = the intercept

θ₁ = the slope

Let's extend the linear model for multiple features. Thus, a multivariate linear regression model can be represented as below:

Y = h(θ) = θ₀ + θ₁X₁ + θ₂X₂ + θ₃X₃ + θ₄X₄ + ... + θₙXₙ
where n = total number of features

X₁, X₂, ..., Xₙ are the different features
Y is the value to be predicted.
To simplify the notation, let's introduce another feature X₀ and set X₀ = 1 for every example in the dataset. Since X₀ = 1, the above equation can be written as below:
Y = h(θ) = θ₀X₀ + θ₁X₁ + θ₂X₂ + θ₃X₃ + θ₄X₄ + ... + θₙXₙ
Thus, all the features can be represented by an (n+1)-dimensional feature vector X, written as a column:

X = (X₀, X₁, X₂, ..., Xₙ)

And all the parameters can be represented by an (n+1)-dimensional parameter vector θ, also written as a column:

θ = (θ₀, θ₁, θ₂, ..., θₙ)

Now, we can use the transpose of a vector. The transpose of the parameter vector θ is the row vector below:
θᵀ = [θ₀, θ₁, θ₂, θ₃, ..., θₙ]
Finally, a multivariate linear regression model can be expressed as:
Y = h(θ) = θᵀX
where X is the feature vector and θᵀ is the transpose of the parameter vector.
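
As a small illustration of the expression θᵀX, the sketch below evaluates the hypothesis in R, first for a single example and then for several examples stacked as rows of a matrix. All numbers and variable names here are made up for illustration and are not taken from the housing dataset.

```r
# Illustrative parameter vector (theta_0, theta_1, theta_2); values are made up
theta <- c(1, 2, 3)

# Feature vector for one example, with X0 = 1 as the first entry
x <- c(1, 4, 5)

# h(theta) = theta^T X, once as an explicit sum and once as a matrix product
h_sum <- sum(theta * x)
h_mat <- as.numeric(t(theta) %*% x)

# For several examples, stack the feature vectors as the rows of a matrix;
# X %*% theta then returns one prediction per row
X <- rbind(c(1, 4, 5),
           c(1, 0, 1))
h_all <- as.vector(X %*% theta)
```

Stacking the examples as rows is what allows the cost and the gradient to be computed with matrix products instead of explicit loops.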
As we know, in a linear regression model we need to find the line (a hyperplane when there are several features) that best fits our data set. To get the best fit, we need to choose the best values for θ₀, θ₁, θ₂, ..., θₙ. We can measure the accuracy of our predictions by using a cost function J(θ₀, θ₁, ..., θₙ):
J(θ₀, θ₁, ..., θₙ) = (1/(2m)) ∑ᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾)²
(θ₀, θ₁, ..., θₙ) is an (n+1)-dimensional parameter vector
m = total number of examples in the dataset
h_θ(x⁽ⁱ⁾) = predicted value for the i-th example, which can be written as ŷ⁽ⁱ⁾

So, the above cost function can be written more simply as below:

J(θ) = (1/(2m)) ∑ᵢ₌₁ᵐ (ŷ⁽ⁱ⁾ − y⁽ⁱ⁾)²
where θ is the (n+1)-dimensional vector of parameter values

Similar to gradient descent for a univariate linear regression model, gradient descent for a multivariate linear regression model can be represented by the equation below:
repeat until convergence
{
θⱼ := θⱼ − α · (1/m) ∑ᵢ₌₁ᵐ (h_θ(x⁽ⁱ⁾) − y⁽ⁱ⁾) · xⱼ⁽ⁱ⁾
(update all θⱼ simultaneously, for j = 0, 1, 2, ..., n)
}
Here α is the learning rate and xⱼ⁽ⁱ⁾ is the value of feature j in the i-th example.
Let's work through an example. As before, I will use a dataset from Andrew Ng's Machine Learning course. We will implement a linear regression model with multiple variables to predict the prices of houses.
As earlier, we will divide the whole process into the steps below:

1. Load Data
2. Feature Scaling
3. Create a Cost Function
4. Create a Gradient Descent Function and Plot the Gradient Descent Results
5. Check the results

Let's start with the exercise!
1. Load Data:
For this exercise, the data is provided in a text file with 3 columns. It contains a training set of housing prices in Portland, Oregon.
Let's take a look at the data.

There are 47 records with 3 variables. The first variable is the size of the house (in square feet), the second one is the number of bedrooms, and the third variable is the price of the house.
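
A minimal sketch of how such a file could be loaded in R follows. It assumes the file is comma-separated and has no header row. The column names `size`, `bedrooms` and `price` are my own labels, and the three rows written to the temporary file are only illustrative stand-ins so that the snippet runs on its own.

```r
# Stand-in file with a few illustrative rows in the same three-column layout.
# In practice, set `data_file` to the path of the course data file instead.
data_file <- tempfile(fileext = ".txt")
writeLines(c("2104,3,399900",
             "1600,3,329900",
             "2400,3,369000"), data_file)

# Assumption: comma-separated values, no header row.
# The column names are supplied here and are not part of the file.
housing <- read.table(data_file, sep = ",", header = FALSE,
                      col.names = c("size", "bedrooms", "price"))

str(housing)   # number of records and the type of each column
head(housing)  # first few rows
```

With the real file, `nrow(housing)` should report the 47 records mentioned above.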

2. Feature Scaling

Our ultimate aim is to find accurate values for θⱼ, and these can be obtained with gradient descent. Depending on the dataset, gradient descent may take a long time to converge. One way of speeding up the process is "feature scaling": we can speed up gradient descent by having each of our input values in roughly the same range. This is because θ will descend quickly on small ranges and slowly on large ranges. Ideally, the input features should range between:

−1 ≤ xᵢ ≤ 1 or −0.5 ≤ xᵢ ≤ 0.5

Looking at the values in our dataset, the house sizes are about 1000 times the number of bedrooms. That's why we will perform feature scaling, which makes gradient descent converge much more quickly. For scaling, we will use the steps below:

• Subtract the mean value of each feature from the dataset.
• After subtracting the mean, additionally, scale (divide) the feature values by their respective “standard deviations.”

Thus, the formula for this feature scaling is as below:

xᵢ := (xᵢ − μᵢ) / sᵢ

where μᵢ is the average of all the values for feature i and sᵢ is the standard deviation of that feature.

First, let's create a function for feature scaling using the above equation.
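
The original function is not reproduced here, so the following is only a sketch of what such a scaling function might look like. The name `feature_scale` and the small input matrix are hypothetical. The function also returns the means and standard deviations, on the assumption that they will be needed later to scale new inputs in the same way.

```r
# Subtract each column's mean and divide by its standard deviation.
# `feature_scale` is a hypothetical name, not the function from the original post.
feature_scale <- function(X) {
  X <- as.matrix(X)
  mu <- colMeans(X)                      # per-feature mean
  s <- apply(X, 2, sd)                   # per-feature standard deviation
  X_centered <- sweep(X, 2, mu, "-")     # x - mu, column by column
  X_scaled <- sweep(X_centered, 2, s, "/")
  list(X = X_scaled, mu = mu, sigma = s)
}

# Illustrative values only (not the housing data)
X_raw <- cbind(size = c(1000, 2000, 3000), bedrooms = c(2, 3, 4))
scaled <- feature_scale(X_raw)
scaled$X
```

Base R's `scale()` performs the same centering and scaling. Writing it out by hand just keeps the two steps visible.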
Then, we will call the above-created function to scale our first two variables.
Let's take a look at the scaled data:

Although the values do not all fall between -1 and +1, the two features are now on a comparable scale.

3. Create a Cost Function:
We will create a cost function and later check, inside the gradient descent function, whether it is converging.
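
One possible vectorized version of the cost function J(θ) is sketched below. The name `compute_cost` and the toy data are assumptions made for illustration. The matrix X is expected to already contain the leading column of ones (X₀ = 1).

```r
# J(theta) = 1/(2m) * sum((X %*% theta - y)^2)
# X: m x (n+1) matrix whose first column is all ones
# y: vector of m observed values
# theta: vector of n+1 parameters
compute_cost <- function(X, y, theta) {
  m <- length(y)
  residuals <- X %*% theta - y   # predicted minus observed, one per example
  as.numeric(sum(residuals^2) / (2 * m))
}

# Toy data generated exactly by y = 1 + 2x, so theta = (1, 2) fits perfectly
X <- cbind(1, c(1, 2, 3))
y <- c(3, 5, 7)
cost_exact <- compute_cost(X, y, c(1, 2))
cost_zero <- compute_cost(X, y, c(0, 0))
```

On this toy data the cost is zero at the exact parameters and larger for any other choice, which is the behaviour gradient descent relies on.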

4. Create a Gradient Descent Function for multiple features:
This is the step where we create a gradient descent function to get optimum values for θ.
It will call the cost function created in the previous step, so that we can check that the cost
is decreasing as we approach the optimum parameter values.
Let's create a gradient descent function in R for multiple features.
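
The sketch below shows one way such a function could be written, using the vectorized form of the update rule and recording the cost at every iteration. The function names, the toy data and the learning rate used in the call are illustrative assumptions, not the original post's code or settings.

```r
# Cost function, repeated here so that the snippet runs on its own
compute_cost <- function(X, y, theta) {
  sum((X %*% theta - y)^2) / (2 * length(y))
}

# Batch gradient descent for any number of features.
# X must include the leading column of ones.
gradient_descent <- function(X, y, theta, alpha, num_iters) {
  m <- length(y)
  cost_history <- numeric(num_iters)
  for (iter in seq_len(num_iters)) {
    error <- X %*% theta - y                    # h(x) - y for every example
    gradient <- as.vector(t(X) %*% error) / m   # one partial derivative per theta_j
    theta <- theta - alpha * gradient           # simultaneous update of all theta_j
    cost_history[iter] <- compute_cost(X, y, theta)
  }
  list(theta = theta, cost_history = cost_history)
}

# Toy data generated by y = 3 + 2x (illustrative, already roughly scaled)
X <- cbind(1, c(-1, 0, 1))
y <- c(1, 3, 5)
fit <- gradient_descent(X, y, theta = c(0, 0), alpha = 0.1, num_iters = 500)
fit$theta
```

Plotting `fit$cost_history` against the iteration number, for example with `plot(fit$cost_history, type = "l")`, gives a quick visual check that the cost keeps decreasing. If it grows instead, the learning rate is probably too large.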

Now, we will set alpha = 0.001

This post first appeared on What The Data Says.
