Multiple Linear Regression

  • Multiple linear regression represents the relationship between two or more independent variables and a response variable.

  • If the prediction is not accurate with one variable, multiple linear regression can help build a better model.

Equation for n variables:

Y = β0 + β1*X1 + β2*X2 + ... + βn*Xn

Multicollinearity:

Multicollinearity occurs when predictor variables in the input dataset are related to each other. In other words: a model is built using many independent variables, but some of these variables are interrelated, so their presence in the model is redundant.

For Example:

1) y = 6x1 + 9x2

2) y = 15x1

3) y = 15x2


Let us assume that the predictors x1 and x2 take the same values. In that case, how is the dependent variable affected? Which part of the effect comes from x1 and which from x2?

In such cases multicollinearity is an issue because we are not able to separate the effects of the predictors; we don't know where the variation is coming from.

This is a problem because:

  • It affects interpretation: the question "does the dependent variable (y) change when this predictor changes, with all others kept constant?" no longer applies.

  • It affects inference: coefficients can change widely and signs can invert, so the p-values are not reliable.

Multicollinearity does not affect

  • the predictions, or the precision of the predictions

  • goodness-of-fit statistics.

So multicollinearity is a severe issue if we are trying to interpret the model; if we only care about prediction, we can be good to go.

Detecting Multicollinearity in the Predictors

Detecting multicollinearity amounts to detecting associations among the predictors.


1) Scatter plots for visual inspection.

2) Correlations to quantify the linear association (visualise using a heatmap).

The above methods look at pairwise correlations, which may not be enough: sometimes a variable is associated with a combination of more than two other variables.
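As a minimal sketch of the pairwise check, assuming NumPy and synthetic example variables:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=100)
x2 = 2 * x1 + rng.normal(scale=0.1, size=100)  # strongly associated with x1
x3 = rng.normal(size=100)                      # independent of both

# Pairwise correlation matrix of the predictors (one column per predictor)
corr = np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False)
print(corr.round(2))  # corr[0, 1] is close to 1; corr[0, 2] is close to 0
```

A heatmap of this matrix (e.g. seaborn's `heatmap`) makes the strongly correlated pairs easy to spot.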

Approach: build a model to explain each predictor using the other predictors.

Variance Inflation Factor

Build a model that explains each predictor using the other predictors and find its R-squared value. Then compute the VIF for each variable:

VIFi = 1 / (1 - Ri^2)

VIF > 10: definitely a high VIF value; the variable should be eliminated.

VIF > 5: can be okay, but it is worth inspecting.

VIF < 5: a good VIF value; no need to eliminate this variable.

  • VIF should be computed for all variables.

  • VIF values can change if we drop one of the other predictor variables.
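The procedure above can be sketched in NumPy. This is an illustrative implementation with made-up data, not a library API (statsmodels also provides `variance_inflation_factor`):

```python
import numpy as np

def vif(X):
    """VIF for each column of X: regress column i on the remaining
    columns (with an intercept), take its R^2, return 1 / (1 - R^2)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = []
    for i in range(p):
        y = X[:, i]
        A = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        out.append(1.0 / (1.0 - r2))
    return out

# x1 and x2 are nearly collinear, x3 is independent
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.01, size=200)
x3 = rng.normal(size=200)
vifs = vif(np.column_stack([x1, x2, x3]))
```

Here x1 and x2 get very large VIFs, while x3 stays close to 1.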

Dealing with Multicollinearity

  • Deleting variables

  1. Drop the variables that are highly correlated with other variables.

  2. Keep the interpretable variables that add value to the prediction.

  • Creating new variables

    1. Add a new interaction feature and drop the original features.

    2. Variable transformation.

Handling Categorical Variables in Regression

Datasets often contain non-numeric values such as Gender (male, female). These variables cannot be used directly in the model.

We have to convert these categorical variables into a numerical format.

  • Dummy variable creation: create n-1 variables for n categories, which act as level indicators.
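A minimal sketch of dummy-variable creation, assuming pandas and an illustrative Gender/Income dataset; `get_dummies` with `drop_first=True` produces the n-1 indicators:

```python
import pandas as pd

df = pd.DataFrame({"Gender": ["male", "female", "female", "male"],
                   "Income": [50, 60, 55, 65]})

# For the 2-category Gender column, drop_first=True keeps 2 - 1 = 1 indicator
encoded = pd.get_dummies(df, columns=["Gender"], drop_first=True)
```

Only `Gender_male` remains; `Gender_male = 0` unambiguously encodes "female", so the dropped level acts as the baseline.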

Feature Scaling

Why feature scaling is important:

  1. Easier interpretation: different ranges of values create confusion and do not give a uniform view, while a common scale makes it easy to compare one coefficient to another.

  2. In the background, gradient descent is doing the optimisation, and feature scaling makes it converge faster.

It is important to note that scaling just affects the coefficients and none of the other parameters like t-statistic, F-statistic, p-values, R-squared, etc.

Methods of Feature Scaling :

1) Standardisation: this method brings all of the data to a standard scale with mean zero and standard deviation one.
Standardisation: X' = (x - mean(x)) / sd(x)


2) MinMax scaling: brings all of the data into the range 0 to 1.

MinMaxScaling: X' = (x - min(x)) / (max(x) - min(x))
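Both formulas can be applied directly with NumPy; a sketch with illustrative data:

```python
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0, 50.0])

# Standardisation: mean 0, standard deviation 1
standardised = (x - x.mean()) / x.std()

# Min-max scaling: values mapped into [0, 1]
minmax = (x - x.min()) / (x.max() - x.min())
```

In practice, scikit-learn's `StandardScaler` and `MinMaxScaler` do the same job and remember the training-set parameters for reuse on test data.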

Model Evaluation and Comparative Study

  1. F-statistic

  2. R-Square

After building the model, we want to know how good it is.

Selecting the best model

  • Bias vs variance trade-off

Model Comparison

Penalize models that use a higher number of predictors and compute the adjusted R^2.

Adjusted R-squared takes into consideration the number of variables as well as the number of records in the dataset.

A few situations:

1) A low number of records with a high number of variables is not a good situation for a model.

2) A high number of records with a low number of variables gives the model better performance.

Adjusted R-squared helps to compare models.

Adjust R-Sqaure = 1- [(1-R-Square)(N-1)/(N-p-1)]

AIC: a lower AIC value is better and a higher value is worse. It is popular for model comparison.

AIC = N * log(RSS/N) + 2p
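Both criteria are one-liners; a sketch of the two formulas above, with N records, p predictors, and RSS the residual sum of squares:

```python
import math

def adjusted_r2(r2, n, p):
    """Adjusted R^2 = 1 - (1 - R^2) * (N - 1) / (N - p - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def aic(rss, n, p):
    """AIC = N * log(RSS / N) + 2p (up to an additive constant)."""
    return n * math.log(rss / n) + 2 * p
```

With the same raw R^2, the model with more predictors gets the lower adjusted R^2, e.g. `adjusted_r2(0.90, 100, 10) < adjusted_r2(0.90, 100, 5)`.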

Feature Selection

Selecting the right predictors (variables) for the model is an important step.

  1. Brute-force method: trial of all possible combinations

2^p models for p features.

  2. Manual Feature Elimination


  • Build the model

  • Drop features that have the least support in the prediction (p-value)

  • Drop redundant features (using correlations, VIF)

  • Rebuild the model and reiterate.

  3. Automated Approach (this helps when the number of variables is large)

    • Automatic feature selection needs rules/criteria to work with.

      1. Top 'n' features : RFE( Recursive Feature Elimination)

      2. Forward/Backward/Stepwise selection : Based on AIC.

      3. Regularization(Lasso)

4. Balanced Approach: a combination of automated (coarse tuning) and manual (fine tuning) selection.
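A sketch of the top-n approach with RFE, assuming scikit-learn and synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data: 10 features, only 3 of them informative
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=0.1, random_state=0)

# Recursively fit the model and drop the weakest feature until 3 remain
rfe = RFE(estimator=LinearRegression(), n_features_to_select=3)
rfe.fit(X, y)

selected = [i for i, keep in enumerate(rfe.support_) if keep]
```

`rfe.support_` marks the kept features, and `rfe.ranking_` is 1 for kept features and increases in the order the rest were eliminated. This coarse selection is then typically refined manually with p-values and VIF, as described above.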