Regression

Simple linear regression models the relationship between two variables. It estimates how much Y is expected to change when X changes by a given amount, by fitting a straight-line relationship between the two chosen variables.


Y = b0 + b1 · X


Where:

Y: Target

b0: Constant or intercept

b1: Slope

X: Feature vector (the predictor)

b0 and b1 both appear in R output as coefficients.
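
To make the equation concrete, here is a minimal sketch of how b0 and b1 can be estimated by ordinary least squares. It is written in plain Python rather than R, and the function name and the data are made up for illustration; the data is generated exactly from Y = 1 + 2 · X so that the fit is easy to check.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) that minimize the squared error of Y = b0 + b1 * X."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of X and Y divided by the variation of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must vary to estimate a slope")
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Illustrative data only (assumption): generated from Y = 1 + 2 * X.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]
b0, b1 = fit_simple_linear_regression(xs, ys)
print(f"intercept b0 = {b0:.2f}, slope b1 = {b1:.2f}")
# prints "intercept b0 = 1.00, slope b1 = 2.00"
```

Here a slope of 2 means that Y is expected to rise by 2 for every one-unit increase in X, and the intercept of 1 is the predicted Y when X is 0.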


Different Terms in Regression:

  • Dependent Variable: The variable we want to predict or understand is called the dependent variable. It is also called the target variable.

  • Independent Variable: The factors that affect the dependent variable, or that are used to predict its values, are called independent variables, also known as predictors.

  • Outliers: An outlier is an observation with a very low or very high value compared with the other observed values. An outlier may distort the result, so it should be identified and handled before fitting.

  • Multicollinearity: If the independent variables are highly correlated with each other, the condition is called multicollinearity. It should not be present in the dataset, because it makes it difficult to rank the variables by how strongly they affect the target.

  • Underfitting and Overfitting: If our algorithm works well on the training dataset but not on the test dataset, the problem is called overfitting. If our algorithm does not perform well even on the training dataset, the problem is called underfitting.
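
Two of the terms above, outliers and multicollinearity, can be checked with a few lines of code. The sketch below is one possible approach, not the only one: it flags outliers with the common 1.5 × IQR rule and measures how strongly two predictors move together with the Pearson correlation. The function names, the data, and the cutoff values are assumptions chosen for illustration.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def pearson_correlation(a, b):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


# Illustrative data only (assumption): one value is far from the rest.
observations = [10, 12, 11, 13, 12, 11, 95]
flagged = iqr_outliers(observations)
print("possible outliers:", flagged)

# Two hypothetical predictors that rise together almost in lockstep.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 8.0, 9.8]
r = pearson_correlation(x1, x2)
# The 0.9 cutoff is an arbitrary rule of thumb, not a fixed standard.
if abs(r) > 0.9:
    print(f"x1 and x2 are highly correlated (r = {r:.3f})")
```

A flagged value is only a candidate: whether to remove, correct, or keep it depends on why it is extreme. Likewise, a high correlation between two predictors suggests dropping or combining one of them before interpreting the coefficients.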


Importance of Regression

  • Regression estimates the relationship between the target and the independent variables.

  • It is used to find the trends in data.

  • It helps to predict real/continuous values.

  • By performing regression, we can estimate which factors are most important, which are least important, and how each factor affects the target.

Types of Regression

  1. Simple Linear Regression

  2. Multiple Linear Regression
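
The difference between the two types is the number of predictors: simple linear regression uses one X, while multiple linear regression extends the same equation to Y = b0 + b1 · X1 + b2 · X2 + ... for several predictors. The sketch below fits such a model by solving the least-squares normal equations in plain Python. The helper names and the data are hypothetical; the data is generated from Y = 1 + 2 · X1 + 3 · X2, so the fit should recover coefficients close to 1, 2 and 3.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left unchanged.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def fit_multiple_linear_regression(rows, ys):
    """Return [b0, b1, ..., bk] for Y = b0 + b1 * X1 + ... + bk * Xk."""
    # A leading 1.0 in every row lets b0 act as the intercept.
    design = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(design[0])
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, ys)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Illustrative data only (assumption): Y = 1 + 2 * X1 + 3 * X2 exactly.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
coefficients = fit_multiple_linear_regression(rows, ys)
print([round(c, 4) for c in coefficients])
```

Solving the normal equations directly is fine for a small illustration like this one. If two predictors were perfectly correlated, the system would be singular and the helper would raise an error, which is the multicollinearity problem in its most extreme form.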