ML-Interview-Questions

  1. What is the EDA process?

Ans: Data understanding is a key step in the CRISP-DM framework. In any end-to-end data science project, you need to understand the underlying data properly before you start working with it. This is where EDA (exploratory data analysis) comes in: you dive deep into the dataset at hand to find patterns, correlations, or any other relationships within it. This not only improves your understanding of the data but also helps with data cleaning by identifying irrelevant or highly correlated features.

Steps in EDA

  • Identification of important features. Identifying the variables relevant to the problem is the first step.

  • Univariate analysis to understand each column in the data.

  • Bivariate analysis: how the variables are related to each other. This helps us identify relationships between variables.

  • Data Visualisation

  • Handling missing values and outliers.

  • Transform original variables.

  • Derive new variables.
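The steps above can be sketched in a few lines of pandas. This is a minimal illustration, not a complete EDA workflow: the toy dataset and its column names (`age`, `income`, `city`) are hypothetical, and the 1.5 * IQR rule is only one common convention for flagging outliers.

```python
import pandas as pd

# Hypothetical toy dataset; the column names are illustrative only.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29],
    "income": [30000, 42000, 61000, 70000, 52000, 35000],
    "city": ["A", "B", "A", "C", "B", "A"],
})

# Univariate analysis: summary statistics for the numeric columns
# and category frequencies for the categorical column.
numeric_summary = df.describe()
city_counts = df["city"].value_counts()

# Bivariate analysis: pairwise Pearson correlation between numeric columns.
correlations = df[["age", "income"]].corr()

# Outlier check using the 1.5 * IQR rule (an assumed convention).
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["income"] < q1 - 1.5 * iqr) | (df["income"] > q3 + 1.5 * iqr)]

# Transform an original variable and derive a new one.
df["income_thousands"] = df["income"] / 1000
df["income_per_year_of_age"] = df["income"] / df["age"]

print(numeric_summary)
print(city_counts)
print(correlations)
print(outliers)
```

For visualisation, `df.hist()` or `df.plot.scatter(x="age", y="income")` would be a natural next step, assuming matplotlib is installed.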

Important links for understanding the data

  1. https://medium.com/@raoufkeskes/missing-data-its-types-and-statistical-methods-to-deal-with-it-5cf8b71a443f

  2. https://towardsdatascience.com/ways-to-detect-and-remove-the-outliers-404d16608dba

  3. https://www.tylervigen.com/spurious-correlations

  4. https://medium.com/@seema.singh/why-correlation-does-not-imply-causation-5b99790df07e

  5. https://www.statisticssolutions.com/dissertation-resources/research-designs/establishing-cause-and-effect/


  2. Handling Missing Values

Ans: Impute missing values with the mean or median for numeric data and with the mode for categorical data.
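A minimal sketch of mean, median, and mode imputation with pandas follows. The dataset and column names are hypothetical. Using the median for `income` is an illustrative choice: the median is less sensitive to extreme values than the mean, so it is often preferred for skewed columns.

```python
import pandas as pd

# Hypothetical dataset with one missing value in each column.
df = pd.DataFrame({
    "age": [25.0, None, 47.0, 51.0, 38.0],
    "income": [30000.0, 42000.0, None, 70000.0, 1000000.0],
    "city": ["A", "B", None, "A", "C"],
})

imputed = df.copy()

# Numeric column, roughly symmetric: fill with the mean.
imputed["age"] = imputed["age"].fillna(imputed["age"].mean())

# Numeric column with an extreme value: fill with the median.
imputed["income"] = imputed["income"].fillna(imputed["income"].median())

# Categorical column: fill with the mode (most frequent category).
# mode() returns a Series because ties are possible; take the first entry.
imputed["city"] = imputed["city"].fillna(imputed["city"].mode()[0])

print(imputed)
```

In a real project the fill values should be computed on the training split only and then reused on validation and test data, to avoid leaking information.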