02 End-to-End ML Project Flashcards
what is correlation
- it measures the strength and direction of the linear relation between 2 variables.
- it helps to show whether the 2 variables tend to move together.
- its value ranges from -1 to +1.
what does correlation of -1 mean
it indicates a perfect negative linear relation: if 1 variable increases, the other variable decreases.
what does correlation of +1 mean
it indicates a perfect positive linear relation: if 1 variable increases, the other variable also increases.
what does correlation 0 mean
there is no linear relation between the variables (a non-linear relation may still exist).
data processing required before calculating correlation
removing or treating outliers, since they can distort the correlation value.
formula of correlation
cov(x, y) / (sdx * sdy), where sdx and sdy are the standard deviations of x and y
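A minimal sketch of the correlation formula in plain Python, reading it as cov(x, y) divided by the product of the two standard deviations. The function name `pearson_correlation` and the sample data are made up for illustration, and the sample (n-1) versions of covariance and standard deviation are assumed.

```python
import math


def pearson_correlation(x, y):
    """Correlation as cov(x, y) / (sd(x) * sd(y)), using sample (n-1) estimates."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance of x and y.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    # Sample standard deviations of x and y.
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (sd_x * sd_y)


# Illustrative data: y rises exactly with x, then falls exactly with x.
hours = [1, 2, 3, 4]
print(pearson_correlation(hours, [2, 4, 6, 8]))  # close to +1
print(pearson_correlation(hours, [8, 6, 4, 2]))  # close to -1
```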
what is covariance
it tells us the direction in which both the variables change together.
+ve covariance
both the variables are moving in the same direction
-ve covariance
if 1 variable is increasing other variable is decreasing
Formula of covariance
sum((Xi-Xmean)*(Yi-Ymean))/(n-1)
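A small sketch of sample covariance, assuming the deviations from the mean are multiplied pair by pair, summed, and divided by n-1. The function name `sample_covariance` and the numbers are illustrative only.

```python
def sample_covariance(x, y):
    """Sum of (Xi - Xmean) * (Yi - Ymean), divided by (n - 1)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)


# Both variables move in the same direction: +ve covariance.
print(sample_covariance([1, 2, 3], [2, 4, 6]))  # 2.0
# One increases while the other decreases: -ve covariance.
print(sample_covariance([1, 2, 3], [6, 4, 2]))  # -2.0
```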
types of data transformation
- min-max scaler AKA normalization
- standardization.
what is min-max scaler
- the transformed values range between 0 and 1
- it is highly affected by outliers
formula of min-max scaler
(x-xmin)/(xmax-xmin)
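A short sketch of the min-max formula, which also shows why it is sensitive to outliers. The helper name `min_max_scale` and the sample values are assumptions for illustration.

```python
def min_max_scale(values):
    """Rescale values to the range 0..1 using (x - xmin) / (xmax - xmin)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are equal, so the range is zero")
    return [(v - lo) / (hi - lo) for v in values]


print(min_max_scale([10, 20, 30]))  # [0.0, 0.5, 1.0]

# One outlier (1000) squeezes the other values close to 0.
print(min_max_scale([10, 20, 30, 1000]))
```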
what is standardization
- the mean of the transformed values is 0 and the variance is 1.
- formula: (x - mean) / sd
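A sketch of standardization as (x - mean) / sd, using only the standard library. It assumes the population standard deviation (`statistics.pstdev`), so that the variance of the transformed values comes out as 1; the function name `standardize` and the data are illustrative.

```python
import statistics


def standardize(values):
    """Transform values to z-scores: (x - mean) / sd, with population sd."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("all values are equal, so the standard deviation is zero")
    return [(v - mean) / sd for v in values]


z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print(z)                        # z-scores of the sample
print(statistics.fmean(z))      # mean of the transformed values, about 0
print(statistics.pvariance(z))  # variance of the transformed values, about 1
```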
What are all Regression Model Metrics
- Mean Sq Error (MSE)
- Root Mean Sq Error (RMSE)
- R-Square (R^2)
- Adjusted R-Square
What is MSE?
Mean of the squared difference between actual and predicted values.
what is RMSE?
it is the square root of the mean of the squared differences between actual and predicted values.
if RMSE is 90, then the typical gap between actual and predicted is about 90, in the same units as the target.
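A minimal sketch of MSE and RMSE side by side, assuming two equal-length lists of actual and predicted values. The function names and the numbers are made up for illustration.

```python
import math


def mean_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of MSE, which puts the error back in the target's units."""
    return math.sqrt(mean_squared_error(actual, predicted))


actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
print(mean_squared_error(actual, predicted))       # 6.5
print(root_mean_squared_error(actual, predicted))  # about 2.55
```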
What is R^2?
RMSE depends on the scale of the target, so it is difficult to understand which model is better for different problems; R^2 is scale-free, hence we use it.
formula = 1 - (RSS/TSS), where RSS is the Residual Sum of Squares and TSS is the Total Sum of Squares.
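A sketch of R^2 as 1 - (RSS/TSS), taking RSS as the sum of squared residuals and TSS as the sum of squared deviations of the actual values from their mean. The function name `r_squared` and the data are illustrative.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - RSS / TSS."""
    mean_actual = sum(actual) / len(actual)
    # RSS: squared gaps between actual and predicted values.
    rss = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # TSS: squared gaps between actual values and their mean.
    tss = sum((a - mean_actual) ** 2 for a in actual)
    if tss == 0:
        raise ValueError("actual values are constant, so TSS is zero")
    return 1 - rss / tss


actual = [1, 2, 3, 4, 5]
print(r_squared(actual, [1.1, 1.9, 3.2, 3.8, 5.0]))  # about 0.99
# Always predicting the mean of the actual values gives R^2 = 0.
print(r_squared(actual, [3, 3, 3, 3, 3]))  # 0.0
```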
what is adjusted R^2?
R^2 never decreases when new variables are added, even if they are irrelevant, and hence we use
adjusted R^2, which penalizes extra features, to understand the performance of model when new features are added.
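A sketch of the usual adjusted R^2 formula, 1 - (1 - R^2) * (n - 1) / (n - p - 1), where n is the number of rows and p the number of features. The formula is not stated in the card above, and the function name and the R^2 values below are made-up numbers for illustration.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need more samples than features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# Hypothetical models fitted on 50 rows:
# adding 5 weak features nudges R^2 up from 0.900 to 0.905,
# but adjusted R^2 goes down because of the penalty for extra features.
print(adjusted_r_squared(0.900, n_samples=50, n_features=5))   # about 0.889
print(adjusted_r_squared(0.905, n_samples=50, n_features=10))  # about 0.881
```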