04 Training Models Flashcards
In the equation y = b1x1 + b2x2 + c, what does ML calculate?
The values of x1 and x2 come from the data (its columns); the model learns the coefficients b1 and b2 and the intercept c.
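What is the order of the equations in linear regression?
The linear regression model (prediction) equation comes first, then the cost function (MSE), then the Normal Equation that minimizes it.
Why is the Normal Equation used?
To compute directly, in closed form, the value of theta that minimizes the cost function.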
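A minimal NumPy sketch of the Normal Equation, theta = (X^T X)^-1 X^T y. The noise-free data and the variable names are assumptions made up for illustration.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 4 + 3x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 4.0 + 3.0 * x

# Add a column of ones so that theta[0] acts as the intercept.
X_b = np.column_stack([np.ones_like(x), x])

# Normal Equation: theta = (X^T X)^-1 X^T y
theta = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
print(theta)  # approximately [4. 3.]
```

Because the data has no noise, the recovered theta matches the intercept and slope used to generate it.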
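What is the main component of a linear regression model?
The parameter vector theta. Training means finding the value of theta that minimizes the RMSE (in practice the MSE, which has the same minimum).
What is the drawback of the Normal Equation?
It becomes very slow when the number of features is large, because it inverts a matrix whose size grows with the number of features. It scales linearly with the number of instances, as long as the training set fits in memory.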
Which methods are used to train a linear regression model on a large dataset?
Gradient Descent, in three variants:
- Batch Gradient Descent
- Stochastic Gradient Descent
- Mini-batch Gradient Descent
What is Gradient Descent?
It is a generic optimization algorithm that can find optimal solutions to a wide range of problems.
It measures the local gradient of the error function with respect to theta and steps in the direction of descending gradient until it reaches a minimum.
What is the relation between the learning rate and theta?
If the learning rate is too low, it takes many iterations to reach the optimal theta; if it is too high, the updates overshoot the optimal theta and may diverge.
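The effect of the learning rate can be sketched on a toy problem. The function f(theta) = theta^2 and the three rates below are assumptions chosen only to make the behaviour visible.

```python
def gradient_descent(eta, n_steps=20, theta=5.0):
    """Minimize f(theta) = theta**2, whose gradient is 2 * theta."""
    for _ in range(n_steps):
        theta = theta - eta * 2 * theta
    return theta

too_low = gradient_descent(eta=0.001)   # has barely moved away from 5.0
reasonable = gradient_descent(eta=0.1)  # close to the minimum at 0
too_high = gradient_descent(eta=1.1)    # overshoots on every step and diverges

print(too_low, reasonable, too_high)
```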
What is the difference between Gradient Descent and Batch Gradient Descent?
Gradient Descent is the general algorithm: at each step it computes the gradient of the cost function with respect to the parameters and updates them. Batch Gradient Descent is the variant that computes that gradient over the entire training set at every step.
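A minimal sketch of Batch Gradient Descent for linear regression with the MSE cost. The data, the learning rate eta, and the number of steps are assumptions for illustration.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 4 + 3x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 4.0 + 3.0 * x
X_b = np.column_stack([np.ones_like(x), x])  # column of ones for the intercept
m = len(X_b)

eta = 0.1            # learning rate
theta = np.zeros(2)  # starting point
for _ in range(2000):
    # Gradient of the MSE, computed over the ENTIRE training set.
    gradients = (2 / m) * X_b.T @ (X_b @ theta - y)
    theta = theta - eta * gradients

print(theta)  # approximately [4. 3.]
```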
What is Stochastic Gradient Descent?
At each step it picks one randomly selected instance from the training set and computes the gradient from that instance alone.
Advantages of Stochastic Gradient Descent
1. Very quick on large datasets.
2. Its randomness helps it escape local minima when the cost function has several.
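A minimal sketch of Stochastic Gradient Descent on made-up data. The constant learning rate is an assumption that works here only because the data is noise-free; on real data a learning schedule that gradually lowers the rate is the usual choice.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical noise-free data generated from y = 4 + 3x.
x = np.linspace(0.0, 2.0, 20)
y = 4.0 + 3.0 * x
X_b = np.column_stack([np.ones_like(x), x])
m = len(X_b)

eta = 0.05
theta = np.zeros(2)
for epoch in range(200):
    for _ in range(m):
        i = rng.integers(m)  # pick ONE random instance
        xi, yi = X_b[i], y[i]
        gradient = 2 * xi * (xi @ theta - yi)  # gradient from that instance only
        theta = theta - eta * gradient

print(theta)  # close to [4. 3.]
```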
What is each pass over the training set called in Gradient Descent?
An epoch.
What is a learning schedule?
A function that determines the learning rate at each iteration.
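One simple form of learning schedule, sketched below, divides a constant by a growing step count. The function name and the constants t0 and t1 are hypothetical choices for illustration.

```python
def learning_schedule(t, t0=5.0, t1=50.0):
    """Return a learning rate that shrinks as the step count t grows."""
    return t0 / (t + t1)

rates = [learning_schedule(t) for t in (0, 50, 950)]
print(rates)  # 5/50, 5/100, 5/1000
```

Starting with larger steps helps make quick progress, and the shrinking rate lets the parameters settle near the minimum.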
What is Mini-batch Gradient Descent?
It is a middle ground between Stochastic and Batch Gradient Descent: at each step it computes the gradient on a small random subset of instances called a mini-batch.
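A minimal sketch of Mini-batch Gradient Descent. The data, batch_size, and the constant learning rate are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical noise-free data generated from y = 4 + 3x.
x = np.linspace(0.0, 2.0, 20)
y = 4.0 + 3.0 * x
X_b = np.column_stack([np.ones_like(x), x])
m = len(X_b)

eta = 0.05
batch_size = 5
theta = np.zeros(2)
for epoch in range(500):
    order = rng.permutation(m)  # reshuffle the instances every epoch
    for start in range(0, m, batch_size):
        idx = order[start:start + batch_size]
        X_batch, y_batch = X_b[idx], y[idx]
        # Gradient of the MSE computed on the mini-batch only.
        gradients = (2 / len(idx)) * X_batch.T @ (X_batch @ theta - y_batch)
        theta = theta - eta * gradients

print(theta)  # close to [4. 3.]
```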
Do the Gradient Descent methods require feature scaling?
Yes. The features should have a similar scale, otherwise convergence takes much longer.
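Standardization can be sketched in plain NumPy. The feature matrix is made up, and in practice the mean and standard deviation would be computed on the training set only.

```python
import numpy as np

# Two hypothetical features on very different scales.
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 5000.0]])

mean = X.mean(axis=0)
std = X.std(axis=0)
X_scaled = (X - mean) / std  # each column now has mean 0 and standard deviation 1

print(X_scaled)
```

This is the same transformation that scikit-learn's StandardScaler applies.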
Performance of linear regression algorithms on large datasets (many instances)
Normal Equation - Slow to no effect: Fast
BGD - Slow
SGD - Fast
Mini-batch GD - Fast
Performance of linear regression algorithms with many features
Normal Equation - Slow
BGD - Fast
SGD - Fast
Mini-batch GD - Fast
Which linear regression algorithm has 0 hyperparameters?
The closed-form solution: the Normal Equation, or scikit-learn's LinearRegression.
How can a linear model be used to fit polynomial (nonlinear) data?
Add powers of each feature (for example x^2) as new features, then train a linear model on this extended set of features.
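A minimal sketch of this idea: x^2 is added as an extra column and an ordinary linear fit is run on the extended features. The data is made up, and np.linalg.lstsq is used here in place of an explicit Normal Equation.

```python
import numpy as np

# Hypothetical noise-free quadratic data: y = 0.5x^2 + x + 2.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 0.5 * x**2 + x + 2.0

# Extended feature matrix with columns [1, x, x^2].
X_poly = np.column_stack([np.ones_like(x), x, x**2])

# A linear least-squares fit on the extended features.
theta, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
print(theta)  # approximately [2.  1.  0.5]
```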
What is overfitting?
The model performs well on the training data but poorly on the validation set.
What is underfitting?
The model performs poorly on both the training set and the validation set.
How to deal with an underfitting model?
Choose a more complex model, add better features, or reduce the constraints (regularization) on the model.
How to deal with an overfitting model?
Add more training data, or simplify or regularize the model.
Name 3 types of model errors
1. Bias
2. Variance
3. Irreducible error
What is bias error and how can we recognize it?
It is due to wrong assumptions, for example assuming the data is linear when it is actually quadratic. A high-bias model is likely to underfit.
What is variance error and how can we recognize it?
It is due to the model's excessive sensitivity to small variations in the training data.
A high-variance model is likely to overfit.
What is regularization?
It is constraining the model.
It is a way to control overfitting.
For a linear model it is typically achieved by constraining the model's weights.
What are the different types of regularization?
1. Ridge Regression
2. Lasso Regression
3. Elastic Net
What is Ridge Regression?
A regularization term based on the squared weights (the l2 penalty) is added to the cost function, forcing the learning algorithm to fit the data while keeping the weights as small as possible.
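One common closed-form version of Ridge Regression can be sketched as theta = (X^T X + alpha A)^-1 X^T y, where A is the identity matrix with a 0 in the top-left cell so the intercept is not penalized. The data and the helper name ridge_closed_form are assumptions for illustration.

```python
import numpy as np

# Hypothetical noise-free data generated from y = 4 + 3x.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 4.0 + 3.0 * x
X_b = np.column_stack([np.ones_like(x), x])

def ridge_closed_form(X_b, y, alpha):
    A = np.eye(X_b.shape[1])
    A[0, 0] = 0.0  # do not penalize the intercept
    return np.linalg.solve(X_b.T @ X_b + alpha * A, X_b.T @ y)

theta_plain = ridge_closed_form(X_b, y, alpha=0.0)   # ordinary least squares
theta_ridge = ridge_closed_form(X_b, y, alpha=10.0)  # slope pulled toward 0

print(theta_plain)  # approximately [4. 3.]
print(theta_ridge)  # approximately [7. 1.]
```

With alpha=10 the slope shrinks from 3 to 1, and the unpenalized intercept rises to compensate.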
Is scaling necessary for Ridge Regression?
Yes. Ridge is sensitive to the scale of the input features, so standardize them first (for example with StandardScaler).
Which hyperparameter controls the amount of regularization?
alpha
Which class is used for Ridge Regression?
Ridge()
Which class is used for Ridge Regression with SGD?
SGDRegressor(penalty="l2")
What is the full form of Lasso?
Least Absolute Shrinkage and Selection Operator.
How does Lasso Regression work?
It adds an l1 penalty on the weights to the cost function, which tends to set the weights of the least important features to exactly 0 (automatic feature selection).
How is Ridge Regression different from Lasso Regression?
Ridge shrinks all weights toward 0 but rarely makes them exactly 0; Lasso sets the weights of the least important features to exactly 0.
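The difference can be sketched with scikit-learn's Ridge and Lasso on a tiny made-up dataset. The data and the alpha value are assumptions chosen so that the second feature matters very little.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Hypothetical data: two centered, uncorrelated features.
# y depends strongly on the first feature and only weakly on the second.
X = np.array([[-1.0, -1.0],
              [ 1.0, -1.0],
              [-1.0,  1.0],
              [ 1.0,  1.0]])
y = 3.0 * X[:, 0] + 0.05 * X[:, 1]

ridge = Ridge(alpha=0.1).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)

print(ridge.coef_)  # both weights shrink slightly, neither becomes zero
print(lasso.coef_)  # the weight of the weak second feature is exactly 0
```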
Which class is used for Lasso Regression?
Lasso()
Which class is used for Lasso Regression with SGD?
SGDRegressor(penalty="l1")
What is Elastic Net?
It is a middle ground between Ridge and Lasso: its regularization term is a mix of both penalties.
Which hyperparameter controls the mix of Lasso and Ridge?
l1_ratio: at 0 the penalty is pure Ridge (l2) and at 1 it is pure Lasso (l1).
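A small scikit-learn sketch of l1_ratio. The data and the alpha value are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Hypothetical data: two centered, uncorrelated features.
X = np.array([[-1.0, -1.0],
              [ 1.0, -1.0],
              [-1.0,  1.0],
              [ 1.0,  1.0]])
y = 3.0 * X[:, 0] + 0.05 * X[:, 1]

lasso = Lasso(alpha=0.2).fit(X, y)
enet_l1 = ElasticNet(alpha=0.2, l1_ratio=1.0).fit(X, y)   # pure l1 penalty
enet_mix = ElasticNet(alpha=0.2, l1_ratio=0.5).fit(X, y)  # half l1, half l2

print(lasso.coef_, enet_l1.coef_, enet_mix.coef_)
```

With l1_ratio=1.0 the Elastic Net fit matches Lasso; with l1_ratio=0.5 both penalties act on the weights.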
Which model is better among linear regression, Ridge, and Lasso?
Ridge and Lasso are generally preferable because a little regularization almost always helps. Ridge is a good default; Lasso is useful when only a few features are expected to matter.
If the dataset has highly correlated features, which model should be used: Lasso or Ridge?
Ridge is the better starting point, because Lasso can behave erratically when several features are strongly correlated.
What is early stopping?
Another way to regularize an iteratively trained model: stop training as soon as the validation error reaches its minimum.
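A minimal sketch of early stopping with Batch Gradient Descent. The noisy data, the degree-8 polynomial features, the helper mse, and the patience rule (stop once the validation error has not improved for 50 epochs) are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical noisy quadratic data, split into training and validation sets.
x = rng.uniform(-1.0, 1.0, size=40)
y = 0.5 * x**2 + x + 2.0 + rng.normal(0.0, 0.3, size=40)
X = np.column_stack([x**d for d in range(9)])  # columns 1, x, ..., x^8
X_train, y_train = X[:25], y[:25]
X_val, y_val = X[25:], y[25:]

def mse(X, y, theta):
    return float(np.mean((X @ theta - y) ** 2))

eta, patience = 0.05, 50
theta = np.zeros(X.shape[1])
best_error, best_theta, best_epoch = float("inf"), theta.copy(), 0
for epoch in range(5000):
    gradients = (2 / len(X_train)) * X_train.T @ (X_train @ theta - y_train)
    theta = theta - eta * gradients
    val_error = mse(X_val, y_val, theta)
    if val_error < best_error:
        # Remember the parameters with the lowest validation error so far.
        best_error, best_theta, best_epoch = val_error, theta.copy(), epoch
    elif epoch - best_epoch >= patience:
        break  # validation error has stopped improving

print(best_epoch, best_error)
```

The model that is kept is best_theta, not the theta from the last epoch.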
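Can Logistic Regression be used for classification and regression?
It is a regression algorithm that is used for classification: it estimates the probability that an instance belongs to a class, and that probability is turned into a class prediction.
What is Logistic Regression?
Like Linear Regression, it computes a weighted sum of the input features plus a bias term, but it then applies the sigmoid function to the result, giving a probability between 0 and 1. The instance is classified as 1 if that probability is at least 0.5 and as 0 otherwise.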
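A minimal sketch of the sigmoid and the resulting prediction for a single-feature model. The parameter values and the helper names are hypothetical.

```python
import math

def sigmoid(t):
    """Map any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-t))

def predict(theta0, theta1, x, threshold=0.5):
    """Return the estimated probability and the predicted class (0 or 1)."""
    p = sigmoid(theta0 + theta1 * x)
    return p, int(p >= threshold)

p_high, class_high = predict(-3.0, 2.0, x=2.0)  # weighted sum is +1
p_low, class_low = predict(-3.0, 2.0, x=1.0)    # weighted sum is -1
print(p_high, class_high, p_low, class_low)
```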
What is a decision boundary?
It is the boundary between class 0 and class 1: the set of points where the model's estimated probability is 50%, so instances on either side are assigned to different classes.
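For a logistic model with a single feature, the decision boundary is the value of x where theta0 + theta1 * x = 0. A small sketch with hypothetical parameter values:

```python
import math

theta0, theta1 = -3.0, 2.0  # hypothetical model parameters

# The boundary is where the weighted sum is 0, i.e. where the probability is 0.5.
boundary = -theta0 / theta1
p_at_boundary = 1.0 / (1.0 + math.exp(-(theta0 + theta1 * boundary)))

print(boundary, p_at_boundary)  # 1.5 0.5
```

Instances with x above 1.5 are classified as 1 and those below as 0.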