Machine Learning Interview Prep Flashcards
What is the difference between gradient boosting and random forest? And what are the advantages and disadvantages of each when compared to the other?
Random forest is less prone to overfitting than gradient boosting, and it trains faster because its trees are built in parallel, independently of one another.
Gradient boosting can be more accurate than random forest because each new tree is trained to correct the errors of the trees before it.
It can also capture more complex patterns in the data, at the cost of slower, sequential training and a greater risk of overfitting.
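A minimal sketch of the two ensembles side by side, assuming scikit-learn; the dataset and hyperparameters are illustrative, not tuned recommendations:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Toy binary-classification dataset.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Random forest: trees are independent, so they can be trained in parallel (n_jobs=-1).
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)

# Gradient boosting: trees are built sequentially, each one fitting the
# errors of the ensemble so far, so tree-building cannot be parallelized.
gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, random_state=0)

for name, model in [("random forest", rf), ("gradient boosting", gb)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```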
Briefly explain K-Means clustering. How can we find the best value of K?
In K-Means, each data point is represented as a vector, so the dataset can be pictured as points scattered across feature space.
K-Means assigns each point to the cluster whose centroid is nearest, typically by Euclidean distance, then recomputes the centroids and repeats until the assignments stop changing.
A common method for finding the best value of K is the elbow method: past a certain point, increasing K yields very little reduction in within-cluster variation, and that point (the elbow) is a good choice for K.
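A minimal sketch of the elbow method, assuming scikit-learn and matplotlib; we plot KMeans' within-cluster sum of squares (its inertia_ attribute) for a range of K values and look for where the curve flattens:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Toy data with 4 true clusters.
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)

ks = range(1, 10)
inertias = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # sum of squared distances to nearest centroid

plt.plot(ks, inertias, marker="o")
plt.xlabel("K (number of clusters)")
plt.ylabel("Within-cluster sum of squares")
plt.show()  # the bend ("elbow"), here around K=4, is a reasonable choice
```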
What is dimensionality reduction? Can you discuss one method of doing it?
Dimensionality reduction is the process of reducing the number of features in your dataset in order to lessen the computational load.
One method to reduce dimensionality is to look for highly correlated features and represent them with fewer features.
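One standard technique that performs exactly this compression is principal component analysis (PCA), which projects correlated features onto a smaller set of uncorrelated components. A minimal sketch, assuming scikit-learn; the 95% variance threshold is an illustrative choice, not a universal rule:

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)  # 64 pixel features per image

# Keep the smallest number of components that explains 95% of the variance.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)

print(X.shape, "->", X_reduced.shape)  # far fewer than the original 64 features
```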
What are L1 and L2 regularizations?
L1 is known as Lasso.
L2 is known as Ridge.
What are the differences between L1 and L2?
L1 (Lasso) penalizes the sum of the absolute values of the weights, which can drive the weights of irrelevant features all the way to zero, effectively removing them from the model.
L2 (Ridge) penalizes the sum of the squared weights, shrinking large weights so that no single feature can over-impact the model, but without forcing any weight exactly to zero.
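A minimal sketch contrasting the two penalties, assuming scikit-learn; the alpha values are illustrative. On data where only a few features carry signal, Lasso drives the rest to exactly zero while Ridge merely shrinks them:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 10 features, but only 3 carry real signal.
X, y = make_regression(n_samples=200, n_features=10, n_informative=3,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print("Lasso coefficients set to zero:", int(np.sum(lasso.coef_ == 0)))  # several
print("Ridge coefficients set to zero:", int(np.sum(ridge.coef_ == 0)))  # usually 0
```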
What is the difference between overfitting and underfitting, and how can you avoid them?
Underfitting is when your model fails to capture the underlying structure of the data and, in turn, fails to make accurate predictions, even on the training data.
Overfitting is when your model fits the training data too closely, noise included, and so performs poorly on data it has not seen.
You can avoid underfitting by giving the model better features to work with; you can also reduce the regularization parameters and increase model capacity so it can learn the intricacies of the data. Conversely, you can avoid overfitting by training on more data, adding regularization, or stopping training once performance on held-out data stops improving.
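A minimal sketch of diagnosing both failure modes, assuming scikit-learn; we vary a decision tree's max_depth and compare training versus validation accuracy:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

for depth in [1, 3, 10, None]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(f"max_depth={depth}: train={tree.score(X_tr, y_tr):.2f}, "
          f"val={tree.score(X_val, y_val):.2f}")

# Low train AND validation accuracy  -> underfitting (e.g. max_depth=1).
# High train, much lower validation  -> overfitting (e.g. max_depth=None).
```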