ML Interview Prep Flashcards
What is bias?
The difference between a model's average prediction and the actual targets; it measures error from overly simplistic assumptions.
Low bias: predicted points track the targets closely (typical of flexible models, which risk overfitting).
High bias: predicted points are systematically far from the targets (underfitting).
What is variance?
The variability of a prediction for a given data point, i.e. how much the learned model would change if it were trained on different data. A high-variance model pays too much attention to the training data and fails to generalize: it performs very well on training data but poorly on test data.
Other definition: variance is the amount by which the estimate of the target function would change if different training data were used.
Explain the Bias-Variance Tradeoff.
Predictive models have a tradeoff between bias (how well the model fits the data) and variance (how much the model changes when trained on different data).
Simpler models are stable (low variance) but they don’t get close to the truth (high bias).
More complex models are more prone to being overfit (high variance) but they are expressive enough to get close to the truth (low bias).
The best model for a given problem usually lies somewhere in the middle.
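A minimal sketch of the tradeoff, assuming a toy sine-plus-noise dataset (both the function and the noise level are made up for illustration): a low polynomial degree underfits (high bias), a high degree overfits (high variance), and a middle degree does best on held-out data.

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(-1, 1, 30)
x_test = rng.uniform(-1, 1, 200)

def f(x):
    return np.sin(3 * x)                       # assumed "true" function

y_train = f(x_train) + rng.normal(0, 0.2, 30)  # noisy training labels
y_test = f(x_test)

for degree in (1, 3, 12):                      # simple -> complex models
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```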
What is the difference between Stochastic Gradient Descent and Gradient Descent?
Gradient descent is an optimization algorithm used when training a machine learning model. It iteratively tweaks the model's parameters in the direction of the negative gradient to minimize a given loss function; for a convex loss this converges to the global minimum, otherwise to a local minimum.
GD: evaluates all training samples to compute each parameter update.
SGD: evaluates a single training sample to compute each parameter update.
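A minimal sketch of both update rules on least-squares linear regression; the toy data, learning rates, and epoch counts are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
true_w = np.array([2.0, -1.0])
y = X @ true_w + rng.normal(0, 0.1, 100)

# Gradient descent: one parameter update per pass over ALL samples.
w = np.zeros(2)
for _ in range(100):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of mean squared error
    w -= 0.1 * grad

# Stochastic gradient descent: one parameter update per SINGLE sample.
w_sgd = np.zeros(2)
for _ in range(10):                          # epochs
    for i in rng.permutation(len(y)):
        grad = 2 * X[i] * (X[i] @ w_sgd - y[i])
        w_sgd -= 0.01 * grad

print("GD: ", w)                             # both approach [2, -1]
print("SGD:", w_sgd)
```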
Explain the difference between supervised and unsupervised machine learning.
Supervised learning requires labeled data and uses a ground truth, meaning we have existing knowledge of our outputs and samples. The goal is to learn a function that approximates the relationship between inputs and outputs.
Unsupervised learning does not use labeled outputs. The goal here is to infer the natural structure in a dataset.
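A hedged sketch of the contrast using scikit-learn: the supervised model is fit on features and labels, the unsupervised one on features alone (the toy data is made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.9]])
y = np.array([0, 0, 1, 1])                   # labels exist only in the supervised case

clf = LogisticRegression().fit(X, y)         # supervised: learns X -> y
print(clf.predict([[0.15, 0.15]]))           # predicts a label

km = KMeans(n_clusters=2, n_init=10).fit(X)  # unsupervised: finds structure in X
print(km.labels_)                            # inferred cluster assignments
```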
Give some of the most common algorithms for supervised and unsupervised learning.
Supervised learning algorithms:
Linear regression
Logistic regression
Decision trees
Random forests
Naive Bayes
Examples of unsupervised algorithms:
k-Means
Visualization and dimensionality reduction:
Principal component analysis (PCA)
t-distributed Stochastic Neighbor Embedding (t-SNE)
Association rule learning (Apriori)
What is Bayes' Theorem and why do we use it?
Bayes' Theorem lets us compute a probability when we know other, related probabilities: it combines a prior probability with a likelihood to produce a posterior probability. It is a way of calculating conditional probabilities.
In ML, Bayes' theorem is used in a probability framework that fits a model to a training dataset and underpins classification models such as Naive Bayes and the Bayes Optimal Classifier.
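A worked example with assumed numbers (the prevalence, sensitivity, and false-positive rate below are made up) showing the posterior computed from a prior and likelihoods:

```python
# P(disease | positive test) via Bayes' theorem, with illustrative numbers.
p_d = 0.01             # prior: P(disease)
p_pos_given_d = 0.95   # likelihood: P(+ | disease), the test's sensitivity
p_pos_given_nd = 0.05  # P(+ | no disease), the false-positive rate

# Law of total probability for the evidence P(+).
p_pos = p_pos_given_d * p_d + p_pos_given_nd * (1 - p_d)

# Bayes' theorem: posterior = likelihood * prior / evidence.
posterior = p_pos_given_d * p_d / p_pos
print(f"P(disease | +) = {posterior:.3f}")   # ~0.161: still low despite a positive test
```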
What are Naive Bayes’ Classifiers?
Naive Bayes classifiers assume that the presence or absence of one feature does not influence the presence or absence of any other feature, given the class label.
When this conditional independence assumption holds, they are easy to implement and can rival far more sophisticated predictors. They are used in spam filtering, text analysis, and recommendation systems.
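A minimal spam-filtering sketch with scikit-learn; the tiny corpus and its labels are assumptions for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win money now", "meeting at noon", "free money win", "lunch at noon"]
labels = [1, 0, 1, 0]                 # 1 = spam, 0 = ham (assumed labels)

vec = CountVectorizer()
X = vec.fit_transform(texts)          # bag-of-words counts per message
clf = MultinomialNB().fit(X, labels)  # naive Bayes over word counts

print(clf.predict(vec.transform(["free money"])))  # -> [1] (classified as spam)
```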
What is a Discriminative model?
Discriminative models learn the decision boundary between classes directly from observed data, i.e. they model P(y | x) rather than how the data was generated. They are used for classification or regression tasks with outcomes such as pass/fail, win/lose, or healthy/sick.
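A small sketch contrasting a discriminative model (logistic regression, which models P(y | x) directly) with a generative one (Gaussian Naive Bayes, which models P(x | y)); the two-blob toy data is assumed:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),    # class 0 cluster
               rng.normal(3, 1, (50, 2))])   # class 1 cluster
y = np.array([0] * 50 + [1] * 50)

disc = LogisticRegression().fit(X, y)        # learns the boundary directly
gen = GaussianNB().fit(X, y)                 # learns class-conditional densities

point = [[1.5, 1.5]]                         # a point near the boundary
print(disc.predict_proba(point))             # P(y | x) from each model
print(gen.predict_proba(point))
```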
Segmentation
The dense prediction task of assigning a class label to every pixel in an image.
FCN (Fully Convolutional Network)
Works by fine-tuning an image classification CNN and training it for pixel-wise prediction.
- Compresses information using multiple layers of convolutions and pooling.
- Up-samples the resulting feature maps to predict each pixel's class from the compressed information.
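A minimal sketch of the FCN idea in PyTorch; the layer widths and the NUM_CLASSES value are illustrative assumptions, not the original published architecture:

```python
import torch
import torch.nn as nn

NUM_CLASSES = 21  # assumed, e.g. PASCAL VOC-sized label set

fcn = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # compress
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # compress more
    nn.Conv2d(32, NUM_CLASSES, 1),                                # 1x1 conv "classifier"
    nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),  # up-sample
)

x = torch.randn(1, 3, 64, 64)   # one RGB image
out = fcn(x)                    # per-pixel class scores at input resolution
print(out.shape)                # torch.Size([1, 21, 64, 64])
```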
Tokenization
The process of splitting raw text into smaller units (tokens), such as words, subwords, or characters, so that it can be mapped to IDs and fed to a model.
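A quick sketch of word-level and character-level tokenization in plain Python; the subword comment is illustrative only, since real subword splits depend on a trained vocabulary:

```python
text = "unhappiness is unhelpful"

print(text.split())          # word-level: ['unhappiness', 'is', 'unhelpful']
print(list("unhappiness"))   # character-level tokens

# Subword tokenizers (e.g. BPE) would produce pieces like ['un', 'happi', 'ness'];
# the exact split depends on the learned vocabulary (hypothetical example).
```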
Embeddings
A low-dimensional space into which high-dimensional vectors are translated. Semantically similar inputs are placed closer together in the embedding space.
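A toy sketch: the 3-d vectors below are made up, but they show how cosine similarity in embedding space reflects semantic similarity (real embeddings are learned, not hand-set):

```python
import numpy as np

emb = {                                  # hypothetical 3-d embeddings
    "cat": np.array([0.9, 0.1, 0.0]),
    "dog": np.array([0.8, 0.2, 0.1]),
    "car": np.array([0.0, 0.1, 0.9]),
}

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(cosine(emb["cat"], emb["dog"]))    # high: similar meanings sit close
print(cosine(emb["cat"], emb["car"]))    # low: unrelated meanings sit far apart
```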
CNN
A Convolutional Neural Network: a neural network that slides learned filters over grid-structured input such as images, sharing weights across spatial positions and producing feature maps that capture local patterns.
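A minimal PyTorch sketch: one convolution layer turns an RGB image into a stack of feature maps (the layer sizes are arbitrary choices for illustration):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
x = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image
print(conv(x).shape)            # torch.Size([1, 8, 32, 32]): 8 feature maps
```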
Semantic Search
Search that retrieves results by meaning rather than by exact keyword match, typically by embedding the query and the documents and ranking documents by similarity in the embedding space.
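A sketch of the pipeline: embed the query and documents, then rank by similarity. The bag-of-letters encoder below is a hypothetical stand-in for a learned text encoder, used only to keep the example self-contained:

```python
import numpy as np

def embed(text):                          # toy stand-in for a real text encoder
    v = np.zeros(26)
    for ch in text.lower():
        if ch.isalpha():
            v[ord(ch) - ord("a")] += 1    # crude letter-count "embedding"
    return v / (np.linalg.norm(v) + 1e-9)

docs = ["how to train a neural network",
        "best pasta recipes",
        "gradient descent tutorial"]
query = "neural network training"

scores = [embed(query) @ embed(d) for d in docs]  # cosine similarity (unit vectors)
print(docs[int(np.argmax(scores))])               # best match by similarity
```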