Machine Learning Interview Flashcards
What is overfitting, and how can you prevent it?
Overfitting happens when a model performs well on training data but poorly on new data because it has memorized noise rather than learning general patterns. It can be reduced with techniques like regularization and simplifying the model, and detected using cross-validation.
Explain the bias-variance tradeoff.
Bias: Error due to overly simplistic models, which leads to underfitting
Variance: Error due to overly complex models, which leads to overfitting
The goal is to find a balance where the model is complex enough to capture the data patterns (low bias) but not so complex that it overfits (low variance).
What is precision?
Precision is the ratio of true positives to the total predicted positives, measuring how accurate the positive predictions are.
What is recall?
Recall is the ratio of true positives to the total actual positives, measuring how well the model identifies positive cases.
What is F1 score?
F1 score is the harmonic mean of precision and recall, providing a balanced measure when both are important.
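A concrete check with scikit-learn's metrics, using made-up binary labels:

    from sklearn.metrics import precision_score, recall_score, f1_score

    # Hypothetical labels: 1 = positive class, 0 = negative class
    y_true = [1, 1, 0, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    print(precision_score(y_true, y_pred))  # TP / (TP + FP) = 3/4
    print(recall_score(y_true, y_pred))     # TP / (TP + FN) = 3/4
    print(f1_score(y_true, y_pred))         # harmonic mean of the two = 0.75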
What is AUC-ROC?
AUC-ROC is a metric for evaluating binary classifiers. The ROC curve plots the True Positive Rate (TPR) against the False Positive Rate (FPR) at different classification thresholds; the AUC is the area under that curve, summarizing performance across all thresholds (1.0 is perfect separation, 0.5 is random guessing).
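A quick sketch with scikit-learn's roc_auc_score, which expects predicted probabilities (or scores) rather than hard labels; the toy arrays are invented:

    from sklearn.metrics import roc_auc_score

    y_true = [0, 0, 1, 1]             # hypothetical true labels
    y_scores = [0.1, 0.4, 0.35, 0.8]  # predicted probability of the positive class

    print(roc_auc_score(y_true, y_scores))  # 0.75 for this toy example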
What is regularization?
Regularization is a technique that discourages overfitting by adding a penalty to large model weights (L1 or L2 regularization).
How does a decision tree work?
Decision trees work by recursively splitting the dataset based on feature values that maximize the separation between different classes or outcomes. At each node, the algorithm chooses the best feature and threshold to split the data, creating branches until a stopping condition is met, such as reaching a maximum depth or a pure node.
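To make the split criterion concrete, here is a small numpy sketch (function names invented for illustration) that computes the weighted Gini impurity of a candidate split; CART-style trees pick the feature and threshold that minimize it:

    import numpy as np

    def gini(labels):
        # Gini impurity: 1 - sum of squared class proportions
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def split_impurity(feature, labels, threshold):
        # Weighted impurity of the two child nodes produced by the split
        left, right = labels[feature <= threshold], labels[feature > threshold]
        n = len(labels)
        return (len(left) / n) * gini(left) + (len(right) / n) * gini(right)

    x = np.array([2.0, 3.0, 10.0, 11.0])
    y = np.array([0, 0, 1, 1])
    print(split_impurity(x, y, threshold=5.0))  # 0.0: a perfectly pure split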
Explain the working of random forests and how they reduce overfitting.
Random forests build many decision trees on bootstrap samples of the data, with each split considering only a random subset of features, which decorrelates the trees. Their predictions are aggregated (majority vote for classification, averaging for regression), which reduces overfitting and variance compared to a single tree.
What is gradient descent, and how does it work?
Gradient descent is an optimization algorithm that minimizes a loss function by iteratively updating the model parameters in the direction of the negative gradient.
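A minimal numpy sketch of batch gradient descent fitting a one-parameter linear model by minimizing mean squared error (the data and learning rate are made up):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = 2.0 * x  # true slope is 2

    w, lr = 0.0, 0.05
    for _ in range(100):
        grad = np.mean(2 * (w * x - y) * x)  # d/dw of mean((w*x - y)^2)
        w -= lr * grad                       # step against the gradient
    print(w)  # converges to 2.0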
What are support vector machines (SVMs)?
SVMs are supervised learning models that classify data by finding the hyperplane that best separates classes with maximum margin.
Explain the difference between bagging and boosting.
Bagging reduces variance by training multiple models in parallel on different bootstrap samples of the data and combining their outputs by averaging or voting, while boosting reduces bias by training models sequentially so that each one focuses on the errors of its predecessors (e.g., by giving higher weights to misclassified data points).
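A rough comparison using scikit-learn's BaggingClassifier and AdaBoostClassifier (both default to decision-tree base learners) on a synthetic dataset; the hyperparameters here are arbitrary:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=500, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Bagging: independent trees on bootstrap samples, combined by voting
    print(BaggingClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr).score(X_te, y_te))
    # Boosting: trees trained sequentially, upweighting misclassified points
    print(AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr).score(X_te, y_te))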
What is a neural network, and how does backpropagation work?
A neural network is a set of connected layers that transform input data into predictions. Backpropagation applies the chain rule to compute the gradient of the loss with respect to every weight, propagating error signals backward from the output layer; an optimizer such as gradient descent then uses these gradients to update the weights and reduce the error.
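The following numpy sketch trains a tiny one-hidden-layer network by hand to show the forward and backward passes (the toy data, layer sizes, and learning rate are all invented):

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(32, 3))            # toy inputs
    y = X.sum(axis=1, keepdims=True)        # toy target: sum of the features

    W1 = rng.normal(size=(3, 8)) * 0.5      # hidden layer weights
    W2 = rng.normal(size=(8, 1)) * 0.5      # output layer weights
    lr = 0.05
    for step in range(500):
        h = np.tanh(X @ W1)                 # forward: hidden activations
        pred = h @ W2                       # forward: network output
        d_pred = 2 * (pred - y) / len(X)    # backward: dLoss/dpred for MSE
        d_W2 = h.T @ d_pred                 # chain rule into W2
        d_h = d_pred @ W2.T                 # gradient flowing back into h
        d_W1 = X.T @ (d_h * (1 - h ** 2))   # tanh'(z) = 1 - tanh(z)^2
        W1 -= lr * d_W1                     # gradient descent updates
        W2 -= lr * d_W2
    print(np.mean((pred - y) ** 2))         # loss should be far smaller than at the start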
What are CNNs and RNNs, and when are they used?
A Convolutional Neural Network (CNN) is a type of neural network designed for processing grid-like data, such as images, using convolutional layers to capture spatial features.
A Recurrent Neural Network (RNN) is a neural network designed for sequential data, where outputs from previous steps are fed back into the model to capture temporal dependencies.
What is transfer learning, and why is it useful?
Transfer learning leverages a pre-trained model and fine-tunes it on a new task, saving time and resources.
What are vanishing and exploding gradients? How do you address them?
Vanishing gradients occur when gradients shrink toward zero as they propagate backward through many layers, stalling learning in the early layers; exploding gradients occur when they grow uncontrollably large, destabilizing training. ReLU activations and careful weight initialization help with vanishing gradients, while gradient clipping is the standard fix for exploding ones.
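A minimal sketch of clipping a gradient by its global norm (the function name is invented; frameworks provide equivalents, e.g. PyTorch's torch.nn.utils.clip_grad_norm_):

    import numpy as np

    def clip_by_norm(grad, max_norm):
        # Rescale the gradient if its L2 norm exceeds max_norm
        norm = np.linalg.norm(grad)
        return grad * (max_norm / norm) if norm > max_norm else grad

    g = np.array([30.0, 40.0])            # an "exploding" gradient with norm 50
    print(clip_by_norm(g, max_norm=5.0))  # [3. 4.], rescaled to norm 5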
What is dropout in neural networks?
Dropout randomly deactivates neurons during training to prevent overfitting by making the network more robust.
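A sketch of inverted dropout in numpy (the function name is invented for illustration):

    import numpy as np

    def dropout(activations, p=0.5, training=True):
        # Zero out each neuron with probability p during training, scaling
        # survivors by 1/(1-p) so expected activations match test-time behavior
        if not training:
            return activations
        mask = np.random.rand(*activations.shape) > p
        return activations * mask / (1.0 - p)

    h = np.ones((2, 4))
    print(dropout(h, p=0.5))  # roughly half the entries zeroed, the rest doubled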
How do you handle missing data?
Missing data can be handled by removing rows, imputing values, or using algorithms that handle missing values, like decision trees.
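A short sketch of two of these options using pandas and scikit-learn on an invented toy frame:

    import numpy as np
    import pandas as pd
    from sklearn.impute import SimpleImputer

    df = pd.DataFrame({"age": [25, np.nan, 40], "income": [50.0, 60.0, np.nan]})

    print(df.dropna())                                         # option 1: drop incomplete rows
    print(SimpleImputer(strategy="median").fit_transform(df))  # option 2: impute column medians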
What is feature scaling, and why is it important?
Feature scaling ensures all features contribute equally to the model by normalizing or standardizing data. This is crucial for algorithms like SVMs or neural networks.
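A quick sketch with scikit-learn's two most common scalers on made-up data:

    import numpy as np
    from sklearn.preprocessing import MinMaxScaler, StandardScaler

    X = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])

    print(StandardScaler().fit_transform(X))  # standardization: zero mean, unit variance
    print(MinMaxScaler().fit_transform(X))    # normalization: rescale each feature to [0, 1]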
What techniques do you use for feature selection?
Techniques include removing correlated features, using feature importance from models like Random Forests, or using dimensionality reduction methods like PCA.
What is the difference between L1 and L2 regularization?
L1 regularization (Lasso) encourages sparsity by shrinking coefficients to zero, while L2 regularization (Ridge) penalizes large coefficients more smoothly.
What is L1 regularization?
L1 regularization (Lasso) encourages sparsity by shrinking some feature weights to zero, useful for feature selection.
What is L2 regularization?
L2 regularization (Ridge) penalizes large weights, leading to smaller but non-zero weights, which helps reduce overfitting.
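To see the difference in practice, a small sketch fitting Lasso (L1) and Ridge (L2) on synthetic data where only two of five features matter (the alpha values are arbitrary):

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = 3 * X[:, 0] + 2 * X[:, 1]  # only the first two features are relevant

    print(Lasso(alpha=0.5).fit(X, y).coef_)  # L1: irrelevant weights driven to exactly 0
    print(Ridge(alpha=0.5).fit(X, y).coef_)  # L2: all weights shrunk, but non-zero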
What are common methods for dimensionality reduction?
Common methods include PCA, t-SNE, and Autoencoders, which reduce the number of features while preserving important information.
How do you evaluate a machine learning model’s performance?
Performance can be evaluated using metrics such as accuracy, precision, recall, F1 score, ROC-AUC, and confusion matrices.
What is cross-validation, and why is it important?
Cross-validation repeatedly splits the data into training and validation folds, training on one part and validating on the other, to produce a more reliable estimate of how well the model generalizes to unseen data than a single train/validation split.
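A minimal sketch of 5-fold cross-validation with scikit-learn's cross_val_score on its built-in iris dataset:

    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = load_iris(return_X_y=True)
    scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
    print(scores.mean(), scores.std())  # average accuracy and spread across 5 folds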
How do you handle imbalanced datasets?
Techniques include resampling methods (e.g., SMOTE), adjusting class weights, or using algorithms that handle imbalanced data natively.
What are confusion matrices, and how do you use them?
Confusion matrices provide a summary of predicted vs actual classes, useful for calculating precision, recall, and other metrics.
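A quick sketch with scikit-learn, reusing the toy labels from the precision/recall example above:

    from sklearn.metrics import confusion_matrix

    y_true = [1, 1, 0, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

    # Rows are actual classes, columns are predicted classes (scikit-learn convention)
    print(confusion_matrix(y_true, y_pred))
    # [[3 1]   -> TN=3, FP=1
    #  [1 3]]  -> FN=1, TP=3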
When would you use a generative model vs. a discriminative model?
Generative models learn the joint probability distribution P(x, y) and can generate new data (e.g., Naive Bayes, GANs), while discriminative models learn the conditional probability P(y | x) or the decision boundary directly (e.g., logistic regression, SVMs). Use a generative model when you need to model or sample the data itself; use a discriminative model when you only need accurate predictions.
What is reinforcement learning, and how does it differ from supervised learning?
Reinforcement learning learns through rewards from interacting with an environment, while supervised learning learns from labeled data.
What are GANs, and how do they work?
GANs (Generative Adversarial Networks) consist of a generator and a discriminator trained adversarially: the generator creates synthetic data while the discriminator learns to distinguish real from generated data, and the competition pushes the generator to produce increasingly realistic samples.
What is attention in deep learning models?
Attention mechanisms help models focus on important parts of the input sequence, improving performance in tasks like NLP and translation.
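A numpy sketch of scaled dot-product attention, the core operation popularized by Transformers (the shapes and data here are invented):

    import numpy as np

    def scaled_dot_product_attention(Q, K, V):
        # Similarity of each query to each key, scaled by sqrt of key dimension
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        # Softmax turns scores into attention weights that sum to 1 per query
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)
        return weights @ V  # each output is a weighted sum of the values

    rng = np.random.default_rng(0)
    Q = rng.normal(size=(4, 8))  # 4 query positions, dimension 8
    K = rng.normal(size=(6, 8))  # 6 key/value positions
    V = rng.normal(size=(6, 8))
    print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)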
Explain PCA (Principal Component Analysis) and its applications.
PCA reduces dimensionality by projecting data onto principal components that capture the most variance, useful in visualization and noise reduction.
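A minimal sketch with scikit-learn's PCA on random data (the component count is arbitrary):

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))

    pca = PCA(n_components=2)
    X_reduced = pca.fit_transform(X)      # project onto the top 2 principal components
    print(X_reduced.shape)                # (200, 2)
    print(pca.explained_variance_ratio_)  # fraction of variance each component captures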
What is the difference between an LSTM and a GRU?
LSTMs use three gates (input, forget, and output) plus a separate cell state, while GRUs merge the cell and hidden states and use only two gates (update and reset). LSTMs are more flexible but slower to train; GRUs are simpler and more computationally efficient, often with comparable performance.
How would you deal with a dataset containing millions of features?
Dimensionality reduction techniques like PCA or selecting top features based on importance can help handle datasets with millions of features.