8. Machine Learning & Statistical Concepts Flashcards
What is overfitting in machine learning?
Overfitting occurs when a model learns the training data too well, capturing noise instead of general patterns.
What is underfitting?
Underfitting occurs when a model is too simple to capture underlying patterns in the data.
What is the bias-variance tradeoff?
The tradeoff between error from overly simple assumptions (bias) and error from sensitivity to the particular training set (variance); for squared error this is the standard decomposition expected test error = bias² + variance + irreducible noise, so making a model more flexible lowers bias but raises variance, and vice versa.
What is cross-validation?
A resampling method used to evaluate a model’s performance on unseen data.
What is k-fold cross-validation?
A method where the dataset is split into k subsets, and the model is trained k times, each time using a different subset as the validation set.
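A minimal k-fold sketch using scikit-learn; the dataset and the LogisticRegression stand-in classifier are hypothetical:

    import numpy as np
    from sklearn.model_selection import KFold
    from sklearn.linear_model import LogisticRegression

    X = np.random.rand(100, 3)                    # hypothetical features
    y = (X[:, 0] + X[:, 1] > 1).astype(int)       # hypothetical labels

    kf = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = []
    for train_idx, val_idx in kf.split(X):
        model = LogisticRegression().fit(X[train_idx], y[train_idx])   # train on k-1 folds
        scores.append(model.score(X[val_idx], y[val_idx]))             # evaluate on the held-out fold
    print(sum(scores) / len(scores))              # average validation accuracy across folds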
What is leave-one-out cross-validation (LOOCV)?
A special case of k-fold where k equals the number of samples, leaving one sample for validation at each step.
What is feature scaling?
A preprocessing step that normalizes or standardizes data for better model performance.
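A minimal NumPy sketch of the two most common scalings, on hypothetical data:

    import numpy as np

    X = np.random.rand(100, 3) * 50               # hypothetical unscaled features
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)                       # standardization: zero mean, unit variance
    X_minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))   # min-max normalization to [0, 1]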
What is the curse of dimensionality?
The phenomenon where high-dimensional data causes issues such as sparsity and increased computational cost.
What is PCA (Principal Component Analysis)?
A dimensionality reduction technique that projects data onto new axes maximizing variance.
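A minimal PCA sketch via the SVD of centered data (the data is hypothetical):

    import numpy as np

    X = np.random.rand(200, 5)                    # hypothetical data
    Xc = X - X.mean(axis=0)                       # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    k = 2
    X_reduced = Xc @ Vt[:k].T                     # project onto the top-k principal axes
    explained = (S ** 2) / np.sum(S ** 2)         # fraction of variance captured by each axis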
What is the difference between supervised and unsupervised learning?
Supervised learning involves labeled data, while unsupervised learning deals with unlabeled data.
What is a loss function?
A function that measures how well a model’s predictions match actual values.
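Two common loss functions sketched in NumPy, on hypothetical predictions:

    import numpy as np

    y_true = np.array([3.0, -0.5, 2.0])           # hypothetical regression targets
    y_pred = np.array([2.5,  0.0, 2.0])           # hypothetical predictions
    mse = np.mean((y_pred - y_true) ** 2)         # mean squared error

    labels = np.array([1, 0, 1])                  # hypothetical binary labels
    probs  = np.array([0.9, 0.2, 0.7])            # predicted probabilities of class 1
    bce = -np.mean(labels * np.log(probs) + (1 - labels) * np.log(1 - probs))  # binary cross-entropy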
What is a confusion matrix?
A table that summarizes classification model performance by showing TP, FP, TN, FN.
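A minimal sketch of the four counts, using hypothetical labels and predictions:

    import numpy as np

    y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # hypothetical labels
    y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])   # hypothetical predictions

    tp = np.sum((y_pred == 1) & (y_true == 1))    # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))    # false positives
    tn = np.sum((y_pred == 0) & (y_true == 0))    # true negatives
    fn = np.sum((y_pred == 0) & (y_true == 1))    # false negatives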
What is precision in classification?
The proportion of true positives among all positive predictions (TP / (TP + FP)).
What is recall (sensitivity)?
The proportion of true positives among all actual positives (TP / (TP + FN)).
What is F1-score?
The harmonic mean of precision and recall, balancing the two metrics.
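A minimal sketch of precision, recall, and F1 from assumed toy confusion-matrix counts:

    tp, fp, fn = 8, 2, 4                          # hypothetical counts
    precision = tp / (tp + fp)                    # 0.8: fraction of positive predictions that are correct
    recall = tp / (tp + fn)                       # ~0.67: fraction of actual positives that were found
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean, ~0.73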
What is an ROC curve?
A graph that plots a classifier’s true positive rate against its false positive rate as the decision threshold is varied.
What is AUC (Area Under Curve)?
A measure of a model’s ability to distinguish between classes, where 1 is perfect and 0.5 is random guessing.
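A minimal sketch using scikit-learn’s roc_curve and roc_auc_score on hypothetical labels and scores:

    import numpy as np
    from sklearn.metrics import roc_curve, roc_auc_score

    y_true  = np.array([0, 0, 1, 1, 0, 1])                # hypothetical labels
    y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.9])   # hypothetical predicted probabilities

    fpr, tpr, thresholds = roc_curve(y_true, y_score)     # points on the ROC curve
    auc = roc_auc_score(y_true, y_score)                   # area under that curve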
What is L1 regularization (Lasso)?
A method that adds absolute values of coefficients to the loss function, encouraging sparsity.
What is L2 regularization (Ridge)?
A method that adds squared coefficients to the loss function, preventing overfitting.
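A minimal sketch of both penalties added to a linear model’s squared-error loss; the data, weights, and alpha are hypothetical:

    import numpy as np

    X = np.random.rand(100, 4)                    # hypothetical features
    y = np.random.rand(100)                       # hypothetical targets
    w = np.random.rand(4)                         # linear model weights
    alpha = 0.1                                   # regularization strength

    mse = np.mean((X @ w - y) ** 2)
    lasso_loss = mse + alpha * np.sum(np.abs(w))  # L1 penalty: pushes weights toward exactly zero
    ridge_loss = mse + alpha * np.sum(w ** 2)     # L2 penalty: shrinks weights smoothly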
What is dropout in neural networks?
A regularization technique that randomly drops units during training to prevent overfitting.
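A minimal sketch of inverted dropout applied to hypothetical layer activations:

    import numpy as np

    activations = np.random.rand(4, 16)           # hypothetical layer activations
    p = 0.5                                       # dropout probability
    mask = (np.random.rand(*activations.shape) > p) / (1 - p)  # scaling keeps the expected activation unchanged
    dropped = activations * mask                  # applied at training time only; inference uses the full layer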
What is ensemble learning?
A technique combining multiple models to improve performance.
What is bagging?
An ensemble method (bootstrap aggregating) that trains multiple models on bootstrap samples of the data and averages or votes on their predictions.
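A minimal bagging sketch with scikit-learn decision trees on hypothetical regression data:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.random((200, 3))                      # hypothetical data
    y = X[:, 0] + rng.normal(0, 0.1, 200)

    preds = []
    for _ in range(20):                           # 20 bootstrap rounds
        idx = rng.integers(0, len(X), len(X))     # sample rows with replacement
        tree = DecisionTreeRegressor().fit(X[idx], y[idx])
        preds.append(tree.predict(X))
    bagged = np.mean(preds, axis=0)               # average the individual predictions

For classification, the averaging step would be replaced by a majority vote.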
What is boosting?
An ensemble method that trains models sequentially, focusing on errors made by previous models.
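A minimal boosting-style sketch (gradient boosting on residuals with decision stumps); the data and hyperparameters are hypothetical:

    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(0)
    X = rng.random((200, 3))                      # hypothetical data
    y = np.sin(3 * X[:, 0]) + rng.normal(0, 0.1, 200)

    pred = np.full(len(y), y.mean())              # start from a constant prediction
    lr = 0.1                                      # learning rate
    for _ in range(100):
        residuals = y - pred                      # errors of the current ensemble
        stump = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
        pred += lr * stump.predict(X)             # each new model corrects the previous models' errors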