8. Machine Learning & Statistical Concepts Flashcards

1
Q

What is overfitting in machine learning?

A

Overfitting occurs when a model learns the training data too well, capturing noise instead of general patterns.

2
Q

What is underfitting?

A

Underfitting occurs when a model is too simple to capture underlying patterns in the data.

3
Q

What is the bias-variance tradeoff?

A

The tradeoff between bias (error from overly simple assumptions, causing underfitting) and variance (error from sensitivity to the particular training set, causing overfitting); reducing one typically increases the other.

4
Q

What is cross-validation?

A

A resampling method used to evaluate a model’s performance on unseen data.

5
Q

What is k-fold cross-validation?

A

A method where the dataset is split into k subsets (folds); the model is trained and evaluated k times, each time holding out a different fold as the validation set, and the k validation scores are averaged.
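To make the card concrete, here is a minimal pure-Python sketch of the index bookkeeping behind k-fold splitting (the function name is illustrative, not from any library; real code would usually shuffle first and use something like scikit-learn's `KFold`):

```python
def kfold_indices(n_samples, k):
    """Partition indices 0..n_samples-1 into k folds and yield
    (train_indices, val_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    # Distribute any remainder so fold sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]                  # held-out fold
        train = indices[:start] + indices[start + size:]   # everything else
        yield train, val
        start += size

splits = list(kfold_indices(10, 5))  # 5 disjoint validation folds of size 2
```

Every sample appears in exactly one validation fold, so each prediction used for scoring comes from a model that never saw that sample.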

6
Q

What is leave-one-out cross-validation (LOOCV)?

A

A special case of k-fold where k equals the number of samples, leaving one sample for validation at each step.

7
Q

What is feature scaling?

A

A preprocessing step that rescales features to a common range (normalization) or to zero mean and unit variance (standardization), so that features with large units do not dominate distance- or gradient-based models.
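The two most common variants can be sketched in plain Python (helper names are illustrative; libraries expose these as, e.g., scikit-learn's `MinMaxScaler` and `StandardScaler`):

```python
def min_max_scale(xs):
    """Normalize values to the [0, 1] range."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Rescale values to zero mean and unit variance (z-scores)."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return [(x - mean) / var ** 0.5 for x in xs]

print(min_max_scale([10, 20, 30]))  # → [0.0, 0.5, 1.0]
```

In practice the scaling parameters (min/max or mean/std) are fit on the training set only and reused on the validation and test sets.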

8
Q

What is the curse of dimensionality?

A

The phenomenon where high-dimensional data causes issues such as sparsity and increased computational cost.

9
Q

What is PCA (Principal Component Analysis)?

A

A dimensionality reduction technique that projects data onto new axes maximizing variance.
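As a rough sketch of the mechanics (assuming NumPy is available; a real pipeline would use scikit-learn's `PCA`), the new axes are the eigenvectors of the feature covariance matrix, ordered by eigenvalue:

```python
import numpy as np

def pca_project(X, n_components):
    """Project rows of X onto the n_components directions of
    maximal variance (top eigenvectors of the covariance matrix)."""
    Xc = X - X.mean(axis=0)                  # center each feature
    cov = np.cov(Xc, rowvar=False)           # feature covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]        # sort descending by variance
    components = eigvecs[:, order[:n_components]]
    return Xc @ components                   # projected data

# Four nearly collinear 2-D points collapse well onto 1 dimension:
X = np.array([[2.0, 1.9], [0.5, 0.6], [1.0, 1.1], [1.5, 1.4]])
Z = pca_project(X, 1)  # shape (4, 1)
```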

10
Q

What is the difference between supervised and unsupervised learning?

A

Supervised learning trains on labeled input-output pairs to predict targets; unsupervised learning finds structure, such as clusters, in unlabeled data.

11
Q

What is a loss function?

A

A function that measures how well a model’s predictions match actual values.

12
Q

What is a confusion matrix?

A

A table that summarizes classification model performance by showing TP, FP, TN, FN.
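Counting the four cells from a list of labels and predictions is a few lines of plain Python (the function name is illustrative):

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for binary labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            tp += (t == positive)   # predicted positive, actually positive
            fp += (t != positive)   # predicted positive, actually negative
        else:
            fn += (t == positive)   # predicted negative, actually positive
            tn += (t != positive)   # predicted negative, actually negative
    return tp, fp, tn, fn

# true:  1 1 0 0 1    pred:  1 0 0 1 1
print(confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))  # → (2, 1, 1, 1)
```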

13
Q

What is precision in classification?

A

The proportion of true positives among all positive predictions (TP / (TP + FP)).

14
Q

What is recall (sensitivity)?

A

The proportion of true positives among all actual positives (TP / (TP + FN)).

15
Q

What is F1-score?

A

The harmonic mean of precision and recall, balancing the two metrics.
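The three metrics on the last few cards fit together directly from the confusion-matrix counts (a minimal sketch; degenerate zero-denominator cases are ignored here):

```python
def precision(tp, fp):
    return tp / (tp + fp)   # of all positive predictions, how many were right

def recall(tp, fn):
    return tp / (tp + fn)   # of all actual positives, how many were found

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# 8 true positives, 2 false positives, 4 false negatives:
print(round(f1_score(8, 2, 4), 3))  # → 0.727
```

Because the harmonic mean is dominated by the smaller value, F1 stays low unless both precision and recall are reasonably high.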

16
Q

What is an ROC curve?

A

A curve plotting the true positive rate against the false positive rate at various classification thresholds.

17
Q

What is AUC (Area Under the ROC Curve)?

A

A measure of a model’s ability to distinguish between classes, where 1 is perfect and 0.5 is random guessing.

18
Q

What is L1 regularization (Lasso)?

A

A method that adds absolute values of coefficients to the loss function, encouraging sparsity.

19
Q

What is L2 regularization (Ridge)?

A

A method that adds the squared values of coefficients to the loss function, shrinking weights toward zero (without making them exactly zero) and preventing overfitting.
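The difference between the two penalty cards is just one term in the objective. A hedged sketch with an illustrative helper (not any library's API):

```python
def penalized_loss(base_loss, weights, lam, kind="l2"):
    """Add an L1 (sum of |w|) or L2 (sum of w^2) penalty,
    weighted by lam, to a base loss such as MSE."""
    if kind == "l1":
        penalty = sum(abs(w) for w in weights)  # Lasso: pushes weights to 0
    else:
        penalty = sum(w * w for w in weights)   # Ridge: shrinks weights
    return base_loss + lam * penalty

w = [3.0, -2.0, 0.5]
print(penalized_loss(1.0, w, 0.1, "l1"))  # 1.0 + 0.1 * 5.5
print(penalized_loss(1.0, w, 0.1, "l2"))  # 1.0 + 0.1 * 13.25
```

Because the L1 penalty's gradient has constant magnitude near zero, optimization tends to drive small weights exactly to zero, which is why Lasso yields sparse models while Ridge only shrinks.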

20
Q

What is dropout in neural networks?

A

A regularization technique that randomly drops units during training to prevent overfitting.

21
Q

What is ensemble learning?

A

A technique combining multiple models to improve performance.

22
Q

What is bagging?

A

An ensemble method (short for bootstrap aggregating) that trains multiple models on bootstrap samples of the data, random subsets drawn with replacement, and averages their predictions.
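A toy illustration of the idea, where each "model" is deliberately trivial (the mean of its own bootstrap sample) so the resample-and-average structure is visible; names are illustrative:

```python
import random

def bootstrap_sample(data, rng):
    """Draw a sample of the same size, with replacement."""
    return [rng.choice(data) for _ in data]

def bagged_mean_prediction(data, n_models, seed=0):
    """Toy bagging: average the predictions of n_models 'models',
    each fit (here: a plain mean) on its own bootstrap sample."""
    rng = random.Random(seed)
    preds = [sum(s) / len(s)
             for s in (bootstrap_sample(data, rng)
                       for _ in range(n_models))]
    return sum(preds) / len(preds)

estimate = bagged_mean_prediction([2.0, 4.0, 6.0, 8.0], n_models=50)
```

Averaging over resamples reduces the variance of an unstable model, which is why bagging helps most with high-variance learners such as deep decision trees (as in random forests).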

23
Q

What is boosting?

A

An ensemble method that trains models sequentially, focusing on errors made by previous models.
