Lecture 3 Flashcards

1
Q

What are the basic steps in offline machine learning?

A
1. Abstract the problem to a standard task (Classification, Regression, etc.).
2. Choose instances and features.
3. Choose a model class.
4. Search for a good model.
2
Q

What is binary classification?

A

A classification task with two classes: positive and negative.

3
Q

What is classification error?

A

The proportion of misclassified examples.

4
Q

What is classification accuracy?

A

The proportion of correctly classified examples.
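Error and accuracy are complements (error = 1 − accuracy). A minimal sketch; the labels and predictions here are illustrative, not from the lecture:

```python
# Hypothetical true labels and classifier predictions for a binary task.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

n_correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = n_correct / len(y_true)  # proportion correctly classified
error = 1 - accuracy                # proportion misclassified

print(accuracy, error)  # 0.75 0.25
```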

5
Q

Why do we compare models?

A

To determine the best model for production use.

6
Q

What is an example of hyperparameter tuning in kNN?

A

Choosing the number of neighbors (k).
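A minimal sketch of tuning k on held-out data, using a hand-rolled 1-D kNN; the `knn_predict` helper and the toy data are illustrative, not from the lecture:

```python
from collections import Counter

def knn_predict(train, query_x, k):
    """Classify query_x by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda xy: abs(xy[0] - query_x))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Toy 1-D data: class 0 clusters near 0, class 1 near 10, one outlier at 3.0.
train = [(0.0, 0), (1.0, 0), (2.0, 0), (9.0, 1), (10.0, 1), (11.0, 1), (3.0, 1)]
val = [(1.5, 0), (2.4, 0), (9.5, 1), (3.4, 1)]

# Try several k values and keep the one with the lowest validation error.
for k in (1, 3, 5):
    err = sum(knn_predict(train, x, k) != y for x, y in val) / len(val)
    print(k, err)
```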

7
Q

What is the simplest way to compare two classifiers?

A

Train both, compute their errors, and pick the one with the lowest error.

8
Q

Why is evaluating on training data misleading?

A

Because the model may overfit, performing well on training data but poorly on unseen data.

9
Q

What is the purpose of a test set?

A

To evaluate model performance on unseen data.

10
Q

What is the recommended minimum size for a test set?

A

At least 500 examples; ideally 10,000 or more.

11
Q

What is the danger of testing many models on the same test set?

A

Overfitting to the test set due to multiple testing.

12
Q

What is overfitting in model selection?

A

Choosing a model that performs well on a specific test set but generalizes poorly.

13
Q

What is the modern approach to model evaluation?

A
1. Split data into train and test sets.
2. Choose model and hyperparameters using training data.
3. Test the model only once on test data.
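Step 1 can be sketched with a shuffled split; the 80/20 ratio and the toy data are illustrative, and a fixed seed makes the split reproducible:

```python
import random

# Toy dataset of (feature, label) pairs.
data = [(i, i % 2) for i in range(100)]

rng = random.Random(0)        # fixed seed for reproducibility
shuffled = data[:]
rng.shuffle(shuffled)

split = int(0.8 * len(shuffled))            # 80% train, 20% test
train, test = shuffled[:split], shuffled[split:]

print(len(train), len(test))  # 80 20
```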
14
Q

Why shouldn’t test data be reused?

A

Reusing test data leads to selecting the wrong model and inflating performance estimates.

15
Q

What is the purpose of a validation set?

A

To tune model hyperparameters without using the test set.

16
Q

What is cross-validation?

A

A technique where training data is split into multiple subsets (folds) to validate model performance.
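A minimal sketch of generating the fold indices (a hypothetical helper, not the lecture's code); each fold serves once as validation while the rest trains:

```python
def kfold_splits(n, k):
    """Yield (train_idx, val_idx) index lists for k folds over n examples."""
    fold_size = n // k
    indices = list(range(n))
    for i in range(k):
        val = indices[i * fold_size:(i + 1) * fold_size]
        train = indices[:i * fold_size] + indices[(i + 1) * fold_size:]
        yield train, val

# 5-fold CV over 10 examples: each example appears in exactly one validation fold.
for train_idx, val_idx in kfold_splits(10, 5):
    print(val_idx)
```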

17
Q

What is walk-forward validation used for?

A

For time-series data, ensuring training data precedes test data chronologically.
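An expanding-window sketch (the helper name and window sizes are illustrative): each split trains on everything before the test window, so no future data leaks into training.

```python
def walk_forward_splits(n, n_test, min_train):
    """Expanding-window splits: training indices always precede test indices."""
    splits = []
    start = min_train
    while start + n_test <= n:
        train = list(range(0, start))
        test = list(range(start, start + n_test))
        splits.append((train, test))
        start += n_test
    return splits

# 10 time-ordered examples, test windows of 2, at least 4 training points.
for train_idx, test_idx in walk_forward_splits(10, 2, 4):
    print(train_idx, test_idx)
```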

18
Q

What is the difference between validation and evaluation?

A

Evaluation simulates production; validation simulates evaluation.

19
Q

What are common hyperparameter tuning methods?

A

Trial-and-error (intuition), grid search, and random search.

20
Q

Why is random search often better than grid search?

A

For the same trial budget, random search samples many more distinct values per parameter than a grid does, which helps in high-dimensional spaces where only a few parameters matter.
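A toy comparison (the `val_error` objective is a stand-in for a real validation run, and all parameter names and ranges are illustrative): with 16 trials, the grid tries only 4 distinct values per parameter, while random search tries 16.

```python
import random

def val_error(lr, reg):
    """Stand-in for validation error; a real pipeline would train a model here."""
    return (lr - 0.07) ** 2 + (reg - 0.3) ** 2

# Grid search: 4 x 4 = 16 trials, but only 4 distinct values of each parameter.
grid = [(lr, reg) for lr in (0.001, 0.01, 0.1, 1.0)
                  for reg in (0.0, 0.1, 0.5, 1.0)]
best_grid = min(grid, key=lambda p: val_error(*p))

# Random search: 16 trials, 16 distinct values of every parameter.
rng = random.Random(0)
trials = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(16)]
best_rand = min(trials, key=lambda p: val_error(*p))

print(best_grid, best_rand)
```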

21
Q

Why is statistical testing controversial in ML?

A

Large datasets often make statistical tests unnecessary, and replication is the best validation.

22
Q

What is the difference between true accuracy and sample accuracy?

A

True accuracy is the actual probability of correct classification, while sample accuracy is the proportion of correctly classified test samples.

23
Q

What does a confidence interval represent?

A

A range, computed from the sample, that would contain the true metric in a specified proportion (e.g. 95%) of repeated experiments.

24
Q

What is the impact of test set size on confidence intervals?

A

Larger test sets produce narrower confidence intervals, increasing reliability.
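This can be seen from the binomial standard error: a sketch with an assumed sample accuracy of 0.9 shows the 95% half-width shrinking by about √10 for every 10× more test examples.

```python
import math

p_hat = 0.9  # assumed observed sample accuracy
widths = {}
for n in (100, 1000, 10000):
    sem = math.sqrt(p_hat * (1 - p_hat) / n)  # binomial standard error
    widths[n] = 1.96 * sem                    # 95% CI half-width
    print(n, round(widths[n], 4))
# 100 0.0588 / 1000 0.0186 / 10000 0.0059
```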

25
Q

What is Alpaydin’s 5x2 F test used for?

A

To test statistical significance when test sets are small.

26
Q

What is the standard error of the mean (SEM)?

A

The standard deviation of the sampling distribution of the mean; it is estimated as the sample standard deviation divided by √n.

27
Q

What is the 95% confidence interval formula for the mean?

A

Mean ± 1.96 × SEM.
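A sketch of this formula applied to five hypothetical accuracy scores (the 1.96 multiplier assumes an approximately normal sampling distribution):

```python
import math

scores = [0.82, 0.85, 0.79, 0.88, 0.84]  # e.g. accuracies from 5 runs
n = len(scores)
mean = sum(scores) / n
sd = math.sqrt(sum((s - mean) ** 2 for s in scores) / (n - 1))  # sample SD
sem = sd / math.sqrt(n)                                         # standard error
ci = (mean - 1.96 * sem, mean + 1.96 * sem)                     # 95% CI

print(round(mean, 3), tuple(round(c, 3) for c in ci))
```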

28
Q

What are common meanings of error bars?

A

They can represent standard deviation, standard error, or confidence intervals.

29
Q

What does overlap in error bars indicate?

A

If error bars overlap, the difference between models is likely not statistically significant.

30
Q

What should you avoid when interpreting confidence intervals?

A

Saying there is a 95% probability that the true mean lies in the interval. Instead, say that in 95% of repeated experiments, the computed interval would contain the true mean.