Challenges of Machine Learning Flashcards

1
Q

what is overfitting?

A

when the model performs well on the training data, but doesn't generalise well to new data
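
A minimal sketch of the gap between training and test error, assuming made-up data drawn from a noisy line. The helper names (`make_data`, `mse`, `memoriser`) are hypothetical. The "model" is a 1-nearest-neighbour memoriser, chosen here only as an extreme case of a model that fits the training set perfectly.

```python
import random

random.seed(0)

def make_data(n):
    """Noisy samples of the line y = 2x (an assumed 'true' pattern)."""
    xs = [random.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + random.gauss(0, 1) for x in xs]
    return xs, ys

def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a data set."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

train_x, train_y = make_data(200)
test_x, test_y = make_data(200)

def memoriser(x):
    """1-nearest-neighbour: return the label of the closest training point."""
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

train_error = mse(memoriser, train_x, train_y)  # memorised noise included
test_error = mse(memoriser, test_x, test_y)     # noise does not carry over
print(f"train MSE: {train_error:.3f}, test MSE: {test_error:.3f}")
```

The training error is zero because every training point is its own nearest neighbour; the error on unseen points is not.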

2
Q

what are some solutions to overfitting?

A

choose a simpler model
gather more training data
reduce the noise in the training data (fix data errors and remove outliers)
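
One possible way to remove outliers, sketched under assumptions: the rule used here (drop values more than `k` median absolute deviations from the median) is just one common heuristic, and `drop_outliers`, `k` and the sample readings are hypothetical.

```python
import statistics

def drop_outliers(values, k=5.0):
    """Keep values within k median-absolute-deviations of the median."""
    centre = statistics.median(values)
    mad = statistics.median(abs(v - centre) for v in values)
    if mad == 0:
        # No spread to measure against; leave the data untouched.
        return list(values)
    return [v for v in values if abs(v - centre) <= k * mad]

readings = [10, 11, 9, 10, 12, 11, 10, 95]  # 95 looks like a data error
cleaned = drop_outliers(readings)
print(cleaned)
```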

3
Q

what is regularization?

A

constraining a model to make it simpler and reduce the risk of overfitting
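
A rough sketch of the idea, assuming a one-feature linear model with an L2 (ridge-style) penalty on the slope. The function `fit_ridge_line`, the parameter name `alpha` and the data are illustrative assumptions. `alpha` plays the role of the regularization hyperparameter: it is fixed before training and not learned from the data.

```python
def fit_ridge_line(xs, ys, alpha):
    """Fit y = intercept + slope * x, penalising slope**2 by alpha."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / (sxx + alpha)          # alpha = 0 gives ordinary least squares
    intercept = mean_y - slope * mean_x  # the intercept is not penalised
    return intercept, slope

xs = [0, 1, 2, 3, 4]
ys = [1.0, 2.9, 5.2, 7.1, 8.8]

for alpha in (0, 10, 1_000_000):
    intercept, slope = fit_ridge_line(xs, ys, alpha)
    print(f"alpha={alpha}: intercept={intercept:.3f}, slope={slope:.5f}")
```

As `alpha` grows the slope is pulled toward zero, so a very large value yields an almost flat line near the mean of the targets.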

4
Q

when does overfitting happen?

A

when the model is too complex relative to the amount and noisiness of the training data

5
Q

what is a hyper-parameter?

A

a parameter of the learning algorithm rather than of the model; it is not affected by the learning algorithm itself, must be set prior to training, and remains constant during training

6
Q

what is underfitting?

A

when the model is too simple to learn the underlying structure of the data

7
Q

what are some solutions to underfitting?

A

select a more complex model
feed better features to the learning algorithm
reduce the constraints on the model (e.g. reduce the regularization hyperparameter)
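
A small sketch of the "better features" remedy, assuming toy data where the target is exactly the square of the input. `fit_line` and `training_mse` are hypothetical helpers. The same straight-line learner underfits on the raw input but fits once it is given the squared feature.

```python
def fit_line(features, targets):
    """Ordinary least squares for targets ~ intercept + slope * feature."""
    n = len(features)
    mean_f = sum(features) / n
    mean_t = sum(targets) / n
    sff = sum((f - mean_f) ** 2 for f in features)
    sft = sum((f - mean_f) * (t - mean_t) for f, t in zip(features, targets))
    slope = sft / sff
    return mean_t - slope * mean_f, slope

def training_mse(features, targets):
    """Fit a line on the given feature and report its training error."""
    intercept, slope = fit_line(features, targets)
    errors = [(intercept + slope * f - t) ** 2 for f, t in zip(features, targets)]
    return sum(errors) / len(errors)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]  # curved pattern a straight line in x cannot follow

mse_raw = training_mse(xs, ys)                        # underfits
mse_squared = training_mse([x ** 2 for x in xs], ys)  # better feature
print(f"raw x: {mse_raw:.2f}, x squared: {mse_squared:.2f}")
```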
8
Q

what are some examples of bad data?

A

insufficient quantity of training data
nonrepresentative data
poor quality data
irrelevant features

9
Q

what are two main challenges of machine learning?

A

bad data and bad algorithms

10
Q

what is feature selection?

A

selecting the most useful features to train on among the existing features
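
One simple way to do this, sketched under assumptions: rank features by the absolute value of their Pearson correlation with the target and keep the top few. This is only one of many selection criteria; the feature names, data and `select_features` helper are made up for illustration.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def select_features(features, target, keep):
    """Return the names of the `keep` features most correlated with target."""
    ranked = sorted(features, key=lambda name: abs(pearson(features[name], target)),
                    reverse=True)
    return ranked[:keep]

target = [1, 2, 3, 4, 5, 6]
features = {
    "size": [2, 4, 6, 8, 10, 12],  # moves with the target
    "age": [6, 5, 5, 3, 2, 1],     # moves against the target
    "noise": [5, 1, 4, 2, 6, 3],   # barely related
}
selected = select_features(features, target, keep=2)
print(selected)
```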

11
Q

what is feature extraction?

A

combining existing features to produce a more powerful one
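
A toy sketch of combining two features into one, assuming hypothetical car data where mileage and age rise together. Each feature is standardised and the two are averaged into a single `wear` score. For two standardised, positively correlated features this average points along the first principal component, but the example is only an illustration, not a full dimensionality reduction routine.

```python
import statistics

def standardise(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

mileage_km = [20_000, 50_000, 80_000, 120_000, 150_000]
age_years = [1, 3, 5, 8, 10]

z_mileage = standardise(mileage_km)
z_age = standardise(age_years)

# One extracted feature standing in for the two originals.
wear = [(m + a) / 2 for m, a in zip(z_mileage, z_age)]
print([round(w, 2) for w in wear])
```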

12
Q

what is poor quality data?

A

training data that is full of errors, outliers and noise; this makes it harder for the system to detect the underlying patterns, so the system is less likely to perform well

13
Q

what is non-representative data?

A

training data that does not reflect the new cases you want to generalise to; to generalise well, the training data needs to be representative of those cases

14
Q

what happens if a sample is too small?

A

you will have sampling noise (the data is nonrepresentative as a result of chance)
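
A quick simulation of sampling noise, assuming a synthetic population. The names and sizes are arbitrary choices. It measures how much the mean of a sample jumps around from one draw to the next for small and for large samples.

```python
import random
import statistics

random.seed(1)
population = [random.gauss(50, 10) for _ in range(100_000)]

def spread_of_sample_means(sample_size, repeats=300):
    """Standard deviation of the sample mean across repeated random draws."""
    means = []
    for _ in range(repeats):
        sample = random.sample(population, sample_size)
        means.append(sum(sample) / sample_size)
    return statistics.pstdev(means)

small = spread_of_sample_means(5)
large = spread_of_sample_means(500)
print(f"spread with n=5: {small:.2f}, spread with n=500: {large:.2f}")
```

The small samples give much less stable estimates, purely by chance.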

15
Q

what happens if a sample is too large?

A

even a very large sample can be nonrepresentative if the sampling method is flawed (sampling bias)
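
A sketch of sampling bias, assuming an artificial population and a deliberately flawed method that can only reach members with a value of 50 or more. All names and numbers are illustrative. The biased sample is far larger than the fair one, yet its mean is much further from the population mean.

```python
import random

random.seed(2)
population = list(range(100)) * 1000  # values 0..99, each 1000 times
population_mean = sum(population) / len(population)

# Flawed sampling method: only part of the population can be reached.
reachable = [v for v in population if v >= 50]
biased_sample = random.sample(reachable, 20_000)

# Sound sampling method: every member is equally likely to be picked.
fair_sample = random.sample(population, 200)

biased_mean = sum(biased_sample) / len(biased_sample)
fair_mean = sum(fair_sample) / len(fair_sample)
print(f"population: {population_mean:.1f}")
print(f"biased (n=20000): {biased_mean:.1f}, fair (n=200): {fair_mean:.1f}")
```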

16
Q

what is insufficient quantity of training data?

A

not having enough examples to learn from; an algorithm typically needs thousands of examples even for simple problems, and millions for complex problems

17
Q

what can a very large regularization hyperparameter mean?

A

an almost flat model (a slope close to zero); the learning algorithm will almost certainly not overfit the training data, but it will be less likely to find a good solution