SUPERVISED Flashcards

1
Q

what is generalization?

A

A model’s ability to make correct predictions on new, previously unseen data.

2
Q

what does generalization refer to?

A

the model’s capability to adapt and react properly to previously unseen, new data that has the same characteristics as the training set

Generalization examines how well a model can digest new data and make correct predictions after being trained on a training set.

3
Q

what is bias?

A

If your predictions are consistently off by a certain amount, that’s bias. For example, if you always predict that plants will grow taller than they actually do, you’ve got a bias issue.

4
Q

what is variance?

A

Variance is all about inconsistency. If you make predictions that swing wildly from one extreme to another, that’s high variance. For instance, if one day you predict a plant will be super tall and the next day you say it’ll be super short, you’ve got a variance problem.

5
Q

what is a high bias model?

A

If your predictions are consistently off by a certain amount, that’s bias. A high-bias model doesn’t adapt well to new information; it’s stuck in its ways.

6
Q

what is a high variance model?

A

If you make predictions that swing wildly from one extreme to another, that’s high variance. A high-variance model is too sensitive to small changes in the data; it overreacts.

7
Q

what is the perfect model for generalization?

A

low-bias and low-variance

8
Q

what is overfitting?

A

If your model learns too much from the data you give it, it might do great on the examples it’s seen but not so great on new ones it hasn’t seen.

9
Q

what happens when we train an overfit model?

A

An overfit model gets a low loss during training but does a poor job predicting new data.

10
Q

what is underfitting?

A

It happens when your model is too simple to capture the underlying patterns in the data. It’s like trying to fit a straight line to data that’s actually a curve.

11
Q

how to improve an overfit model?

A

By making sure the model has enough diverse examples to learn from, and by using techniques such as regularization that help the model focus on the big picture rather than memorizing noise.

12
Q

how to improve an underfit model?

A

you need to make sure your model is complex enough to capture the important patterns in the data. You might need more features or a more sophisticated model.

13
Q

what is a confusion matrix?

A

A confusion matrix is a table that organizes a classifier’s prediction results: true positives, true negatives, false positives, and false negatives. It helps you see how well your model is doing overall and where it might need improvement. The goal is to have as many true positives and true negatives as possible, and as few false positives and false negatives as possible.
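A minimal sketch of tallying the four confusion-matrix cells from paired actual and predicted binary labels (the example labels are made up for illustration):

```python
# Tally confusion-matrix cells from actual vs. predicted binary labels.
# The example labels below are made up for illustration.
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

print(tp, tn, fp, fn)  # the four cells of the 2x2 confusion matrix
```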

14
Q

what is true positive?

A

the model correctly predicts the positive class.

15
Q

what is true negative?

A

the model correctly predicts the negative class.

16
Q

what is false positive?

A

the model incorrectly predicts the positive class (the actual class is negative).

17
Q

what is false negative?

A

the model incorrectly predicts the negative class (the actual class is positive).

18
Q

what is F-1 score?

A

F1 = (2 × precision × recall) / (precision + recall)
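A small sketch of the F1 formula in code, using illustrative (made-up) confusion counts:

```python
# F1 is the harmonic mean of precision and recall.
# The confusion counts below are made up for illustration.
tp, fp, fn = 80, 20, 40

precision = tp / (tp + fp)  # 80 / 100 = 0.8
recall    = tp / (tp + fn)  # 80 / 120 ≈ 0.667
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))
```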

19
Q

what is ROC

A

receiver operating characteristic curve

20
Q

define ROC

A

A ROC curve is a graph that shows how well a classification model performs across different decision thresholds. It helps us see how the model makes decisions at different levels of certainty.

21
Q

how is the ROC curve plotted?

A

The ROC curve plots two rates against each other:

True Positive Rate (Sensitivity): This is like how good your program is at correctly telling you to bring an umbrella when it’s actually going to rain. It’s the proportion of rainy days that your program correctly predicts.

False Positive Rate: This is like how often your program tells you to bring an umbrella when it’s not going to rain. It’s the proportion of non-rainy days that your program incorrectly predicts as rainy.

The ROC curve plots the True Positive Rate (Sensitivity) against the False Positive Rate.
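A sketch of the points a ROC curve is built from: sweep a decision threshold over predicted scores and compute (FPR, TPR) at each one. The scores, labels, and threshold grid below are made up for illustration:

```python
# Build ROC points by sweeping a threshold over predicted scores.
# Labels, scores, and the threshold grid are made-up illustration data.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.1]

pos = sum(labels)
neg = len(labels) - pos

roc_points = []
for threshold in [1.0, 0.75, 0.5, 0.25, 0.0]:
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    roc_points.append((fp / neg, tp / pos))  # (FPR, TPR)

print(roc_points)
```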

22
Q

what is AUC?

A

Area Under the Curve

23
Q

what does AUC indicate?

A

the higher the AUC, the better the model’s performance at distinguishing between the positive and negative classes
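One common way to compute AUC is the trapezoidal rule over (FPR, TPR) points sorted by FPR; the points below are made up for illustration:

```python
# Approximate AUC with the trapezoidal rule over (FPR, TPR) points
# sorted by FPR. The points below are made-up illustration data.
roc_points = [(0.0, 0.0), (0.0, 0.5), (0.25, 1.0), (0.75, 1.0), (1.0, 1.0)]

auc = 0.0
for (x0, y0), (x1, y1) in zip(roc_points, roc_points[1:]):
    auc += (x1 - x0) * (y0 + y1) / 2  # area of one trapezoid

print(auc)
```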

24
Q

what happens when AUC=1

A

the classifier can correctly distinguish between all the Positive and the Negative class points

25
Q

what happens when AUC=0

A

then the classifier would predict all Negatives as Positives and all Positives as Negatives.

26
Q

In a binary classification problem, False Negatives (FN) are:

A

a) Cases where the model incorrectly predicts a negative outcome when the true outcome is negative.

b) Cases where the model incorrectly predicts a negative outcome when the true outcome is positive.

c) Cases where the model correctly predicts a negative outcome.

d) Cases where the model incorrectly predicts a positive outcome when the true outcome is positive.

27
Q

True Positives (TP) represent:

A

a) Cases where the model correctly predicts a positive outcome.

b) Cases where the model incorrectly predicts a positive outcome when the true outcome is negative.

c) Cases where the model correctly predicts a negative outcome.

d) Cases where the model incorrectly predicts a negative outcome when the true outcome is positive.

28
Q

How can False Positives (FP) be calculated?

A

a) Number of actual positive instances - Number of true positive instances

b) Number of actual negative instances - Number of true negative instances

c) Number of actual negative instances - Number of false negative instances

d) Number of actual positive instances - Number of false negative instances

29
Q

What do True Negatives (TN) represent?

A

a) Cases where the model incorrectly predicts a positive outcome when the true outcome is negative.

b) Cases where the model incorrectly predicts a negative outcome when the true outcome is positive.

c) Cases where the model correctly predicts a negative outcome.

d) Cases where the model correctly predicts a positive outcome.

30
Q

If the number of actual positive instances is 100 and the number of true positive instances is 80, what is the number of False Negatives (FN)?

A

a) 20
b) 80
c) 100
d) 0

31
Q

True or False: True Negatives (TN) represent cases where the model incorrectly predicts a positive outcome when the true outcome is negative.

A

a) True
b) False

32
Q

what is data

A

Data comes in the form of words and numbers stored in tables, or as pixel values and waveforms captured in images, audio, and video.

33
Q

what are datasets made up of

A

features and labels

34
Q

what are features

A

Features are the values that a supervised model uses to predict the label.

35
Q

what is a label

A

The label is the “answer,” or the value we want the model to predict.

36
Q

what are labelled examples?

A

examples that contain both features and labels

37
Q

how is a dataset characterized?

A

on the basis of its size and diversity

38
Q

what makes a good dataset?

A

a good dataset is one that is both large in size and highly diverse

39
Q

what is a model

A

a model is the complex collection of numbers that define the mathematical relationship from specific input feature patterns to specific output label values.

40
Q

how is a model trained

A

Before a supervised model can make predictions, it must be trained. To train a model, we give the model a dataset with labeled examples. The model’s goal is to work out the best solution for predicting the labels from the features.
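A toy sketch of such a training loop, assuming a one-feature linear model y ≈ w·x + b fit by gradient descent on mean squared loss (the data, learning rate, and step count are all made up):

```python
# Toy training loop: fit y ≈ w * x + b by gradient descent on
# mean squared loss. Data, learning rate, and step count are made up.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]   # generated from y = 2x + 1

w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    # gradients of mean squared loss with respect to w and b
    grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # should approach w=2, b=1
```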

41
Q

what is loss

A

Loss is the difference between the model’s predicted value and the label’s actual value. The model finds the best solution by comparing its predictions to the actual labels and minimizing this loss.

42
Q

why does a model need to be trained?

A

A model needs to be trained to learn the mathematical relationship between the features and the label in a dataset.

43
Q

what is evaluating

A

we evaluate a model to see how well it learned

To do so, we use a labelled dataset but give the model only the dataset’s features. We then compare the model’s predictions to the labels’ true values.

44
Q

what is inference

A

Once we’re satisfied with the results from evaluating the model, we can use the model to make predictions, called inferences, on unlabelled examples.

45
Q

formula for accuracy?

A

(TP + TN) / (TP + TN + FP + FN)

46
Q

formula for precision?

A

TP / (TP + FP)

47
Q

formula for recall?

A

TP / (TP + FN)

48
Q

formula for specificity

A

TN / (TN + FP)
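The four metric formulas can be sketched in code from illustrative (made-up) confusion counts; note that precision divides by TP + FP, while recall divides by TP + FN:

```python
# Accuracy, precision, recall, and specificity from confusion counts.
# The counts below are made up for illustration.
tp, tn, fp, fn = 40, 30, 10, 20

accuracy    = (tp + tn) / (tp + tn + fp + fn)  # 70 / 100
precision   = tp / (tp + fp)                   # 40 / 50
recall      = tp / (tp + fn)                   # 40 / 60
specificity = tn / (tn + fp)                   # 30 / 40

print(accuracy, precision, specificity)
```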

49
Q

how many rates does the ROC curve plot?

A

two

one measuring how often the model correctly identifies positive cases (the true positive rate), and another measuring how often it mistakenly identifies negative cases as positive (the false positive rate).

50
Q

what is correlation

A

statistical relationship between two variables
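A minimal sketch of the Pearson correlation coefficient, the most common way to measure this relationship; the paired samples below are made up and perfectly linear, so r comes out at +1:

```python
# Pearson correlation coefficient between two variables, in pure Python.
# The paired samples below are made up for illustration.
import math

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]  # perfectly linear in xs

mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
sy = math.sqrt(sum((y - my) ** 2 for y in ys))
r = cov / (sx * sy)

print(round(r, 6))  # ≈ 1.0: a perfect positive correlation
```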

51
Q

what happens when correlation is 0

A

no correlation

52
Q

what happens when correlation is +1

A

a perfect positive correlation

53
Q

what happens when correlation is -1

A

a perfect negative correlation

54
Q

R-squared 0 ?

A

none of the label’s variance is due to the feature set

55
Q

R-squared 1?

A

all of the label’s variance is due to the feature set

56
Q

R-squared between 0-1?

A

that fraction of the label’s variance can be predicted from a particular feature or the feature set.
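The R-squared cards above can be sketched as 1 − SS_res / SS_tot, the fraction of the label’s variance explained by the model’s predictions; the actual and predicted values below are made up for illustration:

```python
# R-squared = 1 - SS_res / SS_tot: the fraction of the label's
# variance explained by predictions. Values below are illustrative.
actual    = [3.0, 5.0, 7.0, 9.0]
predicted = [2.5, 5.0, 7.5, 9.0]

mean_y = sum(actual) / len(actual)
ss_tot = sum((y - mean_y) ** 2 for y in actual)
ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
r2 = 1 - ss_res / ss_tot

print(round(r2, 3))
```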