Midterm 2 Flashcards

1
Q

Convert odds of 1:8 to a probability.

A

Odds of 1:8 → 1/8 = 0.125 → 0.125/(0.125 + 1) = 0.1111 = 11.11% probability. This works because odds of 1:8 mean 1 success out of every 9 outcomes.

2
Q

Odds EQ for Probability(for/against)

A

probability = odds/(odds + 1), where odds = for/against
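A quick check of this formula in Python (a minimal sketch; the function name and example values are mine, not the deck's):

```python
# Probability from odds given as for:against, using p = odds / (odds + 1).
def odds_to_probability(odds_for: float, odds_against: float) -> float:
    odds = odds_for / odds_against
    return odds / (odds + 1)

# Card 1's example, odds of 1:8 -> 0.125 / 1.125 = 0.1111 (11.11%).
print(round(odds_to_probability(1, 8), 4))  # 0.1111
```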

3
Q

What does a logistic regression model predict?

A

LogOdds! This will have a range of (-infinity, infinity)

4
Q

How do you convert logOdds to odds?

A

e^logOdds = odds

5
Q

How do you convert logOdds to probability?

A

e^logOdds/(e^logOdds + 1)
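The two conversions above (cards 4 and 5) as a minimal Python sketch; the function names and example log-odds values are assumptions for illustration:

```python
import math

# odds = e^logOdds; probability = e^logOdds / (e^logOdds + 1)
def log_odds_to_odds(log_odds: float) -> float:
    return math.exp(log_odds)

def log_odds_to_probability(log_odds: float) -> float:
    odds = math.exp(log_odds)
    return odds / (odds + 1)

print(log_odds_to_odds(0.0))                   # 1.0 (even odds)
print(log_odds_to_probability(0.0))            # 0.5
print(round(log_odds_to_probability(2.0), 3))  # 0.881
```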

6
Q

What does logOdds equal in terms of ln(x)?

A

logOdds = ln(odds), i.e., the natural log of the odds (also written log(odds)).

7
Q

What is the range of odds (what are they bounded by)?

A

[0, infinity)

8
Q

What is the range of logOdds (what are they bounded by)?

A

(-infinity, infinity)

9
Q

What type of estimation model is logistic regression, and why?

A

A class probability estimation model: it produces a numeric value that estimates the probability of a categorical (class) variable. Ex. What is the chance Marc goes to class? 0.3

10
Q

What loss function does a support vector machine use?

A

Hinge loss

11
Q

Hinge loss (loss function)

A

An instance on the wrong side of the line does not necessarily incur a penalty; it incurs one ONLY when it is on the wrong side and outside of the margin.

12
Q

Zero-one loss

A

An instance incurs a loss of 0 for a correct decision and 1 for an incorrect decision.

13
Q

Squared error

A

Specifies a loss equal to the square of the distance from the boundary, so a farther instance incurs a greater error. Usually used for numeric value prediction (regression) rather than classification.
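A sketch of how the three loss functions in cards 11-13 are commonly written, in terms of the signed margin m = y·f(x); this formulation is a standard textbook one, not quoted from the deck:

```python
# m = y * f(x): positive when the decision is correct, negative when it is wrong.
def zero_one_loss(m: float) -> float:
    return 0.0 if m > 0 else 1.0   # 0 for a correct decision, 1 for an incorrect one

def hinge_loss(m: float) -> float:
    return max(0.0, 1.0 - m)       # zero once safely past the margin; grows linearly after that

def squared_error(y: float, f: float) -> float:
    return (y - f) ** 2            # penalty grows with the square of the distance

for m in (-2.0, -0.5, 0.5, 2.0):
    print(m, zero_one_loss(m), hinge_loss(m))
```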

14
Q

Loss function

A

Determines how much penalty should be assigned to an instance based on the model's predicted value.

15
Q

Finish this sentence. Accuracy of training data is sometimes called…

A

In-sample accuracy (train) vs. out-of-sample accuracy (test)

16
Q

When is logistic regression more accurate vs. decision tree and vice versa?

A

Logistic regression (LR) tends to be more accurate on smaller data sets; decision trees (DT) tend to be more accurate on bigger ones.

17
Q

What’s the point of regularization?

A

It gives a penalty to more complicated models because those are more prone to overfitting.

18
Q

In a confusion matrix what are the column headers? Row headers?

A

Column: Actual y and n
Row: Predicted y and n

19
Q

False positive

A

Predicted positive, actual negative

20
Q

False negative

A

Predicted negative, actual positive

21
Q

True negative

A

Predicted negative, actual negative

22
Q

True positive

A

Predicted positive, actual positive

23
Q

True positive rate

A

True positives / all actual positives (true positives + false negatives)

24
Q

False positive rate

A

False positives / all actual negatives (false positives + true negatives)

25
Q

Positive predictive value (PPV)

A

True positives / all predicted positives (true positives + false positives)
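The three rates in cards 23-25, computed from hypothetical confusion-matrix counts (the numbers are made up for illustration):

```python
# Hypothetical confusion-matrix counts.
tp, fp, tn, fn = 60, 10, 80, 20

tpr = tp / (tp + fn)   # true positive rate: TP / all actual positives
fpr = fp / (fp + tn)   # false positive rate: FP / all actual negatives
ppv = tp / (tp + fp)   # positive predictive value: TP / all predicted positives

print(round(tpr, 3), round(fpr, 3), round(ppv, 3))  # 0.75 0.111 0.857
```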

26
Q

What’s the expected value of a game of roulette? Probability of hitting black = 48%. Bet = $100

A

EV = (0.48)($100) + (1 - 0.48)(-$100) = -$4
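The same calculation in Python, using the card's numbers:

```python
# EV = p(win) * payoff(win) + p(lose) * payoff(lose)
p_win, bet = 0.48, 100
ev = p_win * bet + (1 - p_win) * (-bet)
print(ev)  # -4.0 -> on average you lose $4 per $100 bet
```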

27
Q

What are the two uses for expected value?

A
  1. Inform how to use our classifier for individual predictions.
  2. Compare classifiers.
28
Q

Class priors

A

The proportion of positive and negative instances in your data set. Ex. 40 of 100 people would buy a new car next year if they could. p(p) = .4, p(n) = .6

29
Q

Two critical conditions underlying profit calculations:

A
  1. Class priors
  2. Costs and benefits

30
Q

Where is the perfect point on an ROC curve (hint: x axis is FPR, y axis is TPR)

A

Top left. FPR of 0, TPR of 1

31
Q

How are ROC curves created?

A

TPR and FPR are computed at every cutoff (threshold) value. Ex. for the Titanic model, thresholds from 0 to 1 would be evaluated at every 0.01.
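A rough sketch of that procedure (the scores, labels, and step size are hypothetical; real tools usually evaluate only the distinct score values):

```python
# Sweep the cutoff from 0 to 1 and record (FPR, TPR) at each threshold.
scores = [0.9, 0.8, 0.7, 0.55, 0.5, 0.4, 0.3, 0.2]   # model-predicted probabilities
labels = [1,   1,   0,   1,    0,   0,   1,   0]     # 1 = positive, 0 = negative

def roc_points(scores, labels, step=0.25):
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    threshold = 0.0
    while threshold <= 1.0:
        preds = [1 if s >= threshold else 0 for s in scores]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
        points.append((fp / neg, tp / pos))   # (FPR, TPR) at this cutoff
        threshold += step
    return points

print(roc_points(scores, labels))
```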

32
Q

What is the Area Under the ROC Curve used for (AUC)?

A

AUC is used when a single number is needed to summarize performance, or when nothing is known about the operating conditions (class priors, costs and benefits).

33
Q

What are two alternatives to the ROC curve?

A
  1. Cumulative response curve
  2. Lift curve

34
Q

How is a lift curve calculated?

A

From the cumulative response curve values: lift = y/x (the fraction of positives captured divided by the fraction of the population targeted).

35
Q

How can you calculate cumulative response curve values from lift?

A

Multiply the lift value by the x-axis value: cumulative response (y) = lift × x. Ex. contacting 20% with a lift of 2.5 means 2.5 × 0.20 = 0.5 on the cumulative response curve.
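The card's example as arithmetic in Python:

```python
# cumulative response (y) = lift * fraction targeted (x); lift = y / x.
fraction_targeted = 0.20   # contact the top 20% ranked by the model
lift = 2.5

cumulative_response = lift * fraction_targeted
print(cumulative_response)                       # 0.5 -> 50% of all positives captured
print(cumulative_response / fraction_targeted)   # back to the lift: 2.5
```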

36
Q

Euclidean distance

A

The standard straight-line distance formula, applied to the attribute values of two instances (e.g., two people).

37
Q

Manhattan distance

A

Distance measured along the two axes (summing the attribute differences) rather than along the hypotenuse.

38
Q

Euclidean vs. Manhattan

A

Euclidean distance uses the hypotenuse of the triangle whose endpoints are the two instances. Manhattan distance uses the two legs.
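Both distances on two hypothetical instances (the attribute values are made up; the 3-4-5 differences make the contrast easy to see):

```python
import math

a = [35, 50]   # e.g., person A: age, income in $k
b = [38, 54]   # e.g., person B

euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))  # hypotenuse: sqrt(3^2 + 4^2)
manhattan = sum(abs(x - y) for x, y in zip(a, b))               # along the axes: 3 + 4

print(euclidean)  # 5.0
print(manhattan)  # 7
```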

39
Q

Nearest neighbors

A

Judge similarity by calculating the distance to the nearest neighbors and using those neighbors to make a prediction. Ex. 3 nearest neighbors → 2 no's, 1 yes; the instance should be a no!

40
Q

How do we give more weight to the closest neighbors?

A

With a similarity weight: the inverse of the distance squared → contribution = similarity weight / (sum of all similarity weights)

41
Q

How do we get a probability from nearest neighbors?

A

Sum of all “no” contributions = p(no)
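A minimal sketch of cards 40-41 together (the neighbor distances and labels are invented for illustration):

```python
# Hypothetical 3 nearest neighbors: (distance from the new instance, class label).
neighbors = [(1.0, "no"), (2.0, "no"), (4.0, "yes")]

# Similarity weight = 1 / distance^2; contribution = weight / sum of all weights.
weights = [(1.0 / d ** 2, label) for d, label in neighbors]
total = sum(w for w, _ in weights)
contributions = [(w / total, label) for w, label in weights]

# p(class) = sum of the contributions of the neighbors with that class.
p_no = sum(c for c, label in contributions if label == "no")
p_yes = sum(c for c, label in contributions if label == "yes")
print(round(p_no, 3), round(p_yes, 3))  # 0.952 0.048
```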

42
Q

How do we avoid overfitting the data with nearest neighbors?

A

By choosing a higher k = # of neighbors!

43
Q

What are the three issues with nearest neighbors?

A
  1. Dimensionality and domain knowledge. Unimportant features might have too much influence over important ones!
  2. Fast to train, slow to predict. Prediction requires plotting the entire dataset.
  3. Easy to interpret, but no “knowledge” extracted from data.
44
Q

Hierarchical clustering

A

Consider individual points and the distances between them. Ex. points with a Euclidean distance smaller than x will be clustered together.

45
Q

Link function (clustering)

A

The minimum requirement that must be met before an item is merged into a cluster.

46
Q

Centroid-based clustering

A

Decide k (the number of centroids) and groups will be made around those. Points are grouped by which centroid they're closest to. When a point is added, the centroid is repositioned (recomputed as the mean of its points).
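One assignment-and-update round of the idea above, as a toy Python sketch (the points, k = 2, and starting centroids are assumptions; a full k-means-style loop would repeat this until the centroids stop moving):

```python
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids = [2.0, 11.0]   # k = 2 starting centroids, chosen by hand

# Assign each point to its closest centroid...
clusters = {0: [], 1: []}
for p in points:
    closest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
    clusters[closest].append(p)

# ...then reposition each centroid to the mean of the points assigned to it.
centroids = [sum(members) / len(members) for members in clusters.values()]
print(clusters)    # {0: [1.0, 2.0, 3.0], 1: [10.0, 11.0, 12.0]}
print(centroids)   # [2.0, 11.0]
```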