Midterm 2 Flashcards

1
Q

Convert odds of 1:8 to a probability.

A

Odds of 1:8 → 1/8 = 0.125 → 0.125/(0.125 + 1) = 0.1111 = 11.11% probability. This works because odds of 1:8 mean 1 success out of every 9 outcomes.

2
Q

Odds EQ for Probability(for/against)

A

probability = odds/(odds + 1), where odds = for/against
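A quick check of this formula in Python (a minimal sketch; the function name and example values are mine, not the deck's):

```python
# Probability from odds given as for:against, using p = odds / (odds + 1).
def odds_to_probability(odds_for: float, odds_against: float) -> float:
    odds = odds_for / odds_against
    return odds / (odds + 1)

# Card 1's example, odds of 1:8 -> 0.125 / 1.125 = 0.1111 (11.11%).
print(round(odds_to_probability(1, 8), 4))  # 0.1111
```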

3
Q

What does a logistic regression model predict?

A

LogOdds! This will have a range of (-infinity, infinity)

4
Q

How do you convert logOdds to odds?

A

e^logOdds = odds

5
Q

How do you convert logOdds to probability?

A

e^logOdds/(e^logOdds + 1)
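The two conversions above (cards 4 and 5) as a minimal Python sketch; the function names and example log-odds values are assumptions for illustration:

```python
import math

# odds = e^logOdds; probability = e^logOdds / (e^logOdds + 1)
def log_odds_to_odds(log_odds: float) -> float:
    return math.exp(log_odds)

def log_odds_to_probability(log_odds: float) -> float:
    odds = math.exp(log_odds)
    return odds / (odds + 1)

print(log_odds_to_odds(0.0))                   # 1.0 (even odds)
print(log_odds_to_probability(0.0))            # 0.5
print(round(log_odds_to_probability(2.0), 3))  # 0.881
```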

6
Q

What does logOdds equal in terms of ln(x)?

A

logOdds = ln(odds), i.e., the natural log of the odds (also written log(odds)).

7
Q

What is the range of odds (what are they bounded by)?

A

[0, infinity)

8
Q

What is the range of logOdds (what are they bounded by)?

A

(-infinity, infinity)

9
Q

What type of estimation model is logistic regression, and why?

A

A class probability estimation model: it produces a numeric value that estimates the probability of a categorical (class) variable. Ex. What is the chance Marc goes to class? 0.3

10
Q

What loss function does a support vector machine use?

A

Hinge loss

11
Q

Hinge loss (loss function)

A

An instance on the wrong side of the line does not necessarily incur a penalty; it incurs one ONLY when it is on the wrong side and outside of the margin.

12
Q

Zero-one loss

A

An instance incurs a loss of 0 for a correct decision and 1 for an incorrect decision.

13
Q

Squared error

A

Specifies a loss equal to the square of the distance from the boundary, so a farther instance incurs a greater error. Usually used for numeric value prediction (regression) rather than classification.
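A sketch of how the three loss functions in cards 11-13 are commonly written, in terms of the signed margin m = y·f(x); this formulation is a standard textbook one, not quoted from the deck:

```python
# m = y * f(x): positive when the decision is correct, negative when it is wrong.
def zero_one_loss(m: float) -> float:
    return 0.0 if m > 0 else 1.0   # 0 for a correct decision, 1 for an incorrect one

def hinge_loss(m: float) -> float:
    return max(0.0, 1.0 - m)       # zero once safely past the margin; grows linearly after that

def squared_error(y: float, f: float) -> float:
    return (y - f) ** 2            # penalty grows with the square of the distance

for m in (-2.0, -0.5, 0.5, 2.0):
    print(m, zero_one_loss(m), hinge_loss(m))
```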

14
Q

Loss function

A

Determines how much penalty should be assigned to an instance based on the model's predicted value.

15
Q

Finish this sentence. Accuracy of training data is sometimes called…

A

In-sample accuracy (train) vs. out-of-sample accuracy (test)

16
Q

When is logistic regression more accurate vs. decision tree and vice versa?

A

Logistic regression (LR) tends to be more accurate on smaller data sets; decision trees (DT) tend to be more accurate on bigger ones.

17
Q

What’s the point of regularization?

A

It gives a penalty to more complicated models because those are more prone to overfitting.

18
Q

In a confusion matrix what are the column headers? Row headers?

A

Column: Actual y and n
Row: Predicted y and n

19
Q

False positive

A

Predicted positive, actual negative

20
Q

False negative

A

Predicted negative, actual positive

21
Q

True negative

A

Predicted negative, actual negative

22
Q

True positive

A

Predicted positive, actual positive

23
Q

True positive rate

A

True positives / all actual positives (true positives + false negatives)

24
Q

False positive rate

A

False positives / all actual negatives (false positives + true negatives)

25
Q

Positive predictive value (PPV)

A

True positives / all predicted positives (true positives + false positives)
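The three rates in cards 23-25, computed from hypothetical confusion-matrix counts (the numbers are made up for illustration):

```python
# Hypothetical confusion-matrix counts.
tp, fp, tn, fn = 60, 10, 80, 20

tpr = tp / (tp + fn)   # true positive rate: TP / all actual positives
fpr = fp / (fp + tn)   # false positive rate: FP / all actual negatives
ppv = tp / (tp + fp)   # positive predictive value: TP / all predicted positives

print(round(tpr, 3), round(fpr, 3), round(ppv, 3))  # 0.75 0.111 0.857
```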

26
Q

What’s the expected value of a game of roulette? Probability of hitting black = 48%. Bet = $100

A

EV = (0.48)($100) + (1 - 0.48)(-$100) = -$4
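The same calculation in Python, using the card's numbers:

```python
# EV = p(win) * payoff(win) + p(lose) * payoff(lose)
p_win, bet = 0.48, 100
ev = p_win * bet + (1 - p_win) * (-bet)
print(ev)  # -4.0 -> on average you lose $4 per $100 bet
```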

27
Q

What are the two uses for expected value?

A
  1. Inform how to use our classifier for individual predictions.
  2. Compare classifiers.
28
Q

Class priors

A

The proportion of positive and negative instances in your data set. Ex. 40 of 100 people would buy a new car next year if they could. p(p) = .4, p(n) = .6

29
Q

Two critical conditions underlying profit calculations:

A
  1. Class priors
  2. Costs and benefits

30
Q

Where is the perfect point on an ROC curve (hint: x axis is FPR, y axis is TPR)

A

Top left. FPR of 0, TPR of 1

31
Q

How are ROC curves created?

A

TPR and FPR are computed at every cutoff (threshold) value. Ex. for the Titanic model, thresholds from 0 to 1 would be evaluated at every 0.01.
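A rough sketch of that procedure (the scores, labels, and step size are hypothetical; real tools usually evaluate only the distinct score values):

```python
# Sweep the cutoff from 0 to 1 and record (FPR, TPR) at each threshold.
scores = [0.9, 0.8, 0.7, 0.55, 0.5, 0.4, 0.3, 0.2]   # model-predicted probabilities
labels = [1,   1,   0,   1,    0,   0,   1,   0]     # 1 = positive, 0 = negative

def roc_points(scores, labels, step=0.25):
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    threshold = 0.0
    while threshold <= 1.0:
        preds = [1 if s >= threshold else 0 for s in scores]
        tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
        fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
        points.append((fp / neg, tp / pos))   # (FPR, TPR) at this cutoff
        threshold += step
    return points

print(roc_points(scores, labels))
```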

32
Q

What is the Area Under the ROC Curve used for (AUC)?

A

AUC is used when a single number is needed to summarize performance, or when nothing is known about the operating conditions (class priors, costs and benefits).

33
Q

What are two alternatives to the ROC curve?

A
  1. Cumulative response curve
  2. Lift curve

34
Q

How is a lift curve calculated?

A

From the cumulative response curve values: lift = y/x (the fraction of positives captured divided by the fraction of the population targeted).

35
Q

How can you calculate cumulative response curve values from lift?

A

Multiply the lift value by the x-axis value: cumulative response (y) = lift × x. Ex. contacting 20% with a lift of 2.5 means 2.5 × 0.20 = 0.5 on the cumulative response curve.
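The card's example as arithmetic in Python:

```python
# cumulative response (y) = lift * fraction targeted (x); lift = y / x.
fraction_targeted = 0.20   # contact the top 20% ranked by the model
lift = 2.5

cumulative_response = lift * fraction_targeted
print(cumulative_response)                       # 0.5 -> 50% of all positives captured
print(cumulative_response / fraction_targeted)   # back to the lift: 2.5
```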

36
Q

Euclidean distance

A

The standard straight-line distance formula, applied to the attribute values of two instances (e.g., two people).

37
Q

Manhattan distance

A

Distance measured along the two axes (summing the attribute differences) rather than along the hypotenuse.

38
Q

Euclidean vs. Manhattan

A

Euclidean distance uses the hypotenuse of the triangle whose endpoints are the two instances. Manhattan distance uses the two legs.
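Both distances on two hypothetical instances (the attribute values are made up; the 3-4-5 differences make the contrast easy to see):

```python
import math

a = [35, 50]   # e.g., person A: age, income in $k
b = [38, 54]   # e.g., person B

euclidean = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))  # hypotenuse: sqrt(3^2 + 4^2)
manhattan = sum(abs(x - y) for x, y in zip(a, b))               # along the axes: 3 + 4

print(euclidean)  # 5.0
print(manhattan)  # 7
```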

39
Q

Nearest neighbors

A

Judge similarity by calculating the distance to the nearest neighbors and using those neighbors to make a prediction. Ex. 3 nearest neighbors → 2 no's, 1 yes; the instance should be a no!

40
Q

How do we give more weight to the closest neighbors?

A

With a similarity weight: the inverse of the distance squared → contribution = similarity weight / (sum of all similarity weights)

41
Q

How do we get a probability from nearest neighbors?

A

Sum of all “no” contributions = p(no)
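A minimal sketch of cards 40-41 together (the neighbor distances and labels are invented for illustration):

```python
# Hypothetical 3 nearest neighbors: (distance from the new instance, class label).
neighbors = [(1.0, "no"), (2.0, "no"), (4.0, "yes")]

# Similarity weight = 1 / distance^2; contribution = weight / sum of all weights.
weights = [(1.0 / d ** 2, label) for d, label in neighbors]
total = sum(w for w, _ in weights)
contributions = [(w / total, label) for w, label in weights]

# p(class) = sum of the contributions of the neighbors with that class.
p_no = sum(c for c, label in contributions if label == "no")
p_yes = sum(c for c, label in contributions if label == "yes")
print(round(p_no, 3), round(p_yes, 3))  # 0.952 0.048
```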

42
Q

How do we avoid overfitting the data with nearest neighbors?

A

By choosing a higher k = # of neighbors!

43
Q

What are the three issues with nearest neighbors?

A
  1. Dimensionality and domain knowledge. Unimportant features might have too much influence over important ones!
  2. Fast to train, slow to predict. Prediction requires plotting the entire dataset.
  3. Easy to interpret, but no “knowledge” extracted from data.
44
Q

Hierarchical clustering

A

Consider individual points and the distances between them. Ex. points with a Euclidean distance smaller than x will be clustered together.

45
Q

Link function (clustering)

A

The minimum requirement that must be met before an item is merged into a cluster.

46
Q

Centroid-based clustering

A

Decide k (the number of centroids) and groups will be made around those. Points are grouped by which centroid they're closest to. When a point is added, the centroid is repositioned (recomputed as the mean of its points).
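One assignment-and-update round of the idea above, as a toy Python sketch (the points, k = 2, and starting centroids are assumptions; a full k-means-style loop would repeat this until the centroids stop moving):

```python
points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
centroids = [2.0, 11.0]   # k = 2 starting centroids, chosen by hand

# Assign each point to its closest centroid...
clusters = {0: [], 1: []}
for p in points:
    closest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
    clusters[closest].append(p)

# ...then reposition each centroid to the mean of the points assigned to it.
centroids = [sum(members) / len(members) for members in clusters.values()]
print(clusters)    # {0: [1.0, 2.0, 3.0], 1: [10.0, 11.0, 12.0]}
print(centroids)   # [2.0, 11.0]
```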