Prediction Flashcards

1
Q

Classification

A

Predicting whether y = 1 or y = 0 (e.g. whether a customer churns). The model outputs the probability that y = 1

2
Q

Unconditional Probability

A

The probability of an event on its own, not conditioned on any other event (e.g. the overall churn rate across all customers)

3
Q

Law of Large Numbers

A

As the number of observations increases, the sample estimate converges to the true probability, so predictions become more precise

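A seeded simulation can illustrate the law; the true churn probability of 0.25 and the sample sizes here are made up for illustration:

```python
import random

# Simulate churn events with an assumed true probability of 0.25 and
# watch the sample estimate approach the truth as n grows.
random.seed(42)
true_p = 0.25

for n in (100, 10_000, 1_000_000):
    sample = [1 if random.random() < true_p else 0 for _ in range(n)]
    estimate = sum(sample) / n
    # the gap between estimate and true_p shrinks as n grows
    print(n, round(abs(estimate - true_p), 4))
```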
5
Q

Conditional Probability

A

The probability of an event given that another event holds, P(A|B). Gives a stronger prediction: instead of the overall 25% churn rate, compute separate churn rates for senior and non-senior customers

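A minimal sketch with a made-up churn dataset, splitting the churn rate by a hypothetical senior flag:

```python
# Hypothetical churn data: (churned, is_senior) pairs, made up for illustration.
customers = [
    (1, True), (0, True), (1, True), (1, True),
    (0, False), (0, False), (1, False), (0, False),
    (0, False), (0, False), (0, False), (0, False),
]

def churn_rate(rows):
    # fraction of rows with churned == 1
    return sum(churned for churned, _ in rows) / len(rows)

overall = churn_rate(customers)                               # P(churn)
seniors = churn_rate([c for c in customers if c[1]])          # P(churn | senior)
non_seniors = churn_rate([c for c in customers if not c[1]])  # P(churn | not senior)
print(overall, seniors, non_seniors)
```

Conditioning on the senior flag separates a single overall rate into two sharper group-specific rates.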
6
Q

Bayes Rule

A

P(A|B) = P(A&B)/P(B)  (definition of conditional probability)

P(A|B) = P(B|A)P(A)/P(B)  (Bayes' rule)

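A quick numeric check that the two routes agree; the values for P(A&B), P(A) and P(B) are assumed for illustration:

```python
# Assumed joint and marginal probabilities (made up, but consistent).
p_a_and_b = 0.10   # P(A & B)
p_a = 0.25         # P(A)
p_b = 0.40         # P(B)

p_a_given_b = p_a_and_b / p_b     # definition: P(A|B) = P(A&B)/P(B)
p_b_given_a = p_a_and_b / p_a     # P(B|A) = P(A&B)/P(A)
bayes = p_b_given_a * p_a / p_b   # Bayes' rule: P(A|B) = P(B|A)P(A)/P(B)
print(p_a_given_b, bayes)         # the two routes give the same number
```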
7
Q

Threshold

A

A value between 0 and 1: if the predicted probability of y = 1 exceeds the threshold, the observation is classified as a 1

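A minimal sketch of thresholding; the predicted probabilities are made up:

```python
# Turn predicted probabilities into 0/1 labels with a threshold.
probs = [0.05, 0.30, 0.49, 0.51, 0.80, 0.95]

def classify(probs, threshold=0.5):
    return [1 if p >= threshold else 0 for p in probs]

print(classify(probs))                  # default threshold of 0.5
print(classify(probs, threshold=0.8))   # stricter threshold -> fewer positives
```

Raising the threshold trades recall for precision: fewer observations are called positive.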
8
Q

Sigmoid Function

A

H = β0 + β1 * tenure, then P(H) = 1/(1 + e^(-H))

turns a linear score over one or more features into a probability between 0 and 1

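A small sketch of the sigmoid; the coefficients b0 and b1 are made up for illustration:

```python
import math

def sigmoid(h):
    """Map any real score H to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-h))

# Hypothetical coefficients: intercept b0 and a slope b1 on tenure.
b0, b1 = -1.0, 0.1

def churn_prob(tenure):
    h = b0 + b1 * tenure   # linear score H
    return sigmoid(h)      # squashed into a probability

print(round(churn_prob(0), 3), round(churn_prob(30), 3))
```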
9
Q

Certainty through |H|

A

|H| > 2: okay (p ≈ 0.88)
|H| > 5: quite sure (p ≈ 0.993)
|H| > 10: super sure (p ≈ 0.99995)

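Plugging the rule-of-thumb scores into the sigmoid shows the probabilities they correspond to:

```python
import math

# Translate the |H| rules of thumb into probabilities via the sigmoid.
for h in (2, 5, 10):
    p = 1 / (1 + math.exp(-h))
    print(h, round(p, 5))
```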
10
Q

Scatter Plotting Continuous Features

A

A way to see how y is affected by two continuous features: put feature 1 on the x-axis, feature 2 on the y-axis, and colour each point by whether y = 1 or y = 0

11
Q

Precision

A

Of the observations predicted positive, the share that are truly positive; higher precision means fewer false positives

Tp/(Tp + Fp)

12
Q

Accuracy

A

The share of all predictions that are correct (true positives plus true negatives)

(Tp + Tn)/(all obs)

13
Q

Recall

A

Of the truly positive observations, the share correctly identified; higher recall means fewer false negatives

Tp/(Tp+Fn)

14
Q

F1

A

The harmonic mean of precision and recall, balancing the two

2((PR)/(P + R))

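The four metrics (accuracy, precision, recall, F1) can be computed together from a confusion matrix; the labels here are made up:

```python
# Made-up actual labels and model predictions.
actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# Confusion-matrix counts.
tp = sum(a == 1 and p == 1 for a, p in zip(actual, predicted))
tn = sum(a == 0 and p == 0 for a, p in zip(actual, predicted))
fp = sum(a == 0 and p == 1 for a, p in zip(actual, predicted))
fn = sum(a == 1 and p == 0 for a, p in zip(actual, predicted))

accuracy  = (tp + tn) / len(actual)
precision = tp / (tp + fp)
recall    = tp / (tp + fn)
f1        = 2 * precision * recall / (precision + recall)
print(accuracy, precision, recall, f1)
```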
15
Q

Underfitting

A

Model performs poorly even on the training data; not flexible enough

High bias

Fix: add features or use a more flexible model (adding observations alone will not cure high bias)

16
Q

Overfitting

A

Fits the training data too closely (too flexible, too customized), so predictions on test data are poor

High variance

Fix: fewer features, regularization (make the model simpler), or more observations

Evaluate with a train/test split (e.g. 70/30) to catch it
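A minimal 70/30 split sketch using only the standard library; the data here are just hypothetical observation indices:

```python
import random

# Shuffle, then split observations 70/30 into train and test sets.
random.seed(0)
data = list(range(100))       # hypothetical observation indices
random.shuffle(data)

cut = int(len(data) * 0.7)
train, test = data[:cut], data[cut:]
print(len(train), len(test))  # fit on train, evaluate on test
```

Shuffling before splitting matters: if the data are ordered (e.g. by date), a naive split would train and test on systematically different observations.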

17
Q

What to do when y= 0,1,or 2

A

One-vs-rest: define an indicator z0 = 1 only if y = 0 and model Pr(z0 = 1|X)

Do the same with a z1 and z2, then pick the class with the highest predicted probability
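The indicator construction can be sketched as follows; the label vector y is made up:

```python
# One-vs-rest targets for a three-class label y in {0, 1, 2}:
# build one binary indicator per class, then fit a separate
# probability model Pr(z_k = 1 | X) on each indicator.
y = [0, 2, 1, 0, 2, 2, 1, 0]

indicators = {k: [1 if yi == k else 0 for yi in y] for k in (0, 1, 2)}
print(indicators[0])  # z0: 1 exactly where y == 0

# each observation activates exactly one of the three indicators
assert all(sum(indicators[k][i] for k in (0, 1, 2)) == 1 for i in range(len(y)))
```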

18
Q

Supervised Learning

A

Predict a labelled target attribute y using features

e.g. classification, logistic regression

e.g. home sale price (take the conditional mean of y at each feature value to get the prediction line)

19
Q

Unsupervised Learning

A

No labelled target attribute to predict

Instead, cluster observations on similar features

e.g. market segments, similar images, news article types

20
Q

Anomaly Detection

A

Evaluate whether given data points are anomalous

e.g. fraud detection, defect detection

21
Q

Reinforcement Learning

A

Takes actions, generates data from the outcomes, and updates its algorithm

Trade-off between exploiting what already works and exploring alternatives

e.g. recommender systems

22
Q

Machine Learning Benefits and Types

A

More precise because it can use more observations, more flexible functional forms, and more variables

First predicts, then decides based on that prediction

Supervised Learning
Unsupervised Learning
Anomaly Detection
Reinforcement Learning

23
Q

External Validity

A

Whether a model trained in context A can be used in context B

Likely higher when the data contexts are similar