predictive analytics; prediction Flashcards

1
Q

confidence interval

A

quantifies the uncertainty surrounding a mean, i.e. an average taken over a large number of observations

2
Q

prediction interval

A

quantifies the uncertainty surrounding the prediction of a single new value; wider than the confidence interval at the same level
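
The difference between the two intervals can be sketched numerically. This is a minimal illustration, not a definitive method: it assumes roughly normal data and uses the normal (z) quantile as a stand-in for the t quantile, which is only reasonable for larger samples. The function name `intervals` and the sample values are made up for the example.

```python
import math
import statistics

def intervals(sample, level=0.95):
    """Return (confidence_interval, prediction_interval) for a sample.

    Assumption: normal approximation (z quantile instead of t quantile).
    """
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    ci_half = z * sd / math.sqrt(n)          # uncertainty about the mean
    pi_half = z * sd * math.sqrt(1 + 1 / n)  # uncertainty about one new value
    return (mean - ci_half, mean + ci_half), (mean - pi_half, mean + pi_half)

# Hypothetical sample, for illustration only
sample = [4.1, 5.3, 4.8, 5.9, 5.0, 4.6, 5.4, 5.1, 4.9, 5.2]
ci, pi = intervals(sample)
print("confidence interval:", ci)
print("prediction interval:", pi)
```

The prediction interval is always the wider of the two, because it adds the spread of an individual value to the uncertainty about the mean.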

3
Q

machine learning definition

A

study of algorithms applied to data, focussing on prediction and classification

4
Q

machine learning characteristics

A
  • uses rules rather than regression-style formulae to produce ŷ (the predicted value)
  • translated into computer code and automated
  • predictions generated quickly and repeated without intervention
  • reduces the human error involved in choosing variables
  • challenges of interpretability and bias
5
Q

regression process

A
  • hypothesis
  • select variables
  • train
  • test
  • select best model
  • refine
6
Q

problems of machine learning

A

1) good for prediction, not for explanation
2) black box (we don’t know why the machine chooses the variables it does)
3) ethical (sub-optimal outcomes such as discrimination)

7
Q

black swan

A

extreme outliers that have a disproportionate effect on the fitted model (a source of overfitting)

8
Q

overfitting

A
  • we only see a particular subset of the data; the truth is something bigger that we don’t observe
  • the model tries so hard to explain what we see that it is poor at explaining what we don’t see
9
Q

avoiding overfitting

A
  • split the data into training and testing sets
  • training: a subset of the data, e.g. 80%, used to estimate the formula
  • testing: the remaining 20%, used to test how well the model predicts observations it was not shown
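
A minimal sketch of the 80/20 split, using only the standard library. The helper name `train_test_split` and the fixed seed are choices made for this example; shuffling before cutting is an assumption that suits data with no time ordering.

```python
import random

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle the rows and cut them into a training and a testing set."""
    rng = random.Random(seed)   # fixed seed so the split is reproducible
    shuffled = list(rows)       # copy, so the caller's data is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))         # stand-in for 100 observations
train, test = train_test_split(rows)
print(len(train), len(test))
```

The model is estimated on `train` only; `test` is held back and used once, to measure how well the model predicts data it has not seen.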
10
Q

R squared

A

never decreases as variables are added, even irrelevant ones, so on its own it rewards larger models

11
Q

adjusted R squared

A
  • penalises for number of variables
  • asks whether an extra variable adds enough explanatory power to justify including it
  • can go down if an irrelevant variable is included
  • bigger is better = better fit after accounting for the number of variables
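
The penalty can be made concrete with the standard formulas, R² = 1 − SS_res / SS_tot and adjusted R² = 1 − (1 − R²)(n − 1)/(n − k − 1), where n is the number of observations and k the number of predictors. The sketch below is illustrative only; the R² values, n and k in the comparison are hypothetical.

```python
def r_squared(y, y_hat):
    """R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def adjusted_r_squared(r2, n, k):
    """Penalise R^2 for the number of predictors k, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical comparison: seven extra variables raise R^2 only slightly
small_model = adjusted_r_squared(0.800, n=50, k=3)
large_model = adjusted_r_squared(0.801, n=50, k=10)
print(small_model, large_model)
```

Here R² rises a little with the extra variables, yet adjusted R² falls, which is the signal that the added variables are not earning their place.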
12
Q

mean squared error

A
  • common means of selecting a model
  • smaller is better = less error
  • best model = the one with the smallest error
  • high R² implies low MSE on the same data
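
A short sketch of model selection by MSE, and of the link to R² on the same data (R² = 1 − MSE / variance of y, using the population variance). The observed values and the two sets of predictions are invented for illustration.

```python
import statistics

def mse(y, y_hat):
    """Mean squared error: the average squared prediction error."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y)

# Hypothetical test-set values and predictions from two candidate models
y = [3.0, 5.0, 7.0, 9.0]
predictions = {
    "a": [2.5, 5.5, 7.0, 9.5],
    "b": [4.0, 4.0, 8.0, 8.0],
}

errors = {name: mse(y, y_hat) for name, y_hat in predictions.items()}
best = min(errors, key=errors.get)   # the model with the smallest MSE

# On the same data, a lower MSE gives a higher R^2
r2 = {name: 1 - e / statistics.pvariance(y) for name, e in errors.items()}
print(errors, best, r2)
```

Comparing models this way is most informative when the MSE is computed on testing data rather than on the data used to fit the model.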
13
Q

regression tree

A
  • output from machine learning
  • segments the input space into mutually exclusive and exhaustive regions
  • branches connect nodes, ending at terminal nodes (leaves) that hold the prediction for each region
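
The smallest possible regression tree has one split and two leaves. The sketch below is a simplified illustration under stated assumptions, not a full tree algorithm: it handles a single input variable, needs at least two distinct x values, and picks the threshold that minimises the sum of squared errors. The names `fit_stump` and `predict` and the data are hypothetical.

```python
def fit_stump(xs, ys):
    """Find the one threshold on x that minimises total squared error."""
    best = None
    for t in sorted(set(xs))[1:]:   # candidate thresholds (both sides non-empty)
        left = [y for x, y in zip(xs, ys) if x < t]
        right = [y for x, y in zip(xs, ys) if x >= t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return t, left_mean, right_mean

def predict(stump, x):
    """Follow the branch for x and return that leaf's mean."""
    t, left_mean, right_mean = stump
    return left_mean if x < t else right_mean

xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
stump = fit_stump(xs, ys)
print(stump, predict(stump, 2), predict(stump, 11))
```

The two regions (x below the threshold, x at or above it) are mutually exclusive and exhaustive, and each leaf predicts the mean of the training values that fall in its region. A full tree repeats this split inside each region.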