Advanced Regression Flashcards

1
Q

What contexts can trees be used in?

A

-classification
-decision making
-logistic regression

2
Q

Trees in regression

A

split the data into branches, where each branch gets its own regression model that fits that part of the data better

3
Q

How it’s done in practice

A

-not a full regression in each leaf
-a simple regression using only the constant term: y = a0

a0 = (sum of yi over all points i in the node) / (# of data points in the node) = the average response in the node
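A minimal sketch of this leaf prediction in Python (the function name leaf_prediction is illustrative, not from the source):

    import numpy as np

    def leaf_prediction(y_in_node):
        # the "regression" in each leaf is just the constant a0:
        # the average response of the training points in that node
        return np.mean(y_in_node)

    # example: a leaf containing responses 3, 5, 7 predicts 5.0
    print(leaf_prediction(np.array([3.0, 5.0, 7.0])))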

4
Q

Branching method

A

start with half the data:
-for each leaf, calculate the variance of the responses
-try splitting on each factor, find the biggest variance decrease, and make the split with the biggest decrease (if it's more than a threshold); see the sketch below
-repeat until no split decreases the variance by more than the threshold

then, using the other half of the data, for each branching point:
-calculate the estimation error with and without the branching
-if branching increases the error, remove the branching
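A minimal sketch of the variance-decrease split search, assuming numeric factors stored in a numpy array (the helper best_split is illustrative):

    import numpy as np

    def best_split(X, y):
        # total within-node variance before splitting (times n, so sizes weight correctly)
        base = np.var(y) * len(y)
        best = (None, None, 0.0)  # (factor index, split value, variance decrease)
        for j in range(X.shape[1]):
            for v in np.unique(X[:, j])[:-1]:  # candidate split points
                left, right = y[X[:, j] <= v], y[X[:, j] > v]
                after = np.var(left) * len(left) + np.var(right) * len(right)
                if base - after > best[2]:
                    best = (j, v, base - after)
        return best

    # example: factor 0 separates small responses from large ones
    X = np.array([[1.0, 5.0], [2.0, 1.0], [8.0, 4.0], [9.0, 2.0]])
    y = np.array([1.0, 1.2, 9.8, 10.0])
    print(best_split(X, y))  # expect a split on factor 0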

5
Q

Other branching methods

A

key idea
-use a metric related to the model's quality
-find the best factor to branch with
-check: did this really improve the model? if not, prune the branch back

rejecting a potential branch
-low improvement benefit
-one side of the branch has too few data points
-rule of thumb: each leaf should contain at least 5% of the original data

overfitting can be costly; make sure the benefit of each branch is greater than the cost

6
Q

random forest: differences from the basic branching method

A

-introduce randomness
-generate many different trees
-different strengths and weaknesses
-the average may be better than any single tree

7
Q

How do we introduce randomness into a random forest's trees?

A
1. bootstrapping (see the sketch after this list)
-give each tree a slightly different set of data
-if we start with n data points, each tree gets n data points drawn with replacement, so it may contain multiple copies of one point and no copies of another

2. branching
-choose one factor at a time
-randomly choose a small set X of factors (commonly 1 + log(number of factors) of them)
-choose the best factor within X to branch on

3. don't prune the trees
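A minimal sketch of both randomness sources, assuming numpy (helper names are illustrative; natural log is assumed for the rule of thumb):

    import numpy as np

    rng = np.random.default_rng(0)

    def bootstrap_sample(X, y):
        # draw n points with replacement: some appear multiple times, some not at all
        idx = rng.integers(0, len(y), size=len(y))
        return X[idx], y[idx]

    def random_factor_subset(num_factors):
        # small random set X of factors to consider at one branch,
        # using the 1 + log(number of factors) rule of thumb
        k = 1 + int(np.log(num_factors))
        return rng.choice(num_factors, size=k, replace=False)

    print(random_factor_subset(10))  # e.g., 3 of the 10 factors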
8
Q

Results of random forest

A

-each tree in the forest sees slightly different data
-we end up with many different trees (usually 500-1000); this collection is the random forest
-each tree may give us a different regression model (which do we choose?)

9
Q

How do you pick the tree in the random forest?

A

You don’t use a single one!

-if it's a regression forest, use the average predicted response across all the trees in your forest
-if it's a classification forest, use the mode (the most common predicted response); see the sketch below
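A minimal sketch of combining the trees' outputs, assuming we already have each tree's prediction for a single point (the function name is illustrative):

    import numpy as np
    from collections import Counter

    def forest_predict(tree_predictions, task="regression"):
        if task == "regression":
            # regression forest: average the trees' predicted responses
            return np.mean(tree_predictions)
        # classification forest: mode, the most common predicted class
        return Counter(tree_predictions).most_common(1)[0][0]

    print(forest_predict([10.2, 9.8, 10.5, 10.1]))                      # 10.15
    print(forest_predict(["yes", "no", "yes"], task="classification"))  # yes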

10
Q

Benefits of random forest

A

-better overall estimates: some trees may overfit, but they don't all overfit the same way
-averaging across the trees somewhat neutralizes the overfitting

11
Q

drawback of random forest

A

-harder to explain/interpret the results
-doesn't tell us how the variables interact or why a certain sequence of branches is helpful or meaningful, because the branches differ from tree to tree

12
Q

what situations is a random forest good in?

A

-good as a black-box model / a default choice

-when there's no good reason to try something else

-not good when you need detailed insight into what's going on

13
Q

Explainability/interpretability

A

how easy or difficult it is to know how models create their output

14
Q

ex of explainability: linear regression

A

y = a0 + a1x1 + a2x2 + … + anxn

how is the value of y affected by different values of the predictors?

the baseline is a0, and each coefficient aj controls the "weight" of variable j, i.e. how much it impacts the response
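A minimal sketch of reading this interpretation off a fitted model, assuming scikit-learn and made-up data:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # made-up data with two predictors
    X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
    y = np.array([5.0, 4.0, 11.0, 10.0])

    model = LinearRegression().fit(X, y)
    print("baseline a0:", model.intercept_)  # response when all predictors are 0
    print("weights aj:", model.coef_)        # impact of a unit change in each xj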

15
Q

ex of explainability: linear regression tree

A

all of the results are conditional on the branch a point falls into
-even harder with a random forest

-a forest does give the relative branching importance of each variable
-but not how each variable affects the response, so it's not precise

16
Q

explainability tradeoff

A

-the more we know about how a model gets its results, the better we can understand why it works
-we can explain the model better to decision makers and get our ideas implemented
-sometimes explainability is a legal requirement

-but less explainable models sometimes give better results

17
Q

use of logistic regression

A

when the response we want to predict is a probability between 0 and 1

18
Q

logistic regression model

A

Standard linear regression:
y = a0 + a1x1 + a2x2 + … + ajxj

Logistic regression:
p = the probability of the event you want to observe

log(p/(1-p)) = a0 + a1x1 + a2x2 + … + ajxj

p = 1 / (1 + e^-(a0 + a1x1 + a2x2 + … + ajxj))

if a0 + a1x1 + a2x2 + … + ajxj = -infinity, then p = 0
if a0 + a1x1 + a2x2 + … + ajxj = +infinity, then p = 1
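A minimal sketch of the probability formula, assuming numpy (names and coefficients are made up):

    import numpy as np

    def logistic_p(x, a):
        # p = 1 / (1 + e^-(a0 + a1*x1 + ... + aj*xj));
        # a[0] is the intercept a0, a[1:] are the coefficients
        z = a[0] + np.dot(a[1:], x)
        return 1.0 / (1.0 + np.exp(-z))

    a = np.array([-1.0, 0.5, 0.5])
    print(logistic_p(np.array([2.0, 3.0]), a))       # z = 1.5 -> p ≈ 0.82
    print(logistic_p(np.array([-50.0, -50.0]), a))   # very negative z -> p ≈ 0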

19
Q

logistic regression curve

A

-looks like an S
-the coefficients can change the shape of the curve
-the data points are all at 0 or 1

20
Q

logistic vs linear regression

A

similarities
-transformations of input data
-consider interaction terms
-variable selection
-logistic regression trees
-random logistic regression forests

log. reg. differences
-takes longer to compute
-no closed-form solution
-different ways of understanding model quality

21
Q

logistic vs. linear regression: measures of model quality

A

lin. reg.
-R-squared: fraction of variance explained by the model

log. reg.
-pseudo-R-squared values
-not really measuring a fraction of variance

22
Q

logistic regression classifications

A

thresholding
-answer yes if probability p is at least some number
-otherwise no

ex (see the sketch below):
if p >= 0.7, give the loan; otherwise don't
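A minimal sketch of thresholding, with the 0.7 loan cutoff from the example (function name is illustrative):

    def classify(p, threshold=0.7):
        # threshold the model's probability output
        return "give loan" if p >= threshold else "no loan"

    print(classify(0.82))  # give loan
    print(classify(0.55))  # no loan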

23
Q

receiver operating characteristic (ROC) curve

A

y-axis: sensitivity = (true positives) / (true positives + false negatives)
x-axis: 1 - specificity = (false positives) / (false positives + true negatives)

area under the curve (AUC): the probability that the model scores a random "yes" point higher than a random "no" point (e.g., loans)

AUC = 0.5 means the model is just guessing
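A minimal sketch of computing AUC, assuming scikit-learn (labels and probabilities are made up):

    from sklearn.metrics import roc_auc_score

    # made-up labels (1 = "yes") and model probabilities
    y_true = [1, 0, 1, 1, 0, 0, 1, 0]
    p_hat = [0.9, 0.3, 0.7, 0.6, 0.4, 0.2, 0.8, 0.5]

    # AUC: probability a random "yes" point is scored above a random "no" point
    print(roc_auc_score(y_true, p_hat))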

24
Q

value of roc/auc

A

gives a quick-and-dirty estimate of quality
-but does not differentiate between the costs of false negatives and false positives
-for the highest-value decisions, use the confusion matrix

25
Q

confusion matrix

A

shows where the model is confusing one category for another, point by point
-TP: point is in the category, correctly classified
-FP: point is not in the category, but the model says it is
-TN: point is not in the category, correctly classified
-FN: point is in the category, but the model says it isn't

26
Q

confusion matrix guidelines

A

positive: the model says it's in the category
negative: the model says it's not in the category
true: the model got it right
false: the model got it wrong

27
Q

sensitivity

A

fraction of category members that are correctly identified

sensitivity = TP / (TP + FN)

28
Q

specificity

A

fraction of non-category members that are correctly identified

specificity = TN / (TN + FP)
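A minimal sketch of both measures from raw confusion-matrix counts (the counts are made up):

    def sensitivity_specificity(tp, fp, tn, fn):
        # sensitivity: fraction of category members correctly identified
        # specificity: fraction of non-category members correctly identified
        return tp / (tp + fn), tn / (tn + fp)

    sens, spec = sensitivity_specificity(tp=40, fp=10, tn=35, fn=15)
    print(sens, spec)  # 0.727..., 0.777...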

29
Q

cost of lost productivity

A

how much it costs for each box in the confusion matrix (TP and TN typically cost 0, since the model got those right)

the total cost depends on the percentage of each response you're expecting (see the sketch below)
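A minimal sketch of an expected-cost calculation, with made-up counts and per-box costs (names are illustrative):

    def expected_cost(counts, costs):
        # counts/costs: dicts keyed by confusion-matrix box;
        # correct boxes (tp, tn) usually cost 0
        return sum(counts[box] * costs[box] for box in counts)

    counts = {"tp": 40, "fp": 10, "tn": 35, "fn": 15}
    costs = {"tp": 0, "fp": 100, "tn": 0, "fn": 500}
    print(expected_cost(counts, costs))  # 10*100 + 15*500 = 8500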

30
Q

Poisson regression

A

-use when we think the response follows a Poisson distribution

f(z) = (lambda^z)(e^-lambda) / z!

-estimate lambda as a function of the predictors, lambda(x)
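A minimal sketch of fitting lambda(x), assuming statsmodels' Poisson GLM with its log link (data is made up):

    import numpy as np
    import statsmodels.api as sm

    # made-up count data: responses are non-negative integers
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    counts = np.array([2, 3, 6, 7, 12])

    # Poisson regression estimates lambda(x) through a log link:
    # log(lambda) = a0 + a1*x
    X = sm.add_constant(x)
    fit = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
    print(fit.params)      # a0 and a1 on the log scale
    print(fit.predict(X))  # estimated lambda at each x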

31
Q

regression splines

A

splines: functions built from polynomial pieces that connect to each other

they let us fit different functions to different parts of the data set
-smooth connections between the parts ensure we don't get drastically different answers for nearby points

-order k: the polynomial pieces are of order k
-ex: multivariate adaptive regression splines (MARS), also called "earth"
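MARS needs a dedicated package, so as a minimal illustration of the spline idea here is a cubic smoothing spline with scipy (data is made up; this is not MARS itself):

    import numpy as np
    from scipy.interpolate import UnivariateSpline

    # made-up noisy data
    x = np.linspace(0, 10, 50)
    rng = np.random.default_rng(0)
    y = np.sin(x) + 0.1 * rng.normal(size=50)

    # cubic (order k=3) smoothing spline: polynomial pieces joined
    # smoothly, so nearby points get similar answers
    spline = UnivariateSpline(x, y, k=3)
    print(spline([2.5, 5.0]))  # predictions at new points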

32
Q

Bayesian regression

A

start with
-data
-an estimate of how the regression coefficients and random error are distributed (a prior)

ex: predict how tall a child will be as an adult, based on
-data
-an expert's opinion as the starting distribution

use Bayes' theorem to update the estimate
-most helpful when there's not much data
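A minimal sketch with scikit-learn's BayesianRidge, which uses its own default priors rather than a hand-specified expert prior (data is made up):

    import numpy as np
    from sklearn.linear_model import BayesianRidge

    # made-up data: parents' average height -> child's adult height (cm)
    X = np.array([[165.0], [170.0], [175.0], [180.0]])
    y = np.array([167.0, 171.0, 174.0, 181.0])

    # priors over the coefficients and noise get updated by the data;
    # the posterior also gives an uncertainty for each prediction
    model = BayesianRidge().fit(X, y)
    mean, std = model.predict(np.array([[172.0]]), return_std=True)
    print(mean, std)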
