Trees Flashcards
what is a main issue with logistic regression
a coefficient indicates the effect of a variable, not how a decision is made
decisions we make in real life are:
made in sequential order, like regression trees
main benefit of regression trees
they are easy to understand
in trees we know the direction a variable affects the probability, but can't tell
the size of its impact on the y variable
we split the data using the IVs into
yes-or-no decisions
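A tree of yes-or-no splits can be printed directly. A minimal sketch using scikit-learn (the dataset and `max_depth` choice are illustrative, not from the cards):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# A shallow tree: every internal node is a yes/no question on one IV.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Print the sequence of yes/no decisions the tree learned.
print(export_text(tree, feature_names=list(iris.feature_names)))
```

Each printed line is one condition (e.g. `petal width (cm) <= 0.80`), and following the branches top to bottom mirrors the sequential decisions the cards describe.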
trees don't assume the model is
linear
adding more splits will
increase accuracy (on the training data)
3 splits in the tree
3 decision tree levels
a terminal node is an
output, not a condition; ex- it tells you the color, red or grey
the root node is
the condition most correlated with the y variable - the most important one sits at the top
we can have 100% accuracy if we keep adding splits with no errors, but the issue is
too many variables, which leads to overfitting
what is the fix for the overfitting issue
set a lower bound on the number of points in each subset
each split divides points into
buckets
if we set minimum bucket size = lower bound, then we
won't split if the number of points in a resulting bucket is less than the minimum bucket size
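The minimum-bucket-size rule can be checked in code. A sketch using scikit-learn, where `min_samples_leaf` plays the role of the minimum bucket size (the dataset and the value 20 are illustrative assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# min_samples_leaf acts as the minimum bucket size: no split is made
# if it would leave fewer than 20 points in a terminal bucket.
tree = DecisionTreeClassifier(min_samples_leaf=20, random_state=0).fit(X, y)

# Verify: every terminal bucket holds at least 20 training points.
leaf_ids = tree.apply(X)                 # which leaf each row falls into
counts = np.bincount(leaf_ids)           # points per node id
leaf_sizes = counts[np.unique(leaf_ids)] # keep only actual leaves
print(leaf_sizes.min())                  # never below 20
```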
buckets only tell you
the size of the split, not the outcome of the bucket
if the bucket size is too large, then
the model is too simple and will have poor accuracy
setting the bucket size equal to all observations is equivalent to
predicting the most frequent outcome for everyone, i.e. the baseline model
in a classification model, use
majority vote
classification uses discrete/categorical variables; what would using continuous variables be
take the average of all values in the bucket instead of the most frequent -> ex: how many cars sold
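The two prediction rules side by side, as a small sketch (the labels and car-sales numbers are made-up illustrative data):

```python
from collections import Counter
from statistics import mean

# Points that landed in the same terminal bucket of a tree.
classes = ["red", "grey", "red", "red", "grey"]  # categorical outcome
cars_sold = [3, 5, 4, 6, 2]                      # continuous outcome

# Classification tree: the bucket predicts by majority vote.
pred_class = Counter(classes).most_common(1)[0][0]
print(pred_class)   # "red" (3 of 5 points)

# Regression tree: the bucket predicts the average value.
pred_value = mean(cars_sold)
print(pred_value)   # 4
```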
regression trees are easy to understand; the issue when using a single tree is
we will observe significantly higher errors
single trees are unstable, meaning
a small change in the data = different trees / different interpretation
how do we fix the instability of single trees
use a random forest
a random forest generates multiple trees instead of 1, then
takes the average to come up with predicted values -> if 5 trees, then we need 5 different data sets
how does a forest build the data set for each tree
randomly select rows with replacement (the same row can be selected more than once)
why do we select rows randomly with replacement
because trees are highly sensitive to small changes, so we change the data slightly for each tree, and taking the average reduces the variance in the predicted value
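Sampling rows with replacement (bootstrapping) can be sketched in a few lines. The toy data and the choice of 5 samples are illustrative assumptions, matching the 5-trees example above:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.uniform(0, 10, size=n)
y = 2 * X + rng.normal(0, 1, size=n)   # toy data: y ~ 2x + noise

def bootstrap_sample(X, y, rng):
    # Draw n row indices WITH replacement: the same row can appear twice,
    # so each sample is a slightly different version of the data.
    idx = rng.integers(0, len(X), size=len(X))
    return X[idx], y[idx]

# One slightly perturbed data set per tree; in a real forest each sample
# grows its own tree, and the trees' predictions are averaged.
samples = [bootstrap_sample(X, y, rng) for _ in range(5)]
print(len(samples))   # 5 data sets for 5 trees
```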
the minimum bucket size in forests is called
node size
another parameter in forests
number of trees
default number of trees in R
500
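The two forest parameters from the cards map onto scikit-learn names (a sketch, not the R `randomForest` call itself; note scikit-learn's own default is 100 trees, not 500, and the dataset here is synthetic):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=200, n_features=4, random_state=0)

forest = RandomForestRegressor(
    n_estimators=500,     # number of trees (matching R's default of 500)
    min_samples_leaf=5,   # node size: the minimum bucket in each tree
    bootstrap=True,       # rows sampled with replacement for each tree
    random_state=0,
).fit(X, y)

print(len(forest.estimators_))   # 500 individual trees were grown
```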
more trees is better because
averaging over more trees reduces variance without adding bias
why would more trees be an issue
computationally difficult
what don't we have to worry about in forests
overfitting