Review Flashcards

Question 1

Q

We produce a prediction in terms of a probability, then use it to decide

Answer

A

whether the outcome is 0 or 1 (a categorical value)
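
A minimal sketch of turning a predicted probability into a 0/1 decision. The function name `classify` and the default threshold of 0.5 are assumptions for illustration; the card does not name a threshold.

```python
def classify(probability, threshold=0.5):
    """Return 1 if the predicted probability is above the threshold, else 0."""
    # Assumption: a probability exactly equal to the threshold is classified as 0.
    return 1 if probability > threshold else 0


# Hypothetical predicted probabilities for three observations.
predictions = [classify(p) for p in (0.2, 0.5, 0.9)]
```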

Question 2

Q

The transformation is nonlinear, so we use

Answer

A

the odds, which is the ratio probability / (1 - probability)
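
The odds ratio can be sketched as below. The helper names `odds` and `probability_from_odds` are hypothetical. Probabilities of 0.5, 0.8, and 0.9 give odds of roughly 1, 4, and 9, which shows that equal steps in probability are not equal steps in odds.

```python
def odds(probability):
    """Odds = probability / (1 - probability); undefined when probability is 1."""
    return probability / (1 - probability)


def probability_from_odds(odds_value):
    """Invert the odds back to a probability."""
    return odds_value / (1 + odds_value)


# Hypothetical probabilities used only to illustrate the nonlinearity.
example_odds = [odds(p) for p in (0.5, 0.8, 0.9)]
```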

Question 3

Q

In a regression outcome, we are predicting

Question 4

Q

The baseline's goal is to predict

Answer

A

whether the observation will be 0 or 1

Question 5

Q

The baseline predicts

Answer

A

the most frequent outcome
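
A small sketch of a baseline that always predicts the most frequent outcome. The data in `observed` is made up for illustration, and the function names are hypothetical.

```python
from collections import Counter


def baseline_prediction(outcomes):
    """Return the most frequent outcome in the observed data."""
    return Counter(outcomes).most_common(1)[0][0]


def baseline_accuracy(outcomes):
    """Fraction of observations the baseline gets right on this data."""
    guess = baseline_prediction(outcomes)
    return sum(1 for y in outcomes if y == guess) / len(outcomes)


# Hypothetical outcomes: five 0s and two 1s.
observed = [0, 0, 1, 0, 1, 0, 0]
guess = baseline_prediction(observed)
accuracy = baseline_accuracy(observed)
```

Any model worth keeping should beat this accuracy.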

Question 6

Q

Which data set do we use to find the outcome of the baseline model?

Question 7

Q

How do we build a regression tree?

Answer

A

Split on the independent variables (IVs) and predict the most frequent outcome in each split

Question 8

Q

How do we come up with a prediction?

Answer

A

Count the number of each outcome per split
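
Counting outcomes per split might look like the sketch below, for a single split on one variable. The variable names (`ages`, `bought`) and the split point of 35 are invented for illustration.

```python
from collections import Counter


def outcome_counts(values, outcomes, split_point):
    """Count outcomes on each side of a split on one variable."""
    left = Counter(y for x, y in zip(values, outcomes) if x < split_point)
    right = Counter(y for x, y in zip(values, outcomes) if x >= split_point)
    return left, right


def majority(counts):
    """Most frequent outcome in one side of the split (side must be non-empty)."""
    return counts.most_common(1)[0][0]


# Hypothetical data: one independent variable and a 0/1 outcome.
ages = [22, 25, 31, 40, 47, 52]
bought = [0, 0, 0, 1, 1, 0]

left, right = outcome_counts(ages, bought, split_point=35)
left_prediction = majority(left)
right_prediction = majority(right)
```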

Question 9

Q

We choose how to define the splits, but use the definition

Answer

A

consistently throughout the model

Question 10

Q

How do we decide where to split?

Answer

A

First decide on the objective: minimize error (the number of misclassified points) or maximize accuracy. Then try different split points and select the one that minimizes error or maximizes accuracy.
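
The search over split points can be sketched as follows, using the number of misclassified points as the objective. Using midpoints between adjacent values as candidates is an assumption; the card only says to try different points. The data is invented.

```python
from collections import Counter


def misclassified(values, outcomes, split_point):
    """Errors made when each side of the split predicts its majority outcome."""
    sides = (
        [y for x, y in zip(values, outcomes) if x < split_point],
        [y for x, y in zip(values, outcomes) if x >= split_point],
    )
    errors = 0
    for side in sides:
        if side:
            # Everything that is not the majority outcome is an error.
            errors += len(side) - Counter(side).most_common(1)[0][1]
    return errors


def best_split(values, outcomes):
    """Try the midpoint between adjacent distinct values; keep the lowest error."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return min(candidates, key=lambda s: misclassified(values, outcomes, s))


# Hypothetical data: one independent variable and a 0/1 outcome.
ages = [22, 25, 31, 40, 47, 52]
bought = [0, 0, 0, 1, 1, 0]
split = best_split(ages, bought)
```

Minimizing the error count and maximizing accuracy pick the same split, since accuracy is 1 minus errors divided by the number of observations.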

Question 11

Q

In most cases the algorithm is not exact, but gives

Answer

A

the best tree found, not necessarily the optimal one

Question 12

Q

The ANOVA class is for predicting a variable that is

Answer

A

unbounded, whereas classification is for a probability between 0 and 1

Question 13

Q

A continuous outcome is predicted without a threshold. How?

Answer

A

Using the most frequent value or the average

Question 14

Q

A classification problem deals with the probability of an outcome that is either 0 or 1, and when we use probabilities we always have

Answer

A

a threshold, specificity, and sensitivity -> ROC curve
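
A sketch of how a threshold leads to sensitivity and specificity, and how several thresholds give points on a ROC curve. The probabilities, outcomes, and thresholds are invented, and the code assumes both classes appear in `actual`.

```python
def confusion_rates(probabilities, actual, threshold):
    """Return (sensitivity, specificity) when predicting 1 above the threshold."""
    predicted = [1 if p > threshold else 0 for p in probabilities]
    pairs = list(zip(predicted, actual))
    tp = sum(1 for yhat, y in pairs if yhat == 1 and y == 1)
    tn = sum(1 for yhat, y in pairs if yhat == 0 and y == 0)
    fp = sum(1 for yhat, y in pairs if yhat == 1 and y == 0)
    fn = sum(1 for yhat, y in pairs if yhat == 0 and y == 1)
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    return sensitivity, specificity


# Hypothetical predicted probabilities and true outcomes.
probabilities = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9]
actual = [0, 0, 1, 1, 0, 1]

# Each threshold gives one ROC point: (1 - specificity, sensitivity).
roc_points = []
for threshold in (0.3, 0.5, 0.7):
    sens, spec = confusion_rates(probabilities, actual, threshold)
    roc_points.append((1 - spec, sens))
```

Raising the threshold trades sensitivity for specificity, which is what the ROC curve traces out.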

Question 15

Q

A single regression tree has high variance, so its predictions will have

Answer

A

high variability

Question 16

Q

How do we fix the high variability issue?

Answer

A

Build multiple trees in a random forest, then make the prediction based on the most frequent outcome across the trees

Question 17

Q

Build 5 trees; once we have predictions from these trees, predict the most frequent outcome. We can also use the idea of thresholds. How?

Answer

A

If 3 of 5 trees predict 1, then P(y = 1) is 0.6; compare this number to the threshold.
If the threshold is 0.7, predict 0 because 0.6 <= 0.7
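
The five-tree example can be sketched as below. The function names are hypothetical; the votes and the 0.7 threshold come from the card.

```python
def forest_probability(tree_votes):
    """Share of trees that predict 1, used as the estimate of P(y = 1)."""
    return sum(tree_votes) / len(tree_votes)


def forest_prediction(tree_votes, threshold=0.5):
    """Predict 1 only if the share of trees voting 1 is above the threshold."""
    return 1 if forest_probability(tree_votes) > threshold else 0


# Three of five trees predict 1.
votes = [1, 0, 1, 1, 0]
probability = forest_probability(votes)
strict_prediction = forest_prediction(votes, threshold=0.7)
majority_prediction = forest_prediction(votes, threshold=0.5)
```

With a threshold of 0.5 this reduces to predicting the most frequent outcome across the trees.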

Question 18

Q

What do we randomly select when building many trees?

Answer

A

A subset of variables. Random rows and columns are used to make a new, random data frame, which is used to build a tree; this is repeated for multiple trees to make a prediction on the outcome.
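
A rough sketch of drawing one random data frame per tree. Sampling rows with replacement, keeping 2 of 3 variables, and all column names are assumptions for illustration; the card only says rows and columns are chosen at random, and library implementations may differ in detail.

```python
import random


def random_data_frame(rows, feature_names, outcome_name, n_features, rng):
    """Sample rows with replacement and keep a random subset of variables."""
    chosen = rng.sample(feature_names, n_features)
    sampled = [rng.choice(rows) for _ in range(len(rows))]
    keep = chosen + [outcome_name]  # the outcome column is always kept
    return [{name: row[name] for name in keep} for row in sampled]


# Hypothetical data: three independent variables and a 0/1 outcome.
rows = [
    {"age": 22, "income": 30, "visits": 1, "bought": 0},
    {"age": 31, "income": 45, "visits": 4, "bought": 0},
    {"age": 40, "income": 60, "visits": 2, "bought": 1},
    {"age": 52, "income": 80, "visits": 5, "bought": 1},
]
features = ["age", "income", "visits"]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
# One random data frame per tree; a tree would then be built on each.
frames = [random_data_frame(rows, features, "bought", 2, rng) for _ in range(5)]
```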

Question 19

Q

N tree (the number of trees) is

Answer

A

a parameter, usually between 200 and 500

Question 20

Q

Adjusting the parameter doesn't really help because

Answer

A

the default is good, so just run the algorithm

Question 21

Q

Random forest is a generalization of regression trees and usually performs better, but its disadvantage is

Answer

A

that it is less interpretable