2. Decision Tree Flashcards
What are the two types of decision tree?
Classification and regression trees
Which type of learning do the two methods use?
Both are supervised learning methods, since they need a labeled target variable to train on
When do you use a classification tree?
When the target variable is categorical: the data is split into groups with similar categories, and the final prediction is a class label
How do you evaluate the splits of a classification tree?
Splits in a classification tree are evaluated with impurity (purity) measures
When do you use a regression tree?
A regression tree is used when the target variable is continuous: the data is split into segments of similar value, and the final prediction is a numerical value
How do you evaluate a regression model?
A regression model is evaluated by computing the average loss per observation, using either the L1 loss (absolute error, giving MAE) or the L2 loss (squared error, giving MSE)
What output does a classification tree produce?
A classification tree produces discrete output: it learns a function that maps a data object to a discrete class
What does a classification tree learn?
The relation between the attributes and the class
What can a classifier help predict?
It can help predict the class of a new data object
How can we explain the classification tree?
It is represented by a rooted tree, where each node represents a partition of the input space
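What do internal nodes, branches and leaves represent in a classification tree?
Internal nodes = tests on attributes
Branches = outcomes of those tests (attribute values)
Leaves = decisions (class labels)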
What sentence sums up a classification tree?
Ask a series of questions until a conclusion is reached
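As a rough illustration of "a series of questions", a small tree can be written as nested conditions. The attributes (outlook, humidity, windy) and the class labels below are hypothetical and chosen only for this sketch; a real tree would learn its questions from data.

```python
def classify(outlook, humidity, windy):
    """Hand-written toy tree; all attribute names and labels are made up."""
    # Root node: the first question is about the outlook.
    if outlook == "sunny":
        # Internal node: a second question on this branch.
        if humidity == "high":
            return "stay in"   # leaf
        return "play"          # leaf
    if outlook == "rainy":
        # Internal node: a different question on this branch.
        return "stay in" if windy else "play"
    # Any other outlook leads straight to a leaf.
    return "play"


print(classify("sunny", "high", False))
print(classify("rainy", "normal", False))
```

Each call follows one path from the root to a leaf, so the number of questions asked equals the depth of that path.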
What is the purpose of the impurity measure I(r) or I(vk)?
An impurity measure quantifies how mixed a set of data points is with respect to their class labels. It is used to choose the best split at each node.
What is a good impurity score for a classification problem?
Zero: all data points belong to a single class
High values of impurity indicate an even mix of classes, which is bad
In classification, what are the three different impurity measures?
Gini index, entropy and misclassification error
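A minimal sketch of the three measures, assuming the usual textbook definitions (Gini = 1 - sum of p squared, entropy = -sum of p * log2(p), classification error = 1 - max p, where p is the fraction of points in each class). The function names and the example labels are made up for this illustration.

```python
import math
from collections import Counter


def class_probabilities(labels):
    """Fraction of data points that fall in each class."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]


def gini(labels):
    return 1.0 - sum(p * p for p in class_probabilities(labels))


def entropy(labels):
    # Measured in bits because the logarithm is base 2.
    return sum(-p * math.log2(p) for p in class_probabilities(labels) if p > 0)


def class_error(labels):
    return 1.0 - max(class_probabilities(labels))


pure = ["a", "a", "a", "a"]    # a single class
mixed = ["a", "a", "b", "b"]   # an even mix of two classes

for name, measure in [("gini", gini), ("entropy", entropy), ("class error", class_error)]:
    print(name, measure(pure), measure(mixed))
```

All three give zero for the pure set and reach their maximum for the even mix.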
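In classification, when do you use each of the three measures?
Gini index - the default in CART; cheap to compute because it needs no logarithm
Entropy - the basis of information gain; an information-theoretic measure that reacts to changes in the class probabilities
Classification error - the fraction of misclassified instances; the least sensitive to changes in the class probabilities, so it is used more for pruning than for growing a tree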
In classification, which measure is used to see how good a split is?
Purity gain
In classification, how does the purity gain work?
Also called impurity reduction, it evaluates how well a particular feature or attribute splits the data in a decision tree: the impurity of the parent node minus the weighted impurity of the child nodes
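The purity gain can be sketched as the impurity of the parent node minus the size-weighted impurity of its children. The sketch below assumes the Gini index as the impurity measure; the function names and the toy labels are hypothetical.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def purity_gain(parent, children, impurity=gini):
    """Parent impurity minus the size-weighted impurity of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * impurity(child) for child in children)
    return impurity(parent) - weighted


parent = ["yes", "yes", "yes", "no", "no", "no"]
good_split = [["yes", "yes", "yes"], ["no", "no", "no"]]   # both children pure
poor_split = [["yes", "no", "yes"], ["no", "yes", "no"]]   # children still mixed

print(purity_gain(parent, good_split))
print(purity_gain(parent, poor_split))
```

The split that separates the classes completely gets the larger gain, which is why a tree would prefer it.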
In regression, which method is used to evaluate the model?
The average loss per observation, for example the absolute error
In regression, what are the two common error calculations?
Mean absolute error (MAE)
Mean squared error (MSE)
In regression, when should you use MAE and MSE?
MAE - penalizes all errors linearly and is more robust to outliers; used when a few extreme values should not dominate the score
MSE - penalizes larger errors more heavily; used when large deviations are especially costly
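A small sketch of the difference, using made-up numbers: two sets of predictions with the same total absolute error, one of which concentrates the error in a single observation.

```python
def mae(y_true, y_pred):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error: average of (actual - predicted) squared."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [10.0, 12.0, 11.0, 13.0]
spread_out = [11.0, 11.0, 12.0, 12.0]    # every prediction is off by 1
one_big_miss = [10.0, 12.0, 11.0, 17.0]  # one prediction is off by 4

print(mae(y_true, spread_out), mse(y_true, spread_out))
print(mae(y_true, one_big_miss), mse(y_true, one_big_miss))
```

MAE is the same for both sets of predictions, while MSE is four times larger for the set with one big miss.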
What different norm-based losses are there and where are they used?
L1 - the basis of MAE, sums the absolute errors, less sensitive to outliers
L2 - the basis of MSE, the average of the squared differences between the predicted and the actual values
L∞ norm - measures the maximum error, focusing on the worst-case error
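The three norms can be sketched on a single vector of errors (predicted minus actual). The function names and the error values are assumptions for this illustration; MAE corresponds to the L1 norm divided by the number of observations, and MSE to the squared L2 norm divided by the number of observations.

```python
import math


def l1_norm(errors):
    """Sum of the absolute errors."""
    return sum(abs(e) for e in errors)


def l2_norm(errors):
    """Euclidean length of the error vector."""
    return math.sqrt(sum(e * e for e in errors))


def linf_norm(errors):
    """Largest absolute error, i.e. the worst case."""
    return max(abs(e) for e in errors)


errors = [3.0, -4.0, 0.0]
n = len(errors)

print(l1_norm(errors), l2_norm(errors), linf_norm(errors))
print(l1_norm(errors) / n)        # MAE
print(l2_norm(errors) ** 2 / n)   # MSE
```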