Lecture 6 - Support Vector Machines, Decision Tree, Ensemble, Hypothesis Testing Flashcards
Are Support Vector Machines (SVM) used for supervised or unsupervised learning?
SVM is a set of related supervised learning methods
Is SVM used for Regression or Classification?
TRICK QUESTION: It’s used for both Regression and Classification.
How does SVM work on nonlinear data?
It maps the data into a higher-dimensional feature space (via a kernel function) in which the two classes become linearly separable, and then finds a separating hyperplane there.
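A minimal sketch of the idea (not from the lecture, with made-up data): 1-D points that no single threshold can separate become linearly separable once a squared-feature dimension is added, which is what a polynomial kernel does implicitly.

```python
# Hypothetical data: class A sits near 0, class B far from 0, so no
# single threshold on x can separate them on the original 1-D line.
points = [-2.0, -1.5, -0.5, 0.0, 0.5, 1.5, 2.0]
labels = ['B', 'B', 'A', 'A', 'A', 'B', 'B']

# Map x -> (x, x**2): in the lifted 2-D space the second coordinate
# alone separates the classes with the linear rule x**2 < 1.
def feature_map(x):
    return (x, x * x)

predicted = ['A' if feature_map(x)[1] < 1.0 else 'B' for x in points]
print(predicted == labels)  # True: the lifted feature separates the classes
```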
What are some of the strengths and weaknesses of SVM?
Strengths:
- Training is comparatively easy
- No local optima (the optimization problem is convex)
- Scales relatively well to high-dimensional data
- The trade-off between classifier complexity and error can be controlled explicitly
Weaknesses:
- Effectiveness and efficiency depend on choosing a good kernel function
True or False: Changing the kernel will not give different results; the kernel choice only affects the speed of training.
FALSE: Changing the kernel will give different results!
True or False: Decision trees can perform both classification and regression tasks
TRUE
True or False: Decision trees can be understood as a lot of “if/else” statements
TRUE
Explain how decision trees are structured
It starts with a root node that splits into decision nodes (each node represents a question that splits the data). These can be seen as different branches/sub-trees. After a number of decision nodes, each path ends at a terminal (leaf) node, which represents an output.
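As a sketch, such a tree really is nested if/else statements. The feature names and thresholds below are made up for illustration:

```python
def play_tennis(temperature_c, windy):
    if temperature_c > 30:       # root node: the first question/test
        return 'no'              # terminal (leaf) node: an output
    else:                        # branch / sub-tree
        if windy:                # decision node: another question
            return 'no'          # leaf
        return 'yes'             # leaf: the predicted output

print(play_tennis(25, windy=False))  # yes
```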
What are some of the advantages of decision trees?
Simple to understand and interpret
It implicitly performs feature selection
It can handle both numerical and categorical data
Requires relatively little effort for data preparation
Non-linear relationships do not impact the model’s performance
Has many use cases
What are some of the disadvantages of decision trees?
Can overfit data
Decision trees can be unstable: a small variation in the data can produce a completely different tree.
Greedy training algorithms cannot guarantee the globally optimal tree (this can be mitigated by training multiple trees)
Describe the Decision Tree Training Algorithm
Given a set of labeled training instances:
1. If all the training instances have the same class, create a leaf with that class label and exit.
2. ELSE pick the best test to split the data on.
3. Split the training set according to the value of the outcome of the test.
4. Recursively repeat steps 1-3 on each subset of the training data.
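A minimal Python sketch of steps 1-4 (assumptions not from the lecture: categorical features, and weighted Gini impurity as the "best test" criterion; the dataset is made up):

```python
from collections import Counter

def gini(labels):
    # impurity of a set of class labels: 1 - sum of squared proportions
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, features):
    # Step 1: all instances share one class -> create a leaf and exit
    if len(set(labels)) == 1:
        return labels[0]
    if not features:  # no tests left: fall back to the majority class
        return Counter(labels).most_common(1)[0][0]

    # Step 2: pick the "best test" = feature with the lowest weighted Gini
    def weighted_gini(feature):
        total = 0.0
        for value in set(row[feature] for row in rows):
            subset = [l for row, l in zip(rows, labels) if row[feature] == value]
            total += len(subset) / len(labels) * gini(subset)
        return total

    best = min(features, key=weighted_gini)

    # Steps 3-4: split on each outcome of the test and recurse
    branches = {}
    remaining = [f for f in features if f != best]
    for value in set(row[best] for row in rows):
        sub_rows = [row for row in rows if row[best] == value]
        sub_labels = [l for row, l in zip(rows, labels) if row[best] == value]
        branches[value] = build_tree(sub_rows, sub_labels, remaining)
    return (best, branches)

rows = [{'outlook': 'sunny', 'windy': False},
        {'outlook': 'sunny', 'windy': True},
        {'outlook': 'rainy', 'windy': False},
        {'outlook': 'rainy', 'windy': True}]
labels = ['yes', 'yes', 'no', 'no']
tree = build_tree(rows, labels, ['outlook', 'windy'])
print(tree)  # splits on 'outlook' into two pure leaves
```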
What is Gini used for?
Gini is a measure of impurity: it shows how pure a node is, i.e. how similar the classes of the observations in that node are.
So, if a test splits the data into two branches, the ideal scenario is that one class ends up entirely in one branch and the other class entirely in the other. In that case, the Gini index is 0.
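For two classes, the impurity runs from 0 (pure) to 0.5 (an even split). A quick sketch of the standard formula, 1 − Σ pᵢ²:

```python
from collections import Counter

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(['A', 'A', 'A']))       # 0.0 -> perfectly pure node
print(gini(['A', 'A', 'B', 'B']))  # 0.5 -> maximally impure for 2 classes
```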
What is Entropy?
Entropy is used as an impurity measure (pretty similar to the GINI index, only the calculations differ)
- A set’s entropy is zero when it contains instances of only one class.
- Reduction of entropy is called an information gain (Shannon’s information theory)
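A sketch of the standard formulas (entropy in bits; information gain as the parent's entropy minus the weighted entropy of the children):

```python
from collections import Counter
from math import log2

def entropy(labels):
    # Shannon entropy in bits over the class proportions
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())

print(entropy(['A', 'A', 'A']))       # zero: the set is pure
print(entropy(['A', 'A', 'B', 'B']))  # 1.0: a 50/50 split, the 2-class maximum

# Information gain = entropy(parent) - weighted entropy of the children.
parent = ['A', 'A', 'B', 'B']
left, right = ['A', 'A'], ['B', 'B']
gain = entropy(parent) - (len(left) / len(parent) * entropy(left)
                          + len(right) / len(parent) * entropy(right))
print(gain)  # 1.0: a perfect split removes all uncertainty
```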
What does “Ensemble” mean?
A group of something (Musicians, actors, decision trees;)
Why is Random Forest an Ensemble model?
Random forest models multiple decision trees (Therefore, an ENSEMBLE of decision trees)
How does a Random Forest work?
In classification:
It creates multiple decision trees; each tree votes for one class, and the class with the most votes is selected.
In regression:
It creates multiple decision trees; each tree predicts one value, and the ensemble returns the average of all the trees' predictions.
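The two combination rules in miniature (the individual tree outputs below are made up, standing in for a trained forest):

```python
from collections import Counter

# Hypothetical outputs of the individual trees in a trained forest.
class_votes = ['cat', 'dog', 'cat', 'cat', 'dog']
value_votes = [19.0, 21.0, 20.0, 20.0]

# Classification: majority vote across the trees.
prediction = Counter(class_votes).most_common(1)[0][0]
print(prediction)  # cat

# Regression: the average of the trees' predicted values.
estimate = sum(value_votes) / len(value_votes)
print(estimate)  # 20.0
```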
What are the advantages of using Random Forests?
- Can be used for both classification and regression tasks
- Handles missing values and maintains accuracy for missing data
- Won't easily overfit (not fully true: hyperparameter tuning may still be needed to avoid overfitting)
- Handles large datasets with higher dimensionality
What are the disadvantages of using Random Forests?
- Generally stronger at classification than at regression: a regression forest predicts an average of its trees' outputs, so it cannot extrapolate beyond the range seen in training
- You have very little control over what the model does internally (low interpretability compared to a single tree)
Walk me through the Random Forest algorithm (pseudocode)
- Assume the number of cases in the training set is N. A sample of these N cases is taken at random, but with replacement (a bootstrap sample), to grow each tree.
- If there are M input variables or features, a number m < M is specified such that, at each node, m variables are selected at random out of the M and the best split on these m is used to split the node. The value of m is held constant while the forest is grown.
- Each tree is grown as large as possible, without pruning.
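The two sampling steps above can be sketched as follows (a toy illustration with made-up data, not the lecture's code):

```python
import random

def bootstrap_sample(rows, labels, rng):
    # N cases drawn at random *with replacement* (some rows will repeat)
    idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
    return [rows[i] for i in idx], [labels[i] for i in idx]

def random_feature_subset(features, m, rng):
    # at each node, only m of the M features are candidates for the split
    return rng.sample(features, m)

rng = random.Random(0)
rows = [[5.1, 3.5], [4.9, 3.0], [6.2, 3.4], [5.9, 3.0]]
labels = ['a', 'a', 'b', 'b']
boot_rows, boot_labels = bootstrap_sample(rows, labels, rng)
subset = random_feature_subset(['f1', 'f2', 'f3'], m=2, rng=rng)
print(len(boot_rows), len(subset))  # 4 2
```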
What is Bagging?
Bagging is a machine learning ensemble meta-algorithm designed to improve stability and accuracy and to reduce variance.
Bagging builds many independent predictors and combines them using a model-averaging technique (average, majority vote).
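A toy sketch of the bagging loop (the "predictor" here is deliberately trivial, learning only the majority label of its bootstrap sample; a real bagging ensemble would train full models):

```python
import random
from collections import Counter

def train_majority_predictor(sample_labels):
    # toy base predictor: always outputs its training sample's majority label
    majority = Counter(sample_labels).most_common(1)[0][0]
    return lambda: majority

def bagging_predict(labels, n_predictors, rng):
    votes = []
    for _ in range(n_predictors):
        # each predictor is built independently on a bootstrap resample
        sample = [labels[rng.randrange(len(labels))] for _ in labels]
        votes.append(train_majority_predictor(sample)())
    # combine the independent predictors by majority vote
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(42)
result = bagging_predict(['a', 'a', 'a', 'b'], n_predictors=25, rng=rng)
print(result)
```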
What is Boosting?
Boosting is a machine learning ensemble meta-algorithm primarily for reducing bias and also variance in supervised learning
Boosting is when predictors are not made independently but sequentially
What are the steps (conceptually) of Gradient Boosting?
- Fit an additive model (an ensemble) in a forward, stage-wise manner
- In each stage, introduce a weak learner to compensate for the shortcomings of the existing ensemble
* In Gradient Boosting, the "shortcomings" are identified by gradients of the loss function
* The gradients tell us how to improve the model
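The stage-wise procedure can be sketched for regression with squared loss, where the negative gradients are simply the residuals. The weak learner here is a one-split regression stump, and the data and the choice of 100 stages are made up for the demo (real implementations also shrink each stage with a learning rate):

```python
# Targets for a tiny 1-D regression problem; each stage fits a stump to
# the current residuals (for squared loss, residuals ARE the -gradients).
xs = [1.0, 2.0, 3.0, 4.0]
targets = [10.0, 12.0, 20.0, 22.0]

def fit_stump(xs, residuals):
    # weak learner: one split, predicting the mean residual on each side
    best = None
    for threshold in xs:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue
        lmean = sum(left) / len(left)
        rmean = sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, threshold, lmean, rmean)
    _, threshold, lmean, rmean = best
    return lambda x: lmean if x <= threshold else rmean

predictions = [0.0] * len(xs)
for stage in range(100):
    residuals = [t - p for t, p in zip(targets, predictions)]  # -gradients
    stump = fit_stump(xs, residuals)  # new learner targets the shortcomings
    predictions = [p + stump(x) for x, p in zip(xs, predictions)]

error = sum((t - p) ** 2 for t, p in zip(targets, predictions))
print(error < 4.0)  # True: the staged ensemble has driven the loss down
```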
Here’s a very simple conceptualisation of how gradient boosting works
Diana is good at guessing ages of people (She’s the base Predictor)
Simon has noticed that when Diana guesses the ages of men, she usually undershoots by around 3 years, and when she guesses women's ages, she overshoots by around 3 years. (Therefore, Simon is like a gradient that corrects the shortcoming of his predecessor.)
Yannic has noticed that Diana and Simon guess wrong when the person is from Sweden: they usually overshoot by around one year. (Yannic is another gradient that corrects the shortcoming of his predecessors.)
Now, they are trying to guess the age of a woman called Olivia.
Diana guesses 23; Simon notices that Olivia is a woman and says to subtract 3; Yannic notices that Olivia is Swedish and says to subtract 1.
Therefore, the ensemble arrives at this as Olivia's age:
23-3-1 = 19
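The same toy ensemble as code: a base predictor plus two correction terms, each learned from the previous predictors' systematic errors (all the numbers are the made-up ones from the story above):

```python
def diana(person):
    return 23  # base predictor: her initial guess

def simon_correction(person):
    # Diana overshoots women's ages by ~3 years, undershoots men's by ~3
    return -3 if person['sex'] == 'woman' else +3

def yannic_correction(person):
    # Diana + Simon overshoot by ~1 year when the person is from Sweden
    return -1 if person['country'] == 'Sweden' else 0

olivia = {'sex': 'woman', 'country': 'Sweden'}
age = diana(olivia) + simon_correction(olivia) + yannic_correction(olivia)
print(age)  # 23 - 3 - 1 = 19
```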
What is a support vector in SVM?
Support vectors are the data points that the margin of the hyperplane pushes up against, i.e. the points from each class that lie closest to the decision boundary (and thus to the opposite class).