Midterm-Yaseen Flashcards
What is supervised learning?
- Goal is to make accurate predictions for new, never-before-seen data
- we have input and output pairs to “learn” from
Examples:
- k-Nearest Neighbors
- Linear Models
- Naive Bayes Classifiers
- Decision Trees
- Ensembles of Decision Trees
- Kernelized Support Vector Machines
- Neural Networks (Deep Learning)
Describe the k-Nearest Neighbors algorithm.
- Using the training dataset, find the k nearest data points to the new point and classify it according to those neighbors.
Note:
For KNeighborsClassifier we use a majority vote among the neighbors to determine the classification
For KNeighborsRegressor we use the mean of the neighbors' target values (performance measured with R^2)
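A minimal sketch of the classifier on scikit-learn's built-in iris data (the dataset and n_neighbors=3 are illustrative choices, not part of the card); KNeighborsRegressor is used the same way but averages the neighbors' values:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative dataset; any labeled data works the same way
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_neighbors=3: each prediction is a majority vote among the
# 3 closest training points
knn = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print("Test set accuracy:", knn.score(X_test, y_test))
```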
Explain what is meant by “underfitting.”
A model that is too simple to capture the variation present in the training data; it performs poorly even on the training set.
Explain the concept of “overfitting.”
A model that focuses too much on the particulars of the training data and is not able to generalize well to new data.
When should we use Nearest Neighbors methods?
- ideal for small datasets
- good as a baseline
- easy to explain
When should we use Linear Models?
- go-to as a first method
- good for very large datasets
- good for very high-dimensional data
When should we use Naive Bayes?
- Only used for classification
- faster than linear models
- good for very large datasets and high-dimensional data
Disadvantage: often less accurate than linear models
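A quick illustrative fit with GaussianNB (the breast cancer dataset is an arbitrary example choice):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Naive Bayes learns per-class feature statistics in a single pass
# over the data, which is why training is so fast
nb = GaussianNB().fit(X_train, y_train)
print("Test set score:", nb.score(X_test, y_test))
```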
What are the advantages of Decision Tree methods?
- very fast
- don’t need scaling of the data
- can be visualized and easily explained
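A small sketch of the visualization point, printing a fitted tree's splits as readable rules (iris data and max_depth=2 are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()

# No scaling needed; a shallow tree's splits print as plain if/else rules
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)
print(export_text(tree, feature_names=iris.feature_names))
```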
What are the advantages of Random Forest?
- Nearly always perform better than a single decision tree
- very robust and powerful
- Does NOT require scaling of data
What is a disadvantage of Random Forest? (When should they not be used)
Not good for high-dimensional sparse data
Compare Gradient Boosted Decision Trees and Random Forests in terms of advantages.
- Gradient boosted trees are often slightly more accurate than random forests
- Gradient boosted trees are slower to train than random forests
- Gradient boosted trees are faster to predict and smaller in memory than random forests
- Gradient boosted trees require more parameter tuning than random forests
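A hedged side-by-side sketch of the two ensembles (dataset and hyperparameters are illustrative; learning_rate and max_depth are the knobs that usually need tuning for boosting):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random forest: many deep trees trained independently; works well
# with little tuning
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Gradient boosting: shallow trees built sequentially, each correcting the
# last; slower to train but the final model is small and fast to predict
gb = GradientBoostingClassifier(learning_rate=0.1, max_depth=3,
                                random_state=0).fit(X_train, y_train)

print("Random forest test score:    ", rf.score(X_test, y_test))
print("Gradient boosting test score:", gb.score(X_test, y_test))
```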
Describe the advantages and disadvantages of Support Vector Machines.
- powerful for medium-sized datasets of features with similar meaning
- requires scaling of the data (see the sketch below)
- sensitive to parameter settings
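A minimal sketch of the scaling requirement using an RBF-kernel SVC in a pipeline (dataset and C value are illustrative):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Without scaling, large-magnitude features dominate the RBF kernel distance
svm = SVC(C=1).fit(X_train, y_train)
print("Unscaled test score:", svm.score(X_test, y_test))

# StandardScaler puts every feature on a comparable scale first
scaled_svm = make_pipeline(StandardScaler(), SVC(C=1)).fit(X_train, y_train)
print("Scaled test score:  ", scaled_svm.score(X_test, y_test))
```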
Describe the advantages and disadvantages of Neural Networks.
- Can build very complex models (particularly for large datasets)
- Sensitive to scaling of the data and choice of parameters
- large models require a long time to train
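An illustrative scikit-learn MLP, scaled first because of the sensitivity noted above (dataset, layer size, and max_iter are example choices):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Scale first: neural networks are sensitive to feature magnitudes
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(100,), max_iter=1000, random_state=0),
).fit(X_train, y_train)
print("Test set score:", mlp.score(X_test, y_test))
```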
What are the two primary types of unsupervised learning?
- unsupervised transformations: creating a new representation of the data that might be easier for humans or other machine learning algorithms to understand than the original representation (e.g., dimensionality reduction, topic extraction)
- clustering algorithms: partition data into distinct groups of similar items
(like classification in supervised learning, but we have no known outputs to compare to)
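A compact sketch of both types on unlabeled data (PCA for the transformation, k-means for the clustering; the iris features and k=3 are illustrative choices):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # labels deliberately ignored

# Unsupervised transformation: reduce 4 features to 2 for easier inspection
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: partition the unlabeled data into 3 groups of similar items
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
print(labels[:10])
```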
What is the primary challenge in unsupervised learning?
- no outcome to compare to (how well did we do? nobody knows!)
- We must manually inspect the results to see how we did.
What is a common utilization of unsupervised algorithms?
Exploratory setting:
- useful for changing the representation of the data before applying a supervised learning method
Why is the k-nearest neighbors algorithm not often used in practice?
- slow prediction
- inability to handle many features
(although it is very easy to understand and gives reasonable performance without much adjustment)
What is k-nearest neighbors best utilized for?
- good baseline method before considering more advanced techniques
Consider the following training and test set scores:
Training set score: 0.96
Test set score: 0.63
What is this a sign of?
The discrepancy is a sign of overfitting.
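Scores like these come from calling score() on each split; a sketch of how the pattern arises with an unpruned decision tree (dataset and model are illustrative, and the exact numbers will differ):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An unpruned tree memorizes the training set: training score near 1.0
# with a noticeably lower test score -- the overfitting signature
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("Training set score:", tree.score(X_train, y_train))
print("Test set score:    ", tree.score(X_test, y_test))
```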
Consider the two sets of training/test set score pairs for two different machine learning algorithms:
Model 1:
Training set score: 0.98
Test set score: 0.82
Model 2:
Training set score: 0.84
Test set score: 0.80
Which model would you choose and why?
Model 2.
Although Model 1 has a slightly higher test set score than Model 2, there is a much larger discrepancy between the training and test scores in Model 1, which is a sign of overfitting.
In Model 2, we see a lower training set score but a much smaller discrepancy between the training and test set scores.
I would expect Model 2 to generalize better than Model 1 to new data.
Explain the general concept of Ridge Regression.
Ridge regression is like Linear Regression with the added constraint that coefficients are chosen so their magnitude is small. We want the coefficients of each feature to be as close to zero as possible while still predicting well.
-This is an example of regularization, or explicitly restricting a model to avoid overfitting. (Specifically this is L2 regularization).
- Alpha is the parameter that controls the strength of the penalty applied to the coefficients
- smaller alpha means less of a penalty on large magnitudes, so as alpha approaches zero, ridge regression approaches ordinary linear regression
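A short sketch of the alpha trade-off (the diabetes dataset and the alpha grid are illustrative):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Larger alpha = stronger penalty on coefficient magnitude;
# alpha near 0 behaves like ordinary linear regression
for alpha in (0.01, 1.0, 10.0):
    ridge = Ridge(alpha=alpha).fit(X_train, y_train)
    print(f"alpha={alpha}: test R^2 = {ridge.score(X_test, y_test):.2f}")
```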
Explain the general concept of Lasso Regression
Lasso Regression is another example of regularization (specifically L1 regularization). It is like linear regression with an added constraint, but the L1 penalty allows some coefficients to become exactly zero. This means Lasso can essentially be used for feature selection (choosing the features that are important).
Again, a lower alpha gives results closer to the plain Linear Regression model.
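A minimal sketch of the feature-selection effect, counting nonzero coefficients (dataset and alpha are illustrative):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The L1 penalty drives some coefficients to exactly zero; the surviving
# nonzero coefficients act as an implicit feature selection
lasso = Lasso(alpha=1.0).fit(X_train, y_train)
print("Features used:", int(np.sum(lasso.coef_ != 0)), "of", X.shape[1])
print("Test R^2:", lasso.score(X_test, y_test))
```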