Statistical Learning Methods Flashcards
Bagging
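Bootstrap aggregation: fits one tree to each of B bootstrapped training sets and averages their predictions. A special case of random forests in which all p features are considered at each split (m = p)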
Random Forests
Builds B regression trees from B bootstrapped training sets, considering only a random subset of m features (m < p) at each split, then averages the trees' predictions
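The two sources of randomness in the cards above (bootstrapped training sets and a feature subset at each split) can be sketched as follows. This is a minimal illustration only, not a tree-growing implementation; the helper names `bootstrap_sample` and `candidate_features` and the toy data are assumptions made for the example.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) observations with replacement (one bootstrapped training set)."""
    return [rng.choice(rows) for _ in rows]


def candidate_features(p, m, rng):
    """Pick the m of p feature indices that a single split is allowed to consider."""
    return sorted(rng.sample(range(p), m))


rng = random.Random(0)
rows = list(range(10))  # stand-in for 10 training observations (hypothetical data)
p = 4                   # total number of features

sample = bootstrap_sample(rows, rng)
forest_split = candidate_features(p, m=2, rng=rng)   # random forest: m < p
bagging_split = candidate_features(p, m=p, rng=rng)  # bagging: m = p, every feature
```

With m = p the feature subset is always the full set, which is why bagging can be read as a special case of a random forest.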
Logistic Regression
Supervised machine learning algorithm widely used for binary classification tasks
Boosting
Boosting improves machine learning models' predictive accuracy by fitting weak learners sequentially, each one correcting the errors of those before it, and combining them into a single strong learner
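One way to picture boosting is the sketch below, which assumes a regression setting with one predictor and uses decision stumps (single-split trees) as the weak learners. Each stump is fitted to the residuals left by the ensemble so far. The function names, the learning rate and the toy data are all assumptions for illustration; classification variants such as AdaBoost reweight observations instead and are not shown.

```python
def fit_stump(xs, residuals):
    """Find the single split on x that minimises squared error on the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the largest x so both sides are non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((r - left_mean) ** 2 for r in left) + sum(
            (r - right_mean) ** 2 for r in right
        )
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Fit stumps one after another, each to the residuals of the ensemble so far."""
    stumps = []
    residuals = list(ys)
    for _ in range(n_rounds):
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Remove a shrunken version of the new stump's fit from the residuals.
        residuals = [r - learning_rate * stump(x) for x, r in zip(xs, residuals)]
    return lambda x: learning_rate * sum(s(x) for s in stumps)


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on the given data."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical toy data
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]
model = boost(xs, ys)
```

The small learning rate means each weak learner contributes only a little, so the ensemble improves gradually rather than fitting the data in one step.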
Weak learners
Models with low prediction accuracy, only slightly better than random guessing
Strong learners
Models with high(er) prediction accuracy
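Hierarchical clustering
Unsupervised learning method for clustering observations. In the common agglomerative (bottom-up) form, clusters are built by repeatedly merging the two least dissimilar clusters, producing a tree (dendrogram) that can be cut at different heights to give different numbers of clusters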
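K-means clustering
Divides observations into a pre-specified number K of clusters. Each observation is assigned to the cluster with the closest centroid, and each centroid is recomputed as the mean of its assigned observations until the assignments stop changing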
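The assign-then-update loop behind K-means can be sketched as below. This is a minimal version under stated assumptions: the initial centroids are passed in explicitly to keep the example deterministic (in practice they are usually chosen at random and the algorithm is run several times), and the function names and toy points are hypothetical.

```python
import math


def assign(points, centroids):
    """Return, for each point, the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of the points currently assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # an empty cluster keeps its old centroid
            continue
        new_centroids.append(
            tuple(sum(p[d] for p in members) / len(members) for d in range(len(old)))
        )
    return new_centroids


def k_means(points, init_centroids, n_iter=100):
    """Alternate assignment and centroid updates until the labels stop changing."""
    centroids = list(init_centroids)
    labels = assign(points, centroids)
    for _ in range(n_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = k_means(points, init_centroids=[points[0], points[3]])
```

Here K is fixed by the number of initial centroids supplied (two), and the three points near the origin end up in one cluster while the three points near (10, 10) end up in the other.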
Ridge regression
Method of estimating the coefficients of multiple regression models in scenarios where predictors are highly correlated, by penalising the sum of squared coefficients (an L2 penalty).
Shrinks coefficients towards 0 but does not force them to exactly 0, and therefore does not perform variable selection.
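The shrinking behaviour can be seen in the simplest possible case. The sketch below assumes a single predictor and no intercept, where minimising RSS + lam * beta**2 has the closed-form solution sum(x*y) / (sum(x*x) + lam). The function name `ridge_coefficient`, the penalty values and the toy data are assumptions for illustration; the multi-predictor case with correlated features needs matrix algebra and is not shown.

```python
def ridge_coefficient(xs, ys, lam):
    """One-predictor, no-intercept ridge estimate: minimises RSS + lam * beta**2."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical toy data
ys = [2.1, 3.9, 6.2, 7.8]

ols = ridge_coefficient(xs, ys, lam=0.0)        # no penalty: ordinary least squares
mild = ridge_coefficient(xs, ys, lam=10.0)      # shrunk towards 0
heavy = ridge_coefficient(xs, ys, lam=1000.0)   # shrunk much further, still not 0
```

Because the penalty only enlarges the denominator, the estimate gets smaller as lam grows but never reaches exactly 0 for any finite lam.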
Lasso
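Regression analysis method that penalises the sum of absolute coefficient values (an L1 penalty). Performs both variable selection and regularisation, because some coefficients are shrunk to exactly 0, to enhance prediction accuracy and interpretability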
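Why the lasso can select variables is easiest to see with one predictor and no intercept. Under that assumption, minimising RSS + lam * abs(beta) gives a soft-thresholded estimate, sketched below. The function names, the penalty values and the toy data are assumptions for illustration; with several predictors the lasso is normally solved iteratively (for example by coordinate descent), which is not shown.

```python
def soft_threshold(z, t):
    """Move z towards 0 by t, returning exactly 0 when abs(z) <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coefficient(xs, ys, lam):
    """One-predictor, no-intercept lasso estimate: minimises RSS + lam * abs(beta)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Setting the derivative of the objective to zero gives a threshold of lam / 2.
    return soft_threshold(sxy, lam / 2.0) / sxx


xs = [1.0, 2.0, 3.0, 4.0]  # hypothetical toy data
ys = [2.1, 3.9, 6.2, 7.8]

unpenalised = lasso_coefficient(xs, ys, lam=0.0)  # ordinary least squares
shrunk = lasso_coefficient(xs, ys, lam=60.0)      # shrunk but non-zero
dropped = lasso_coefficient(xs, ys, lam=200.0)    # penalty large enough to give 0
```

Once the penalty is large enough, the coefficient is set to exactly 0, which is the sense in which the lasso drops a variable from the model.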
Regularisation
A set of methods that constrain or penalise model complexity to reduce overfitting and increase generalisability