Modeling - ML Models Flashcards
XGBoost
- Extreme Gradient Boosted Trees
- Boosted group of decision trees
- new trees made to correct errors of previous trees
- uses gradient descent to minimize loss as new trees are added
- Classification or regression (using regression trees)
- regularization term penalizes complexity of each tree
- a node is split only if the split gives a positive reduction in the loss function
- gamma (minimum loss reduction) sets the complexity cost of each additional leaf: a split must reduce the loss by more than gamma to be kept
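A minimal sketch of the split test on this card, using the gain formula given in the XGBoost documentation. The function name `split_gain` and the sample numbers are illustrative assumptions; `lam` stands for the L2 regularization weight and `gamma` for the minimum loss reduction.

```python
def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Loss reduction from splitting one leaf into a left and a right child.

    g_* and h_* are the sums of first- and second-order gradients of the
    loss over the observations that land in each child.
    """
    def score(g, h):
        return g * g / (h + lam)

    parent = score(g_left + g_right, h_left + h_right)
    children = score(g_left, h_left) + score(g_right, h_right)
    return 0.5 * (children - parent) - gamma


# Hypothetical gradient statistics for one candidate split.
gain = split_gain(g_left=-4.0, h_left=3.0, g_right=5.0, h_right=3.0, lam=1.0, gamma=1.0)
keep_split = gain > 0  # keep the split only if it still pays off after the gamma penalty
```

Raising `gamma` makes the same candidate split less attractive, which is how it prunes leaves that add little.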
Logistic Regression
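Linear classification model (the logistic function is nonlinear, but the decision boundary is linear)
Models the probability of each possible outcome by passing a linear combination of the features through the logistic function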
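A small sketch of how the logistic function turns a linear score into a probability. The weights, bias, and the 0.5 cutoff are made-up values for illustration, not fitted parameters.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class for one observation."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


# Hypothetical weights: here the linear score z works out to exactly 0.
p = predict_proba([2.0, 1.0], weights=[1.5, -1.0], bias=-2.0)
label = 1 if p >= 0.5 else 0
```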
K-means
- method for grouping n observations into K clusters
- each observation belongs to the cluster with the nearest mean
Unsupervised
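The assign-then-update loop behind this card can be sketched on 1-D data as below. The function name `kmeans_1d`, the fixed iteration count, and the caller-supplied starting centers are simplifying assumptions; library implementations add smarter initialization and convergence checks.

```python
def kmeans_1d(points, centers, iterations=10):
    """Lloyd's algorithm on 1-D data with caller-supplied starting centers."""
    centers = list(centers)
    clusters = [[] for _ in centers]
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest mean.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: abs(p - centers[k]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        centers = [sum(c) / len(c) if c else centers[k] for k, c in enumerate(clusters)]
    return centers, clusters


centers, clusters = kmeans_1d([1.0, 2.0, 3.0, 10.0, 11.0, 12.0], centers=[0.0, 5.0])
# centers  -> [2.0, 11.0]
# clusters -> [[1.0, 2.0, 3.0], [10.0, 11.0, 12.0]]
```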
Linear Regression
Supervised
Regression Model
SVM
- Supervised learning models for classification or regression
- finds a hyperplane in N-dimensional space that distinctly classifies the datapoints
- If the classes can’t be separated by a single line (a linear hyperplane), a non-linear kernel is needed to find a separating hyperplane in a higher-dimensional space
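A toy illustration of why a non-linear mapping helps. The 1-D points below cannot be split by one threshold, but after an explicit quadratic feature map a single linear rule separates them. The weights and bias are hand-picked for the example, not learned by an SVM, and a real kernel would compute the inner products implicitly instead of building the mapped features.

```python
def feature_map(x):
    """Explicit quadratic map from 1-D to 2-D: x -> (x, x^2)."""
    return (x, x * x)


def classify(x, w=(0.0, 1.0), b=-2.5):
    """Linear decision rule w . phi(x) + b evaluated in the mapped space."""
    phi = feature_map(x)
    score = sum(wi * pi for wi, pi in zip(w, phi)) + b
    return 1 if score > 0 else -1


xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
ys = [1, 1, -1, -1, -1, 1, 1]  # outer points are one class, inner points the other
predictions = [classify(x) for x in xs]
```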
Decision Trees
- flowchart-like structure in which each internal node represents a test on an attribute and each leaf node represents a class label
- paths from root to leaf represent classification rules
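A sketch of the structure on this card: a nested dictionary stands in for the tree, and walking from root to leaf yields both the class label and the rule that produced it. The attributes and labels are invented for the example.

```python
# Hypothetical tree: internal nodes test an attribute, leaves hold a class label.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "windy", "branches": {True: "stay in", False: "play"}},
        "rainy": "stay in",
        "overcast": "play",
    },
}


def classify(node, sample, path=()):
    """Walk from the root to a leaf, recording each test taken on the way."""
    if not isinstance(node, dict):  # reached a leaf: the node is the class label
        return node, list(path)
    value = sample[node["attribute"]]
    step = f'{node["attribute"]}={value}'
    return classify(node["branches"][value], sample, path + (step,))


label, rule = classify(tree, {"outlook": "sunny", "windy": False})
# label -> "play"
# rule  -> ["outlook=sunny", "windy=False"]
```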
Random Forest
- ensemble of decision tree classifiers
- each tree is grown from its own independent random sample of the dataset (typically bootstrapped rows and a random subset of features)
- tree classifiers are then combined by averaging probabilistic predictions
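A minimal sketch of the two ideas on this card: drawing a bootstrap sample per tree and averaging the trees' class probabilities. The per-tree probabilities are made-up numbers standing in for the output of fitted trees, and the helper names are assumptions.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement; each tree would get its own sample."""
    return rng.choices(rows, k=len(rows))


def average_probabilities(per_tree_probs):
    """Combine trees by averaging their class-probability vectors."""
    n_trees = len(per_tree_probs)
    n_classes = len(per_tree_probs[0])
    return [sum(tree[c] for tree in per_tree_probs) / n_trees for c in range(n_classes)]


sample = bootstrap_sample(list(range(10)), random.Random(0))

# Hypothetical [P(class 0), P(class 1)] from three trees for one observation.
tree_outputs = [[0.9, 0.1], [0.4, 0.6], [0.8, 0.2]]
forest_probs = average_probabilities(tree_outputs)  # about [0.7, 0.3]
predicted_class = max(range(len(forest_probs)), key=forest_probs.__getitem__)
```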
RNN
- Recurrent neural network
- connections between nodes can create a cycle, allowing output from some nodes to affect subsequent inputs to same nodes
- belongs to the infinite impulse response (IIR) class of networks
- the term is borrowed from linear time-invariant systems
- the impulse response h(t) never becomes exactly zero past a certain point; it continues indefinitely
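The infinite impulse response idea can be seen in the simplest possible recurrence. This sketch uses a scalar, linear update with an assumed feedback weight `a`; real RNNs use vector states and nonlinear activations, so treat it only as an illustration of the feedback loop.

```python
def recurrent_response(inputs, a=0.5):
    """Scalar linear recurrence h[t] = a * h[t-1] + x[t], with h starting at 0."""
    h, outputs = 0.0, []
    for x in inputs:
        h = a * h + x  # the previous output feeds back into the next step
        outputs.append(h)
    return outputs


impulse = [1.0] + [0.0] * 9  # a single spike at t = 0, then silence
response = recurrent_response(impulse)
# response starts 1.0, 0.5, 0.25, ... and keeps halving without reaching zero
```

In exact arithmetic the response is a**t, which shrinks but is never exactly zero for any finite t.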
CNN
- most commonly applied to visual imagery
- uses convolution kernels that map a high-dimensional input to lower-dimensional feature maps
- belongs to the finite impulse response (FIR) class of networks
- the impulse response becomes exactly zero for all times t > T, for some finite T
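A sketch of the finite impulse response property using a 1-D causal convolution. The three kernel weights are arbitrary; the point is that the output depends only on the last few inputs, so a single spike stops influencing the output once it slides out of the kernel window.

```python
def fir_filter(signal, kernel):
    """Causal 1-D convolution: y[t] = sum_k kernel[k] * x[t - k], with x[t] = 0 for t < 0."""
    out = []
    for t in range(len(signal)):
        out.append(sum(kernel[k] * signal[t - k] for k in range(len(kernel)) if t - k >= 0))
    return out


impulse = [1.0] + [0.0] * 9  # a single spike at t = 0, then silence
response = fir_filter(impulse, kernel=[0.5, 0.3, 0.2])
# response -> [0.5, 0.3, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```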
Collaborative Filtering
- Technique for recommender systems
- makes automatic predictions about a user’s interests by collecting preference or taste information from many users (collaborating)
- if person A shares person B’s opinion on one issue, A is more likely to share B’s opinion on a different issue than that of a randomly chosen person
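One common way to act on this idea is user-based filtering: weight other users' ratings by how similar their past ratings are to the target user's. The ratings table, the choice of cosine similarity, and the function names below are assumptions made for the sketch.

```python
import math

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "A": {"film1": 5, "film2": 1, "film3": 4},
    "B": {"film1": 5, "film2": 1, "film3": 5, "film4": 5},
    "C": {"film1": 1, "film2": 5, "film4": 1},
}


def cosine_similarity(u, v):
    """Cosine similarity over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    pairs = [(cosine_similarity(ratings[user], r), r[item])
             for name, r in ratings.items() if name != user and item in r]
    total = sum(sim for sim, _ in pairs)
    return sum(sim * rating for sim, rating in pairs) / total if total else None


estimate = predict("A", "film4")  # pulled toward B's rating, since A and B agree elsewhere
```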
Semantic Segmentation
- deep learning algorithm that associates a label or category with every pixel in an image
- used to recognize a collection of pixels that form distinct categories
- aims to draw a boundary around every object so that pixel-level detail is known
- labeling every pixel in image and knowing to which class it belongs
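The last step of labeling every pixel can be sketched as a per-pixel argmax over class scores. The scores below are invented stand-ins for a network's output, and the layout `class_scores[class][row][col]` is an assumption of this example.

```python
def label_pixels(class_scores):
    """Return a label map holding, for each pixel, the index of the best-scoring class."""
    n_classes = len(class_scores)
    n_rows, n_cols = len(class_scores[0]), len(class_scores[0][0])
    label_map = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            best = max(range(n_classes), key=lambda k: class_scores[k][r][c])
            row.append(best)
        label_map.append(row)
    return label_map


# Hypothetical scores for a 2x3 image and two classes (0 = background, 1 = object).
scores = [
    [[0.9, 0.2, 0.1], [0.8, 0.3, 0.4]],  # class 0
    [[0.1, 0.8, 0.9], [0.2, 0.7, 0.6]],  # class 1
]
labels = label_pixels(scores)
# labels -> [[0, 1, 1], [0, 1, 1]]
```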
Instance Segmentation
Segments and distinguishes separate instances of the same class
Linear Learner
SageMaker built-in algorithm
Supervised learning algorithm used for classification or regression
For regression - basically Linear Regression.
For classification - linear threshold function is used. Can do binary or multi-class.
Uses stochastic gradient descent (SGD)
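A plain-Python sketch of stochastic gradient descent fitting a linear regression, to show the kind of update this card refers to. It is not SageMaker's implementation; the learning rate, epoch count, and noise-free toy data are assumptions chosen so the example converges.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.05, epochs=500, seed=0):
    """Fit y = w * x + b by updating on one observation at a time (squared loss)."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the observations in a fresh random order
        for i in order:
            error = (w * xs[i] + b) - ys[i]
            w -= lr * error * xs[i]  # gradient of 0.5 * error**2 with respect to w
            b -= lr * error          # gradient of 0.5 * error**2 with respect to b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # generated from y = 2x + 1 with no noise
w, b = sgd_linear_regression(xs, ys)
```

For classification, the same loop would apply a threshold to the linear score and use a classification loss instead of squared error.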
DeepAR
SageMaker built-in algorithm
Supervised forecasting algorithm
Forecasts scalar (one-dimensional) time series using a recurrent neural network (RNN).
Random Cut Forest
For anomaly detection
Unsupervised
Can detect unexpected spikes in time series data