Machine Learning Flashcards
Most common cost functions for linear regression
MSE - Mean Squared Error (the cost minimized by OLS - Ordinary Least Squares), MAE - Mean Absolute Error, Huber loss function
MSE
Mean squared error:
- A measure of the quality of a model/estimator
- The average squared difference between the estimated values and the actual values.
- It is the “second moment” (about the origin) (L2) of the error, and thus incorporates both the variance of the estimator (how widely spread the estimates are from one data sample to another) and its bias (how far off the average estimated value is from the truth).
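A minimal NumPy sketch of the three cost functions named above; the arrays are made-up values used only for illustration:

```python
import numpy as np

# Illustrative target and prediction vectors (not from any real dataset)
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = np.mean((y_true - y_pred) ** 2)      # Mean Squared Error
mae = np.mean(np.abs(y_true - y_pred))     # Mean Absolute Error

# Huber loss: quadratic for small errors, linear beyond delta
delta = 1.0
err = y_true - y_pred
huber = np.mean(np.where(np.abs(err) <= delta,
                         0.5 * err ** 2,
                         delta * (np.abs(err) - 0.5 * delta)))

print(mse, mae, huber)
```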
Cost function for Logistic Regression
Log loss or cross-entropy
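A small sketch of binary log loss (cross-entropy), assuming NumPy and made-up labels and predicted probabilities:

```python
import numpy as np

# Toy labels and predicted probabilities (illustrative only)
y = np.array([1, 0, 1, 1])
p = np.array([0.9, 0.2, 0.7, 0.4])

# Binary cross-entropy / log loss: penalizes confident wrong predictions heavily
log_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
print(log_loss)
```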
Sigmoid function
1 / (1 + e^-z)
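A minimal NumPy sketch of the sigmoid; the test inputs are arbitrary:

```python
import numpy as np

def sigmoid(z):
    # Maps any real number to the interval (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0))                         # 0.5
print(sigmoid(np.array([-2.0, 0.0, 2.0])))  # values between 0 and 1
```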
Logistic Regression
A classification algorithm used to assign probabilities to a discrete set of classes.
Monotonic function
Always increasing or always decreasing
Softmax function
Used to normalize results in multi-class logistic regression. Transforms a vector of predictions (real numbers) so that each value falls in the interval [0, 1] and all values sum to 1, so they can be interpreted as probabilities. (aka "normalized exponential function" or softargmax)
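A minimal NumPy sketch of softmax; the score vector is made up, and subtracting the max is a standard numerical-stability trick that does not change the result:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the output is unchanged
    e = np.exp(z - np.max(z))
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1])  # illustrative class scores
probs = softmax(scores)
print(probs, probs.sum())            # probabilities summing to 1
```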
Popular regression algorithms
- Ordinary Least Squares Regression (OLSR)
- Linear Regression
- Logistic Regression
- Stepwise Regression
- Multivariate Adaptive Regression Splines (MARS)
- Locally Estimated Scatterplot Smoothing (LOESS)
Popular instance-based algorithms
- k-Nearest Neighbors (kNN)
- Learning Vector Quantization (LVQ) (is also neural-network-inspired)
- Self-Organizing Map (SOM)
- Locally Weighted Learning (LWL)
- Support Vector Machine (SVM)
Popular regularization algorithms
- Ridge Regression
- Least Absolute Shrinkage and Selection Operator (LASSO)
- Elastic Net
- Least-Angle Regression
Popular decision tree algorithms
- Classification and Regression Tree (CART)
- Iterative Dichotomiser 3 (ID3)
- C4.5 and C5.0 (different versions of a powerful approach)
- Chi-squared Automatic Interaction Detection (CHAID)
- Decision Stump
- M5
- Conditional Decision Trees
Popular Bayesian algorithms
- Naive Bayes
- Gaussian Naive Bayes
- Multinomial Naive Bayes
- Averaged One-Dependence Estimators (AODE)
- Bayesian Belief Network (BBN)
- Bayesian Network (BN)
Popular clustering algorithms
- k-Means
- k-Medians
- Expectation Maximization (EM)
- Hierarchical Clustering
Popular association rule learning algorithms
- Apriori algorithm
- Eclat algorithm
Popular ensemble algorithms
- Boosting
- Bootstrapped Aggregation (Bagging)
- AdaBoost
- Weighted Average (Blending)
- Stacked Generalization (Stacking)
- Gradient Boosting Machines (GBM)
- Gradient Boosted Regression Trees (GBRT)
- Random Forest
Describe regression algorithms
TODO
Data cleaning/prep checklist
- Duplicate rows or values?
- Missing values? What strategy should be used to handle them?
- Does any data need to be recoded?
- Does any categorical data need to be transformed into dummy variables?
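A small pandas sketch of a few checklist items on a hypothetical toy frame (the column names and values are invented for illustration):

```python
import pandas as pd

# Hypothetical frame used only to illustrate the checklist steps
df = pd.DataFrame({"city": ["NY", "LA", "NY", None],
                   "price": [10.0, 12.5, 10.0, 9.0]})

df = df.drop_duplicates()                   # duplicate rows
df["city"] = df["city"].fillna("unknown")   # missing values
df = pd.get_dummies(df, columns=["city"])   # categorical -> dummy variables
print(df)
```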
Data exploration checklist
- Summary statistics
- Correlations
- Subsets
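A small pandas sketch of the same exploration steps on a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical frame, for illustration only
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2.0, 4.1, 6.2, 8.1]})

print(df.describe())    # summary statistics
print(df.corr())        # pairwise correlations
print(df[df["x"] > 2])  # a simple subset
```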
Machine learning DevOps challenges
- High heterogeneity
- High composability
- More options for performance and success metrics
- Iteration - models may require frequent retraining and redeployment
- Infrastructure - varied and dynamic loads, evolving ecosystem
- Scalability from unpredictable loads & high performance demands
- Auditability - Need to explain the “black box”
Properties of standardized data set
- mean of zero
- unit variance (std dev = 1)
- same distribution shape as the original data (often assumed approximately normal/Gaussian; standardizing alone does not make data normal)
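A minimal NumPy sketch of standardization (z-scoring) on made-up values:

```python
import numpy as np

# Illustrative data
x = np.array([2.0, 4.0, 6.0, 8.0])

z = (x - x.mean()) / x.std()  # standardized ("z-score") version
print(z.mean(), z.std())      # ~0.0 and 1.0
```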
Gini Impurity formula for a node
Gini impurity = 1 - SUM over classes[ (probability of class)^2 ]; for two classes: 1 - p1^2 - p2^2
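A minimal NumPy sketch of node Gini impurity; the class probabilities are illustrative:

```python
import numpy as np

def gini_node(class_probs):
    # Gini impurity of a node: 1 minus the sum of squared class probabilities
    p = np.asarray(class_probs)
    return 1.0 - np.sum(p ** 2)

print(gini_node([0.5, 0.5]))  # 0.5, maximally impure two-class node
print(gini_node([1.0, 0.0]))  # 0.0, pure node
```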
Gini Impurity formula for a condition
= weighted average of the Gini impurity of the resulting leaf nodes
= SUM over leaf nodes[ (fraction of samples reaching the node) * (Gini impurity of the node) ]
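A minimal sketch of the weighted split Gini; the per-child impurities could come from gini_node in the previous sketch, and the counts here are made up:

```python
def gini_split(counts_per_child, gini_per_child):
    # Weighted average of child impurities, weighted by the share of samples each child receives
    total = sum(counts_per_child)
    return sum((n / total) * g
               for n, g in zip(counts_per_child, gini_per_child))

# Illustrative split: left child holds 6 samples, right child holds 4
print(gini_split([6, 4], [0.32, 0.48]))  # 0.384
```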