ML Models Flashcards
Cons of Logistic Regression
- Non-linear problems can’t be solved with LR (only produces a linear decision boundary).
- Can’t capture feature interactions, where the effect of one feature on the prediction depends on the value of another (between user, ad, and publisher features, for instance)
Pros of Logistic Regression
- Easy to implement
- Easy to train
- Fast inference
- Interpretable
- Often useful as a baseline model
Logistic Regression
- Models the probability of a binary outcome using a weighted linear combination of features.
- Works well when the data is linearly separable
- Not a good choice for ad click prediction on its own, since clicks depend heavily on feature interactions
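The "weighted linear combination of features" can be sketched in a few lines of pure Python (the weights, bias, and feature values below are toy numbers for illustration):

```python
import math

def sigmoid(z):
    # Squashes any real number into the (0, 1) probability range
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(weights, features, bias):
    # Weighted linear combination of features, then sigmoid
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)

p = predict_probability(weights=[0.8, -1.2], features=[1.0, 0.5], bias=0.1)
```

Because the score `z` is linear in the features, the resulting decision boundary (`p = 0.5`, i.e. `z = 0`) is a hyperplane, which is exactly the limitation noted above.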
Gradient-boosted decision trees
Pros
- Interpretable and easy to understand
- Can be used for feature selection (importance) and feature extraction
Cons
- Inefficient for continual learning. Not designed to be fine-tuned with new data. Usually need to retrain the model from scratch.
Two-tower neural network
- Generate user embeddings for user features
- Generate ad embeddings for ad features
- The similarity between the user and ad embeddings is used to calculate relevance
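A minimal sketch of the two-tower idea, with each "tower" reduced to a single linear layer for brevity (the weight matrices and feature vectors are illustrative; real towers are deep networks):

```python
def embed(features, weight_matrix):
    # Toy "tower": one linear layer mapping raw features to an embedding
    return [sum(w * x for w, x in zip(row, features)) for row in weight_matrix]

def relevance(user_emb, ad_emb):
    # Dot-product similarity between the user and ad embeddings
    return sum(u * a for u, a in zip(user_emb, ad_emb))

user_emb = embed([1.0, 0.0], [[0.5, 0.2], [0.1, 0.9]])
ad_emb = embed([0.0, 1.0], [[0.3, 0.4], [0.8, 0.6]])
score = relevance(user_emb, ad_emb)
```

Because each tower only sees its own features, user and ad embeddings can be precomputed independently and compared cheaply at serving time.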
Challenges of ad click prediction
- Feature space is large and sparse; most features are zero
- Difficult to capture pairwise interactions
- Continuous retraining
Deep & Cross Network
- Can replace manual feature cross method
- Deep network: Learns complex generalizable features using DNN arch
- Cross network: Automatically captures feature interactions and learns good feature crosses
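A single cross layer from the original Deep & Cross Network can be sketched in pure Python; it computes `x_{l+1} = x0 * (w · x_l) + b + x_l` (the input vector and parameters below are toy values):

```python
def cross_layer(x0, xl, w, b):
    # (w . x_l) is a scalar; multiplying it back into x0 creates
    # explicit feature crosses, and "+ x_l" is a residual connection
    s = sum(wi * xi for wi, xi in zip(w, xl))
    return [x0i * s + bi + xli for x0i, bi, xli in zip(x0, b, xl)]

x0 = [1.0, 2.0]
x1 = cross_layer(x0, x0, w=[0.5, 0.5], b=[0.0, 0.0])
```

Stacking L such layers captures feature crosses up to degree L+1 without hand-engineering them.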
Factorization machines
- Efficiently captures pairwise interactions between features
- Improves logistic regression
- Useful for ad click prediction
How do factorization machines work?
Learns an embedding vector for each feature. The interaction between two features is the dot product of their embeddings
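The mechanism above can be sketched directly: a linear model plus, for every feature pair, the dot product of their embeddings scaled by the feature values (weights and embeddings below are toy numbers):

```python
def fm_score(w0, w, V, x):
    # Linear part: bias plus per-feature weights
    linear = w0 + sum(wi * xi for wi, xi in zip(w, x))
    # Pairwise part: dot(V[i], V[j]) models the interaction of features i and j
    pairwise = 0.0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(vi * vj for vi, vj in zip(V[i], V[j]))
            pairwise += dot * x[i] * x[j]
    return linear + pairwise

score = fm_score(w0=0.0, w=[0.1, 0.2], V=[[1.0, 0.0], [1.0, 0.0]], x=[1.0, 1.0])
```

Because interaction strength is factored through shared embeddings, the model can estimate interactions even for feature pairs that rarely co-occur in sparse data.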
Support Vector Machines
- Like logistic regression, learns a decision boundary, but chooses the hyperplane in n-dimensional space that separates the classes with the largest margin
- Kernels allow it to learn non-linear decision boundaries
What is Learn To Rank?
- Supervised machine learning to solve ranking problems
- Given a query and a list of items, determine the optimal ordering of the items from most relevant to least relevant
What are the types of Learn to Rank?
- Pointwise
- Pairwise
- Listwise
Point-wise Learn to Rank
- The score of each item is predicted independently of the other items
- The final ranking is achieved by sorting the predicted relevance scores
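The two steps above amount to scoring then sorting, as in this sketch (the item names and scores are illustrative, with the scoring model stubbed out as a lookup):

```python
def rank_pointwise(items, score_fn):
    # Score each item independently, then sort descending by predicted relevance
    return sorted(items, key=score_fn, reverse=True)

predicted_scores = {"a": 0.2, "b": 0.9, "c": 0.5}
ranking = rank_pointwise(["a", "b", "c"], lambda item: predicted_scores[item])
```

Note that nothing in the loss or inference ever compares two items directly, which is exactly what the pairwise and listwise variants add.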
Pair-wise Learn to Rank
- Given a query and two items, predicts which item is more relevant to the query
- Examples: RankNet, LambdaRank, LambdaMART
List-wise Learn to Rank
- Given a query and a list of items, predict the optimal ordering of an entire list
- Examples: SoftRank, ListNet, and AdaRank
Tradeoffs for Learn to Rank Approaches?
- Pairwise and listwise approaches produce more accurate results but they are more difficult to implement and train
Pros of Decision Trees
- Fast training
- Fast inference
- Minimal data prep (doesn’t require normalization or scaling) since the algorithm doesn’t depend on the distributions of input features
- Interpretable
Cons of Decision Trees
- Over-fitting. Decision trees are very sensitive to small variations in data: a small change in training data can produce a different tree structure, and a small change in input may lead to a different outcome at serving time.
- Too sensitive, so predictions are less reliable. Naive decision trees are rarely used in practice.
Techniques to reduce the sensitivity of decision trees
Bagging
Boosting
Bagging (Decision Trees)
- Ensemble learning method that trains a set of ML models in parallel on multiple subsets of training data
- Predictions of all these trained models are combined to make a final prediction
- Reduces the model sensitivity
Random Forest
- Builds multiple decision trees in parallel during training
- A voting mechanism is used to combine the predictions to make a final prediction
- Example of bagging a decision tree
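The voting mechanism itself is just a majority count over the per-tree predictions, as in this sketch (the class labels are illustrative):

```python
from collections import Counter

def majority_vote(tree_predictions):
    # Each tree votes with its predicted class; the most common class wins
    return Counter(tree_predictions).most_common(1)[0][0]

final = majority_vote(["click", "no_click", "click"])
```

For regression forests the same idea applies with the mean of the trees' outputs instead of a vote.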
Advantages of Bagging
- Reduces the effect of over-fitting (high variance)
- Doesn’t increase training time very much because the decision trees can be trained in parallel
- Does not add much latency at the inference time because decision trees can process the input in parallel
Disadvantages of Bagging
- Not helpful when the model faces under-fitting (high bias)
- Need boosting for that
Linear Regression vs Logistic Regression
- Linear regression estimates a continuous dependent variable from the independent variables. For example, predicting the price of houses.
- Logistic regression estimates the probability of an event. For example, classifying tissue as benign or malignant.
Boosting
- Ensemble learning technique to improve the performance of weak learners by combining predictions.
- The final model is a weighted combination of the weak learners
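The "weighted combination of weak learners" can be sketched directly (the two threshold rules and their weights below are toy stand-ins for learned weak classifiers):

```python
def boosted_predict(weak_learners, weights, x):
    # Final score is a weighted sum of the weak learners' outputs
    return sum(a * h(x) for a, h in zip(weights, weak_learners))

h1 = lambda x: 1.0 if x > 0 else -1.0   # toy weak learner 1
h2 = lambda x: 1.0 if x > 5 else -1.0   # toy weak learner 2
score = boosted_predict([h1, h2], [0.7, 0.3], x=3.0)
```

Unlike bagging, boosting trains the learners sequentially, with each new learner focusing on the examples the previous ones got wrong; the sketch only shows the final combination step.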
Bias
Error from overly simple assumptions in the model. A high-bias model is "stubborn": it under-fits and misses real patterns in the data no matter how much data it sees.
Deep Factorization Machines
- Combines the strengths of a DNN and a factorization machine (FM)
- The DNN captures higher-order feature interactions
- The FM captures low-order (pairwise) interactions