ML Models Flashcards

1
Q

Cons of Logistic Regression

A
  • Non-linear problems can’t be solved with LR (only produces a linear decision boundary).
  • Can’t capture feature interactions, i.e., cases where the effect of one feature depends on the value of another (interactions between user, ad, and publisher features, for instance)
2
Q

Pros of Logistic Regression

A

Pros:
- Easy to implement
- Easy to train
- Fast inference
- Interpretable
- Often useful as a baseline model

3
Q

Logistic Regression

A
  • Models the probability of a binary outcome by passing a weighted linear combination of features through a sigmoid.
  • Works well when the data is linearly separable
  • Not a good choice for ad click prediction, where the feature space is sparse and dominated by interactions LR can’t capture
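
A minimal sketch of the scoring step (NumPy; the weights here are hand-picked toy values, not a trained model):

```python
import numpy as np

def predict_proba(x, w, b):
    """Probability of the positive class: a sigmoid over the linear score."""
    z = np.dot(w, x) + b                 # weighted linear combination
    return 1.0 / (1.0 + np.exp(-z))      # squash into (0, 1)

w = np.array([0.8, -1.2])                # toy weights
b = 0.1
print(predict_proba(np.array([1.0, 0.5]), w, b))  # ≈ 0.57
```
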
4
Q

Gradient-boosted decision trees

A

Pros
- Interpretable and easy to understand
- Can be used for feature selection (importance) and feature extraction

Cons
- Inefficient for continual learning: not designed to be fine-tuned on new data, so the model usually has to be retrained from scratch.
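
A hedged sklearn sketch of the feature-importance point from the pros above (synthetic data; the hyperparameters are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = GradientBoostingClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# Per-feature importances can drive feature selection.
print(clf.feature_importances_)
```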

5
Q

Two-tower neural network

A
  • Generates user embeddings from user features
  • Generates ad embeddings from ad features
  • The similarity between the user and ad embeddings is used to calculate relevance
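
A minimal PyTorch sketch of the architecture; the tower sizes, embedding dimension, and cosine-similarity scoring are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoTower(nn.Module):
    """Two MLP towers embed users and ads; cosine similarity scores relevance."""
    def __init__(self, user_dim, ad_dim, emb_dim=32):
        super().__init__()
        self.user_tower = nn.Sequential(
            nn.Linear(user_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))
        self.ad_tower = nn.Sequential(
            nn.Linear(ad_dim, 64), nn.ReLU(), nn.Linear(64, emb_dim))

    def forward(self, user_feats, ad_feats):
        u = F.normalize(self.user_tower(user_feats), dim=-1)  # user embedding
        a = F.normalize(self.ad_tower(ad_feats), dim=-1)      # ad embedding
        return (u * a).sum(dim=-1)  # cosine similarity = relevance score

model = TwoTower(user_dim=16, ad_dim=24)
scores = model(torch.randn(8, 16), torch.randn(8, 24))  # one score per pair
```
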
6
Q

Challenges of ad click prediction

A
  • The feature space is large and sparse; most feature values are zero.
  • Pairwise feature interactions are difficult to capture
  • The model requires continual retraining
7
Q

Deep & Cross Network

A
  • Can replace manual feature crossing
  • Deep network: learns complex, generalizable features using a DNN architecture
  • Cross network: automatically captures feature interactions and learns good feature crosses
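
A sketch of a single cross layer following the published DCN formulation, x_{l+1} = x0 * (w·x_l) + b + x_l; dimensions and layer count are illustrative:

```python
import torch
import torch.nn as nn

class CrossLayer(nn.Module):
    """One DCN cross layer; stacking L of them captures crosses up to order L+1."""
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Linear(dim, 1, bias=False)
        self.b = nn.Parameter(torch.zeros(dim))

    def forward(self, x0, xl):
        # Explicit feature crossing against the original input, plus a residual.
        return x0 * self.w(xl) + self.b + xl

x0 = torch.randn(8, 16)
x = x0
for layer in [CrossLayer(16), CrossLayer(16)]:
    x = layer(x0, x)  # two cross layers -> up to 3rd-order crosses
```
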
8
Q

Factorization machines

A
  • Efficiently captures pairwise interactions between features
  • Improves logistic regression
  • Useful for ad click prediction
9
Q

How do factorization machines work?

A

Learns an embedding vector for each feature; the interaction between two features is modeled as the dot product of their embeddings.
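
A NumPy sketch of the FM score, using the standard O(k·n) rewriting of the pairwise term; all inputs are random toy values:

```python
import numpy as np

def fm_score(x, w0, w, V):
    """FM score: bias + linear term + pairwise term, where
    sum_{i<j} <V_i, V_j> x_i x_j
      = 0.5 * sum_f [ (sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2 ]"""
    sum_sq = (V.T @ x) ** 2          # (sum_i V_if x_i)^2 per factor f
    sq_sum = (V.T ** 2) @ (x ** 2)   # sum_i V_if^2 x_i^2 per factor f
    return w0 + w @ x + 0.5 * np.sum(sum_sq - sq_sum)

rng = np.random.default_rng(0)
n, k = 10, 4                          # 10 features, embedding size 4
x = rng.random(n)
print(fm_score(x, 0.0, rng.normal(size=n), rng.normal(size=(n, k))))
```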

10
Q

Support Vector Machines

A

Similar in spirit to logistic regression, in that it learns a decision boundary:

  • Finds a hyperplane (or, with a kernel, a non-linear surface) in n-dimensional space that separates the classes with the maximum margin
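
A hedged sklearn sketch (synthetic data; the RBF kernel and C value are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# An RBF kernel lets the separating surface be non-linear in the input space.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)
print(clf.predict(X[:5]))
```
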
11
Q

What is Learn To Rank?

A
  • Supervised machine learning to solve ranking problems
  • Given a query and a list of items, determine the optimal ordering of the items from most relevant to least relevant
12
Q

What are the types of Learn to Rank?

A
  • Pointwise
  • Pairwise
  • Listwise
13
Q

Point-wise Learn to Rank

A
  • The score of each item is predicted independently of the other items
  • The final ranking is achieved by sorting the predicted relevance scores
14
Q

Pair-wise Learn to Rank

A
  • Given a query and two items, predicts which item is more relevant to the query
  • Examples: RankNet, LambdaRank, LambdaMART
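
A sketch of the pairwise idea behind RankNet: for a pair where item i is labeled more relevant than item j, push sigmoid(score_i − score_j) toward 1 (the scores below are toy values):

```python
import torch
import torch.nn.functional as F

def ranknet_loss(score_i, score_j):
    """Pairwise logistic loss: item i is labeled more relevant than item j."""
    return F.binary_cross_entropy_with_logits(
        score_i - score_j, torch.ones_like(score_i))

# Toy scores from some ranking model; i should outrank j in both pairs.
loss = ranknet_loss(torch.tensor([2.0, 0.3]), torch.tensor([1.0, 0.9]))
```
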
15
Q

List-wise Learn to Rank

A
  • Given a query and a list of items, predict the optimal ordering of an entire list
  • Examples: SoftRank, ListNet, and AdaRank
16
Q

Tradeoffs for Learn to Rank Approaches?

A
  • Pairwise and listwise approaches produce more accurate results, but they are more difficult to implement and train
17
Q

Pros of Decision Trees

A
  • Fast training
  • Fast inference
  • Minimal data prep (doesn’t require normalization or scaling) since the algorithm doesn’t depend on the distributions of input features
  • Interpretable
18
Q

Cons of Decision Trees

A
  • Over-fitting: decision trees are very sensitive to small variations in the data. A small change in the input may lead to a different outcome at serving time, and small changes in the training data can produce a different tree structure.
  • Because they are this sensitive, their predictions are less reliable; naive decision trees are rarely used in practice.
19
Q

Techniques to reduce the sensitivity of decision trees

A

Bagging
Boosting

20
Q

Bagging (Decision Trees)

A
  • Ensemble learning method that trains a set of ML models in parallel on multiple subsets of training data
  • Predictions of all these trained models are combined to make a final prediction
  • Reduces the model sensitivity
21
Q

Random Forest

A
  • Builds multiple decision trees in parallel during training
  • A voting mechanism is used to combine the predictions to make a final prediction
  • An example of bagging applied to decision trees
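
A hedged sklearn sketch (synthetic data; the tree count and other settings are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# 100 trees fit in parallel on bootstrap samples; majority vote at predict time.
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
clf.fit(X, y)
print(clf.predict(X[:5]))
```
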
22
Q

Advantages of Bagging

A

- Reduces the effect of over-fitting (high variance)
- Doesn’t increase training time very much, because the decision trees can be trained in parallel
- Doesn’t add much latency at inference time, because the decision trees can process the input in parallel

23
Q

Disadvantages of Bagging

A
  • Not helpful when the model under-fits (high bias)
  • Boosting is needed in that case
24
Q

Linear Regression vs Logistic Regression

A
  • Linear regression estimates a continuous dependent variable from the independent variables. For example, predicting the price of a house.
  • Logistic regression estimates the probability of an event. For example, classifying tissue as benign or malignant.
25
Q

Boosting

A
  • Ensemble learning technique that improves on weak learners by training them sequentially, each new learner focusing on the errors of the previous ones, and combining their predictions.
  • The final model is a weighted combination of the weak learners
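
A hedged sklearn sketch using AdaBoost, a classic boosting algorithm (synthetic data; settings are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier

X, y = make_classification(n_samples=500, random_state=0)

# The default weak learner is a depth-1 tree ("stump"); learners are trained
# sequentially and combined with learned weights.
clf = AdaBoostClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)
print(clf.score(X, y))
```
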
26
Q

Bias

A

The “stubbornness” of an algorithm: error from overly simple assumptions, so the model keeps making the same systematic mistakes no matter what new data it is confronted with (under-fitting).

27
Q

Deep Factorization Machines

A
  • Combines the strengths of a DNN and a factorization machine (FM)
  • The DNN captures high-order features
  • The FM captures low-order pairwise interactions
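
A simplified PyTorch sketch of the combination; a real DeepFM shares sparse-feature embeddings between the FM and DNN parts, while this dense-input version only illustrates the FM + DNN sum:

```python
import torch
import torch.nn as nn

class DeepFM(nn.Module):
    def __init__(self, n_features, k=8):
        super().__init__()
        self.w0 = nn.Parameter(torch.zeros(1))                    # FM bias
        self.w = nn.Linear(n_features, 1, bias=False)             # FM linear term
        self.V = nn.Parameter(torch.randn(n_features, k) * 0.01)  # FM embeddings
        self.dnn = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, x):
        sum_sq = (x @ self.V) ** 2                # (batch, k)
        sq_sum = (x ** 2) @ (self.V ** 2)         # (batch, k)
        fm = self.w0 + self.w(x).squeeze(-1) + 0.5 * (sum_sq - sq_sum).sum(-1)
        deep = self.dnn(x).squeeze(-1)
        return torch.sigmoid(fm + deep)           # e.g. click probability

p = DeepFM(n_features=20)(torch.randn(4, 20))
```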