ML Flashcards

1
Q

What is supervised learning?

A

In supervised learning, algorithms learn from training data that includes the desired solutions, called labels, and the goal is to predict the labels of new, unseen data based on these examples.
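
A minimal sketch of this "fit on labeled examples, predict on new data" loop, assuming scikit-learn and a toy dataset:

  # Train on features X with known labels y, then predict labels for unseen data.
  from sklearn.linear_model import LogisticRegression

  X_train = [[0.0], [1.0], [2.0], [3.0]]  # features
  y_train = [0, 0, 1, 1]                  # labels (the desired solutions)

  model = LogisticRegression()
  model.fit(X_train, y_train)             # learn from the labeled examples
  print(model.predict([[1.5]]))           # predict the label of a new data point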

2
Q

Two different voting schemes are common among voting classifiers. What are they?
Briefly explain how they work.

A
  • In hard voting (also known as majority voting), every individual classifier votes for a class, and the majority wins.
  • In soft voting, every individual classifier provides a probability that a specific data point belongs to each target class. The probabilities are weighted by the classifier's importance and summed up, and the target label with the greatest sum of weighted probabilities wins the vote.
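
A sketch of both schemes, assuming scikit-learn's VotingClassifier and synthetic data (the soft-voting weights are hypothetical):

  from sklearn.datasets import make_classification
  from sklearn.ensemble import RandomForestClassifier, VotingClassifier
  from sklearn.linear_model import LogisticRegression
  from sklearn.naive_bayes import GaussianNB

  X, y = make_classification(n_samples=200, random_state=42)
  estimators = [("lr", LogisticRegression()),
                ("rf", RandomForestClassifier(random_state=42)),
                ("nb", GaussianNB())]

  hard = VotingClassifier(estimators, voting="hard").fit(X, y)  # majority class vote
  soft = VotingClassifier(estimators, voting="soft",            # weighted sum of
                          weights=[1, 2, 1]).fit(X, y)          # predicted probabilities
  print(hard.predict(X[:3]), soft.predict(X[:3]))
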
3
Q

What issue arises when models are automatically trained on data collected during production?

Staleness
Data skews
Feedback loops

A

Feedback loops

4
Q

Do different gradient descent methods always converge to similar points?

True
False

A

False. Batch gradient descent settles at the minimum, while stochastic and mini-batch gradient descent keep bouncing around it; on a non-convex cost function, different methods (and different initializations) can also end up in different local minima.
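
A toy NumPy sketch (synthetic data, hypothetical learning rate) contrasting batch and stochastic updates on the same convex loss:

  import numpy as np

  rng = np.random.default_rng(0)
  X = rng.normal(size=(100, 1))
  y = 3.0 * X[:, 0] + rng.normal(scale=0.1, size=100)

  def grad(w, xb, yb):  # gradient of the MSE loss for the model y ≈ w * x
      return 2 * np.mean((w * xb - yb) * xb)

  w_batch = w_sgd = 0.0
  for step in range(300):
      w_batch -= 0.05 * grad(w_batch, X[:, 0], y)         # full batch: settles at the minimum
      i = rng.integers(100)
      w_sgd -= 0.05 * grad(w_sgd, X[i:i+1, 0], y[i:i+1])  # one sample: bounces around it
  print(w_batch, w_sgd)  # both close to 3.0, but generally not identical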

5
Q

What kind of problems does regularization solve? Give an example of a regularization technique.

A

Regularization combats overfitting: it is any modification we make to a learning algorithm that is intended to reduce its generalization error but not its training error. Ridge Regression, Lasso Regression, and Elastic Net are example techniques; they implement three different ways to constrain the weights.
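
A sketch of the three techniques, assuming scikit-learn and toy data (the alpha values are hypothetical):

  from sklearn.linear_model import ElasticNet, Lasso, Ridge

  X = [[0.0], [1.0], [2.0], [3.0]]
  y = [0.1, 1.1, 1.9, 3.2]

  for model in (Ridge(alpha=1.0),                     # L2 penalty: shrinks weights
                Lasso(alpha=0.1),                     # L1 penalty: can zero weights out
                ElasticNet(alpha=0.1, l1_ratio=0.5)): # mix of L1 and L2
      print(type(model).__name__, model.fit(X, y).coef_)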

6
Q

What is the difference between a validation set and a test set? What roles do they play?

A

  • Validation data: Used to select the model and hyperparameters; in other words, it provides a performance estimate during model construction and model selection. The validation data can, for example, be 10-20% of the training set, though this depends on the size and other characteristics of the dataset. When you have fine-tuned your model, you train a final model on the entire training set (training data + validation data) before predicting on the test data.
  • Test data: Used to evaluate the generalization performance of the selected model on unseen data. A common rule of thumb for splitting data is 80/20, where 80% of the data is used for training a model and 20% for testing it. In practice, the split varies a lot depending on the size and heterogeneity of the dataset. Note that it is important not to touch this data until you have fine-tuned your model, so that you get an unbiased evaluation. If you tune hyperparameters using the test set, you risk overfitting the test set, and the generalization error you measure will be optimistic (you may launch a model that performs worse than you expect).
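
A sketch of this workflow, assuming scikit-learn and synthetic data (the candidate C values are hypothetical):

  from sklearn.datasets import make_classification
  from sklearn.linear_model import LogisticRegression
  from sklearn.model_selection import train_test_split

  X, y = make_classification(n_samples=500, random_state=42)
  # Outer 80/20 split: the test set is not touched until the very end.
  X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
  # Carve a validation set (here 20% of the training data) out of the training set.
  X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, test_size=0.2, random_state=42)

  # Select the hyperparameter using validation performance only.
  best = max((LogisticRegression(C=c).fit(X_fit, y_fit) for c in (0.01, 0.1, 1.0)),
             key=lambda m: m.score(X_val, y_val))
  # Refit the chosen model on the full training set (training + validation data).
  final = LogisticRegression(C=best.C).fit(X_train, y_train)
  print(final.score(X_test, y_test))  # unbiased estimate: test data used once, at the end
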
7
Q

What typically happens after a while in operation to an offline-trained model dealing with new real, live data?

The model becomes stale
The model adapts to new patterns
The model abruptly forgets all previously learned information

A

The model becomes stale

9
Q

Which of the following is true of cross-validation?

Fits multiple models on different splits of the training data
Increases generalization ability and reduces computational complexity
Removes the need for training and test sets

A

Fits multiple models on different splits of the training data
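
For example, with scikit-learn and synthetic data:

  # cross_val_score fits one model per fold, each on a different training split.
  from sklearn.datasets import make_classification
  from sklearn.model_selection import cross_val_score
  from sklearn.tree import DecisionTreeClassifier

  X, y = make_classification(n_samples=300, random_state=42)
  scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=5)
  print(scores, scores.mean())  # five models, five held-out scores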

10
Q

How can you handle missing or corrupted data in a dataset? (Select all that apply)

Drop missing rows or columns
Replace missing values with mean/median
Replace missing values with the smallest/largest value

A

Drop missing rows or columns
Replace missing values with mean/median
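
A sketch of both accepted strategies, assuming pandas and a toy data frame:

  import numpy as np
  import pandas as pd

  df = pd.DataFrame({"age": [22, np.nan, 35, 41],
                     "income": [40, 55, np.nan, 70]})

  dropped = df.dropna()                              # drop rows with missing values
  imputed = df.fillna(df.median(numeric_only=True))  # replace with column medians
  print(dropped, imputed, sep="\n")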

11
Q

Suppose that you have a very accurate model for a social app that uses several features to predict whether a user is a spammer or not. You trained the model with a particular idea of what a spammer was, for example, a user who sends ten messages in one minute. Over time, the app grew and became more popular, but the outcome of the predictions has drastically changed: as people chat and message more, sending ten messages in a minute is now normal and not something that only spammers do. What kind of drift causes this spam-detection model's predictive ability to decay?

Data drift
Concept drift

A

Concept drift

12
Q

Which of the following are advantages to using decision trees over other models? (Select all that apply)

Trees are naturally resistant to overfitting
Trees often require less preprocessing of data
Trees are easy to interpret and visualize
Trees are robust to small changes in the data

A

Trees often require less preprocessing of data
Trees are easy to interpret and visualize
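
A sketch of the interpretability point, assuming scikit-learn and the iris dataset:

  from sklearn.datasets import load_iris
  from sklearn.tree import DecisionTreeClassifier, export_text

  X, y = load_iris(return_X_y=True)
  # Raw, unscaled features work fine: trees need little preprocessing.
  tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)
  print(export_text(tree))  # the fitted tree as human-readable if/else rules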

13
Q

Regarding bias and variance, which of the following statements are true? (Select all that apply)

Models which overfit have a high bias
Models which overfit have a low bias
Models which underfit have a high variance
Models which underfit have a low variance

A

Models which overfit have a low bias
Models which underfit have a low variance
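
One way to see this, sketched with scikit-learn polynomial fits on synthetic data (the degrees are chosen for illustration):

  import numpy as np
  from sklearn.linear_model import LinearRegression
  from sklearn.pipeline import make_pipeline
  from sklearn.preprocessing import PolynomialFeatures

  rng = np.random.default_rng(0)
  X = rng.uniform(-3, 3, size=(30, 1))
  y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=30)

  # Degree 1 underfits (high bias, low variance); degree 15 overfits
  # (low bias, high variance) by chasing the noise in the training data.
  for degree in (1, 15):
      model = make_pipeline(PolynomialFeatures(degree), LinearRegression()).fit(X, y)
      print(degree, model.score(X, y))  # training R²: low for 1, near-perfect for 15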

14
Q

What is stacking? Select the alternative that best characterizes stacking.

You use different versions of machine learning algorithms
You use several machine learning algorithms to boost your results
The predictions of one model become the inputs to another model
You stack your training set and test set together

A

The predictions of one model become the inputs to another model
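
A sketch, assuming scikit-learn's StackingClassifier and synthetic data:

  from sklearn.datasets import make_classification
  from sklearn.ensemble import RandomForestClassifier, StackingClassifier
  from sklearn.linear_model import LogisticRegression
  from sklearn.svm import SVC

  X, y = make_classification(n_samples=200, random_state=42)
  stack = StackingClassifier(
      estimators=[("rf", RandomForestClassifier(random_state=42)), ("svc", SVC())],
      final_estimator=LogisticRegression(),  # trained on the base models' predictions
  ).fit(X, y)
  print(stack.predict(X[:3]))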

15
Q

What is the main advantage of using feature selection?

Speeding up the training of an algorithm
Fine-tuning the model’s performance
Removing noisy features

A

Speeding up the training of an algorithm
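
A sketch, assuming scikit-learn's SelectKBest on synthetic data (k is hypothetical):

  from sklearn.datasets import make_classification
  from sklearn.feature_selection import SelectKBest, f_classif

  X, y = make_classification(n_samples=200, n_features=20, n_informative=5,
                             random_state=42)
  # Keep the 5 features that score highest on an ANOVA F-test against the target.
  X_small = SelectKBest(f_classif, k=5).fit_transform(X, y)
  print(X.shape, "->", X_small.shape)  # fewer features, faster downstream training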

16
Q

Having constructed a new training dataset after detecting model decay, you must start over by resetting and completely retraining your model. There is no way around it.

True
False

A

False. You can often continue training from the existing model instead, for example by warm-starting from its current parameters or by updating it incrementally with online learning on the new data.
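
For example, some scikit-learn models support incremental updates via partial_fit (synthetic data below):

  import numpy as np
  from sklearn.linear_model import SGDClassifier

  rng = np.random.default_rng(0)
  X_old, y_old = rng.normal(size=(100, 3)), rng.integers(0, 2, 100)
  X_new, y_new = rng.normal(size=(20, 3)), rng.integers(0, 2, 20)

  model = SGDClassifier()
  model.partial_fit(X_old, y_old, classes=np.array([0, 1]))  # initial training
  model.partial_fit(X_new, y_new)                            # update on the new data only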

17
Q

As a recently hired data scientist in a bank, you have been assigned the task to design a credit card fraud detection system using machine learning. How would you build, monitor, and maintain your system?

A

Cover the end-to-end machine learning project steps:
- Frame the problem and look at the big picture
- Get the data
- Explore the data to gain insights
- Prepare the data to better expose the underlying data patterns to machine learning algorithms
- Explore many different models and short-list the best ones
- Fine-tune your models and combine them into a great solution
- Present the solution (what assumptions were made, the system's limitations, documentation, etc.)
- Launch, monitor, and maintain your system