Machine Learning Textbook Flashcards

Question 1

Q

How would you define Machine Learning?

Answer

A

Machine Learning is about building systems that can learn from data. Learning means getting better at some task, given some performance measure.

Question 2

Q

Can you name four types of problems where Machine Learning shines?

Answer

A

Machine Learning is great for complex problems for which we have no algorithmic solution, to replace long lists of hand-tuned rules, to build systems that adapt to fluctuating environments, and finally to help humans learn (e.g., data mining).

Question 3

Q

What is a labeled training set?

Answer

A

A training set that contains the desired solution (a.k.a. a label) for each instance.
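
As a concrete illustration, a labeled training set can be stored as a list of (features, label) pairs. The minimal Python sketch below assumes a toy spam-filtering task; the two features (number of links, number of exclamation marks) and their values are made up for illustration.

```python
from typing import NamedTuple, Tuple


class LabeledInstance(NamedTuple):
    features: Tuple[float, ...]  # input features of one instance
    label: str                   # the desired solution for that instance


# Hypothetical spam-filtering data: (number of links, number of "!" marks)
training_set = [
    LabeledInstance((7.0, 5.0), "spam"),
    LabeledInstance((0.0, 1.0), "ham"),
    LabeledInstance((4.0, 9.0), "spam"),
    LabeledInstance((1.0, 0.0), "ham"),
]

# Dropping the labels leaves an unlabeled set, as used in unsupervised learning.
unlabeled_set = [instance.features for instance in training_set]
labels = [instance.label for instance in training_set]
print(labels)  # ['spam', 'ham', 'spam', 'ham']
```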

Question 4

Q

What are the two most common supervised learning tasks?

Answer

A

Regression and classification.

Question 5

Q

Can you name four common unsupervised learning tasks?

Answer

A

Clustering, visualization, dimensionality reduction, and association rule learning.

Question 6

Q

What type of Machine Learning algorithm would you use to allow a robot to walk in various unknown terrains?

Answer

A

Reinforcement learning

Question 7

Q

What type of algorithm would you use to segment your customers into multiple groups?

Answer

A

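If you don't know in advance how to define the groups, use a clustering algorithm (unsupervised learning) to segment your customers into clusters of similar customers.

If you already know which groups you want, feed many examples of each group to a classification algorithm (supervised learning), and it will classify all your customers into these groups.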
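
The flashcard does not name a specific clustering algorithm; k-means is one common choice. Below is a minimal pure-Python sketch, assuming two hypothetical features per customer (annual spend, visits per month) and a naive initialization that uses the first k customers as starting centroids. Library implementations use smarter initialization, and in practice the features should be scaled first, since here spend dominates the distance.

```python
import math


def kmeans(points, k, n_iters=100):
    """Minimal k-means: returns (centroids, assignments)."""
    # Naive initialization: the first k points become the starting centroids.
    centroids = [list(p) for p in points[:k]]
    assignments = None
    for _ in range(n_iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assignments


# Hypothetical customers: (annual spend, visits per month)
customers = [(100, 1), (120, 2), (110, 1), (900, 10), (950, 12), (880, 11)]
centroids, groups = kmeans(customers, k=2)
print(groups)  # [0, 0, 0, 1, 1, 1]
```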

Question 8

Q

Would you frame the problem of spam detection as a supervised learning problem or an unsupervised learning problem?

Answer

A

A supervised learning problem: the algorithm is fed many emails along with their labels (spam or not spam).

Question 9

Q

What is an online learning system?

Answer

A

An online learning system can learn incrementally, as opposed to a batch learning system.

This makes it capable of adapting rapidly to both changing data and autonomous systems, and of training on very large quantities of data.
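
To make incremental learning concrete, here is a minimal Python sketch of an online learner: a one-feature linear model y ≈ w·x + b that takes one stochastic gradient descent step per incoming instance, so each instance can be discarded once it has been learned. The `partial_fit` method name mirrors scikit-learn's convention, but this class is a hypothetical stand-alone sketch, and the learning rate is an assumed value.

```python
class OnlineLinearRegressor:
    """Hypothetical minimal online learner for y ≈ w * x + b."""

    def __init__(self, learning_rate=0.05):
        self.learning_rate = learning_rate  # how fast the model adapts to new data
        self.w = 0.0
        self.b = 0.0

    def predict(self, x):
        return self.w * x + self.b

    def partial_fit(self, x, y):
        """One SGD step on the squared error 0.5 * (prediction - y) ** 2."""
        error = self.predict(x) - y
        self.w -= self.learning_rate * error * x
        self.b -= self.learning_rate * error


# Simulated data stream following y = 2x + 1: each instance is seen, learned, dropped.
model = OnlineLinearRegressor()
for step in range(20_000):
    x = float(step % 4)
    model.partial_fit(x, 2 * x + 1)
print(model.w, model.b)  # close to 2.0 and 1.0
```

Because the model only keeps w and b, the same loop would work on an unbounded stream, and it keeps adjusting if the relationship in the incoming data drifts.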

Question 10

Q

What is out-of-core learning?

Answer

A

Out-of-core algorithms can handle vast quantities of data that cannot fit into a computer’s main memory. An out-of-core learning algorithm chops the data into mini-batches and uses online learning techniques to learn from these mini-batches.
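
The mini-batch idea can be sketched in plain Python: the reader below pulls a fixed number of lines at a time from a stream, so only one mini-batch is in memory, and each mini-batch triggers one gradient descent step on a one-feature linear model. An in-memory `io.StringIO` stands in for a file too large to load; the CSV layout (`x,y` per line), the `open_stream` callable and the hyperparameter values are assumptions for illustration.

```python
import io
from itertools import islice


def iter_minibatches(stream, batch_size):
    """Lazily read "x,y" lines and yield them batch_size at a time."""
    while True:
        lines = list(islice(stream, batch_size))
        if not lines:
            return
        yield [tuple(float(v) for v in line.split(",")) for line in lines]


def fit_out_of_core(open_stream, n_epochs=2000, batch_size=4, learning_rate=0.5):
    """Fit y ≈ w * x + b with one gradient step per mini-batch.

    open_stream is a callable returning a fresh line iterator (for example a
    file object), so the data can be re-read on every epoch.
    """
    w = b = 0.0
    for _ in range(n_epochs):
        for batch in iter_minibatches(open_stream(), batch_size):
            # Mean gradient of 0.5 * squared error over this mini-batch only
            grad_w = sum((w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum(w * x + b - y for x, y in batch) / len(batch)
            w -= learning_rate * grad_w
            b -= learning_rate * grad_b
    return w, b


# In practice open_stream might open a large CSV file; here the
# "file" is an in-memory string following y = 3x - 2.
data_text = "\n".join(f"{i / 10},{3 * i / 10 - 2}" for i in range(10))
w, b = fit_out_of_core(lambda: io.StringIO(data_text))
print(w, b)  # close to 3.0 and -2.0
```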

Question 11

Q

What type of learning algorithm relies on similarity measures to make predictions?

Answer

A

An instance-based learning system learns the training data by heart; then, when given a new instance, it uses a similarity measure to find the most similar learned instances and uses them to make predictions.
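
k-nearest neighbors is a classic instance-based method. The minimal Python sketch below stores the training instances as-is and, at prediction time, takes a majority vote among the k closest ones, using Euclidean distance as the similarity measure; the data and the choice k=3 are hypothetical.

```python
import math
from collections import Counter


def knn_predict(training_set, new_features, k=3):
    """Predict the majority label among the k stored instances that are
    closest (Euclidean distance) to new_features."""
    # "Learning" is just keeping training_set; all the work happens here.
    nearest = sorted(
        training_set, key=lambda inst: math.dist(inst[0], new_features)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical data: (number of links, number of "!" marks) -> label
training_set = [
    ((7.0, 5.0), "spam"),
    ((0.0, 1.0), "ham"),
    ((4.0, 9.0), "spam"),
    ((1.0, 0.0), "ham"),
    ((6.0, 7.0), "spam"),
]
print(knn_predict(training_set, (5.0, 6.0)))  # spam
print(knn_predict(training_set, (0.0, 0.0)))  # ham
```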

Question 12

Q

What is the difference between a model parameter and a learning algorithm hyperparameter?

Answer

A

A model has one or more model parameters that determine what it will predict given a new instance (e.g., the slope of a linear model).

A hyperparameter is a parameter of the learning algorithm itself, not of the model (e.g., the amount of regularization to apply).
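
A minimal Python sketch of the distinction, assuming a one-feature linear model through the origin (y ≈ w·x) trained with L2 regularization (ridge regression): the slope w is a model parameter computed from the data, while alpha, the regularization strength, is a hyperparameter fixed before training. Setting the derivative of sum((y - w·x)²) + alpha·w² to zero gives the closed form used below. The function name is hypothetical.

```python
def fit_ridge_slope(xs, ys, alpha):
    """Return the slope w minimizing sum((y - w*x)**2) + alpha * w**2.

    w is the learned model parameter; alpha is a hyperparameter of the
    learning algorithm, chosen before training and held fixed during it.
    """
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    return sum_xy / (sum_xx + alpha)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # exactly y = 2x
print(fit_ridge_slope(xs, ys, alpha=0.0))   # 2.0 (no regularization)
print(fit_ridge_slope(xs, ys, alpha=14.0))  # 1.0 (strong regularization)
```

Same data, different hyperparameter, different learned parameter: more regularization shrinks the slope toward 0.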

Question 13

Q

What do model-based learning algorithms search for? What is the most common strategy they use to succeed?

Answer

A

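Model-based learning algorithms search for an optimal value for the model parameters such that the model will generalize well to new instances. We usually train such systems by minimizing a cost function that measures how bad the system is at making predictions on the training data, plus a penalty for model complexity if the model is regularized.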
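
As a sketch of this strategy, the Python code below defines a cost function (mean squared error on the training data plus an L2 penalty alpha·w²) for a one-feature linear model and minimizes it with batch gradient descent. The data, learning rate and step count are assumptions chosen so this small example converges; gradient descent is one common optimizer, not the only one.

```python
def cost(w, b, xs, ys, alpha):
    """Mean squared error on the training data plus an L2 penalty on w.

    alpha = 0 means an unregularized model.
    """
    mse = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    return mse + alpha * w ** 2


def fit_gradient_descent(xs, ys, alpha=0.0, learning_rate=0.1, n_steps=1000):
    """Search for the parameters (w, b) that minimize cost()."""
    w = b = 0.0
    n = len(xs)
    for _ in range(n_steps):
        # Partial derivatives of cost() with respect to w and b
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 * sum(r * x for r, x in zip(residuals, xs)) / n + 2 * alpha * w
        grad_b = 2 * sum(residuals) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [1.0 + 2.0 * x for x in xs]  # y = 2x + 1
w, b = fit_gradient_descent(xs, ys)
w_reg, b_reg = fit_gradient_descent(xs, ys, alpha=1.0)
print(w, b)       # close to 2.0 and 1.0
print(w_reg < w)  # True: the penalty pulls the slope toward 0
```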

Question 14

Q

Can you name four of the main challenges in Machine Learning?

Answer

A

Some of the main challenges in Machine Learning are the lack of data, poor data quality, nonrepresentative data, uninformative features, excessively simple models that underfit the training data, and excessively complex models that overfit the data.

Question 15

Q

If your model performs great on the training data but generalizes poorly to new instances, what is happening? Can you name three possible solutions?

Answer

A

The model is likely overfitting the training data. Possible solutions are getting more data, simplifying the model (selecting a simpler algorithm, reducing the number of parameters or features used, or regularizing the model), or reducing the noise in the training data.

Question 16

Q

What is a test set and how should you use it?

Answer

A

A test set is used to estimate the generalization error that a model will make on new instances, before the model is launched in production.
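
A common way to build a test set is to shuffle the data once and hold out a fixed fraction of it (20% is a typical choice) before any training or tuning. The minimal Python sketch below does this with a seeded random generator so the split is reproducible; `split_train_test` is a hypothetical helper, not a library function.

```python
import random


def split_train_test(instances, test_ratio=0.2, seed=42):
    """Shuffle a copy of the data and hold out test_ratio of it as the test set."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # fixed seed: same split on every run
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


train_set, test_set = split_train_test(range(100))
print(len(train_set), len(test_set))  # 80 20
```

The test set is then left untouched until the end, when the final model is evaluated on it once.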

Question 17

Q

What is the purpose of a validation set?

Answer

A

A validation set is used to compare models. It makes it possible to select the best model and tune the hyperparameters.
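
The sketch below illustrates hyperparameter tuning with a validation set, assuming a one-feature ridge model through the origin as a toy example: one model is trained per candidate value of alpha on the training set, and the candidate with the lowest validation error wins. The data and candidate values are made up, and the test set is not used at all here.

```python
def fit_ridge_slope(xs, ys, alpha):
    """Slope w of y ≈ w * x minimizing sum((y - w*x)**2) + alpha * w**2."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + alpha)


def mse(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_alpha(train, valid, candidate_alphas):
    """Train one model per candidate alpha and keep the one with the
    lowest error on the validation set."""
    best_alpha, best_error = None, float("inf")
    for alpha in candidate_alphas:
        w = fit_ridge_slope(*train, alpha)  # trained on the training set only
        error = mse(w, *valid)              # compared on the validation set
        if error < best_error:
            best_alpha, best_error = alpha, error
    return best_alpha, best_error


train = ([1.0, 2.0, 3.0], [2.5, 3.5, 6.5])  # noisy training data
valid = ([4.0, 5.0], [8.0, 10.0])           # validation data
best_alpha, best_error = select_alpha(train, valid, [0.0, 1.0, 10.0])
print(best_alpha)  # 1.0
```

After selecting alpha, a common practice is to retrain the chosen model on the training and validation sets combined, then evaluate it once on the test set.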

Question 18

Q

What is the train-dev set, when do you need it, and how do you use it?

Answer

A

The train-dev set is used when there is a risk of mismatch between the training data and the data used in the validation and test datasets (which should always be as close as possible to the data used once the model is in production). The train-dev set is a part of the training set that’s held out (the model is not trained on it). The model is trained on the rest of the training set and evaluated on both the train-dev set and the validation set. If the model performs well on the training set but not on the train-dev set, then the model is likely overfitting the training set. If it performs well on both the training set and the train-dev set but not on the validation set, then there is probably a significant data mismatch between the training data and the validation + test data, and you should try to improve the training data to make it look more like the validation + test data.
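
The decision logic in this answer can be written as a small function. This is a hypothetical sketch: the error values would come from evaluating the model on each set, and the fixed `tolerance` used to decide whether a gap is "large" is an arbitrary assumption, not a standard threshold.

```python
def diagnose(train_error, train_dev_error, valid_error, tolerance=0.05):
    """Compare error rates on the training, train-dev and validation sets.

    tolerance is an arbitrary, hypothetical threshold for a "large" gap.
    """
    if train_dev_error - train_error > tolerance:
        return "overfitting the training set"
    if valid_error - train_dev_error > tolerance:
        return "data mismatch between training data and validation + test data"
    return "no large gap detected"


print(diagnose(0.02, 0.15, 0.16))  # overfitting the training set
print(diagnose(0.02, 0.03, 0.20))  # data mismatch between training data and validation + test data
```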

Question 19

Q

What can go wrong if you tune hyperparameters using the test set?

Answer

A

If you tune hyperparameters using the test set, you risk overfitting the test set, and the generalization error you measure will be optimistic (you may launch a model that performs worse than you expect).
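
A small simulation can show why the measured error becomes optimistic. In this hypothetical Python sketch the labels are pure noise, so no model can truly beat 50% accuracy; the 1,000 "models" are stand-ins that produce coin-flip predictions. Picking the one that scores best on a 50-instance test set yields an inflated test accuracy, while the same model scores about 50% on 10,000 new instances.

```python
import random


def accuracy(predictions, labels):
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def predict(model_id, dataset_id, n):
    """Hypothetical stand-in for a trained model: since the labels below are
    pure noise, any model's predictions amount to coin flips."""
    rng = random.Random(model_id * 1_000 + dataset_id)
    return [rng.randint(0, 1) for _ in range(n)]


label_rng = random.Random(12345)
test_labels = [label_rng.randint(0, 1) for _ in range(50)]
new_labels = [label_rng.randint(0, 1) for _ in range(10_000)]

# "Tuning" on the test set: keep whichever of 1,000 models scores best on it.
best_model = max(range(1_000), key=lambda m: accuracy(predict(m, 1, 50), test_labels))
test_accuracy = accuracy(predict(best_model, 1, 50), test_labels)
new_data_accuracy = accuracy(predict(best_model, 2, 10_000), new_labels)
print(f"test set: {test_accuracy:.2f}, new data: {new_data_accuracy:.2f}")
```

Keeping the test set out of all tuning decisions, and comparing models on a validation set instead, avoids this bias.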