CH1: The Machine Learning Landscape Flashcards

1
Q

What is machine learning?

A
  1. Machine Learning is the science (and art) of programming computers so they can learn from data.
  2. Machine Learning is the field of study that gives computers the ability to learn without being explicitly programmed.
  3. A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.
2
Q

What is a training set?

A

The examples that the system uses to learn are called the training set

3
Q

What is a training instance (or sample)?

A

Each training example is called a training instance (or sample).

4
Q

Why use ML (machine learning)?

A
  1. Problems for which existing solutions require a lot of hand-tuning or long lists of rules: one Machine Learning algorithm can often simplify code and perform better.
  2. Complex problems for which there is no good solution at all using a traditional approach: the best Machine Learning techniques can find a solution.
  3. Fluctuating environments: a Machine Learning system can adapt to new data.
  4. Getting insights about complex problems and large amounts of data.
5
Q

What are the 3 criteria used to classify the different types of ML systems?

A
  1. Whether or not they are trained with human supervision (supervised, unsupervised, semisupervised, and Reinforcement Learning)
  2. Whether or not they can learn incrementally on the fly (online versus batch learning)
  3. Whether they work by simply comparing new data points to known data points, or instead detect patterns in the training data and build a predictive model, much like scientists do (instance-based versus model-based learning)
6
Q

What is supervised learning?

A

In supervised learning, the training data you feed to the algorithm includes the desired solutions, called labels

7
Q

What is classification?

A

Classification is a typical supervised learning task. A spam filter is a good example: it is trained with many example emails along with their class (spam or ham), and it must learn how to classify new emails.

8
Q

What are the typical tasks for supervised learning?

A
  1. Classification
  2. Regression
9
Q

What is regression?

A

Regression is the task of predicting a target numeric value, such as the price of a car, given a set of features (mileage, age, brand, etc.) called predictors (Figure 1-6). To train the system, you need to give it many examples of cars, including both their predictors and their labels (i.e., their prices).

10
Q

What is the difference between an attribute and a feature?

A

In Machine Learning an attribute is a data type (e.g., “Mileage”), while a feature has several meanings depending on the context, but generally means an attribute plus its value (e.g., “Mileage = 15,000”).

11
Q

Can some regression algorithms be used for classification? Give an example.

A

Note that some regression algorithms can be used for classification as well, and vice versa. For example, Logistic Regression is commonly used for classification, as it can output a value that corresponds to the probability of belonging to a given class.
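As a rough illustration of how a regression-style output can drive classification, the sketch below passes a linear score through the logistic (sigmoid) function and thresholds the resulting probability. The weights, bias, and the 0.5 threshold are hypothetical values chosen for the example; in practice the parameters would be learned from labeled data.

```python
import math

def sigmoid(z):
    """Logistic function: maps a real-valued score to a value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Estimated probability that the instance belongs to the positive class."""
    score = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

def predict_class(features, weights, bias, threshold=0.5):
    """Turn the probability into a class label (1 = positive, 0 = negative)."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0

# Hypothetical, hand-picked parameters (not learned from any real dataset).
weights, bias = [2.0, -1.0], 0.5
print(predict_proba([1.0, 0.5], weights, bias))  # probability of the positive class
print(predict_class([1.0, 0.5], weights, bias))  # class label after thresholding
```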

12
Q

What are the 6 most important supervised learning algorithms?

A
  • k-Nearest Neighbors
  • Linear Regression
  • Logistic Regression
  • Support Vector Machines (SVMs)
  • Decision Trees and Random Forests
  • Neural networks
13
Q

What is unsupervised learning?

A

In unsupervised learning, as you might guess, the training data is unlabeled

14
Q

What are the most important unsupervised learning algorithms?

A
  • Clustering
    —K-Means
    —DBSCAN
    — Hierarchical Cluster Analysis (HCA)
  • Anomaly detection and novelty detection
    —One-class SVM
    — Isolation Forest
  • Visualization and dimensionality reduction
    — Principal Component Analysis (PCA)
    —Kernel PCA
    — Locally-Linear Embedding (LLE)
    — t-distributed Stochastic Neighbor Embedding (t-SNE)
  • Association rule learning
    —Apriori
    — Eclat
15
Q

What are hierarchical clustering algorithms?

A

If you use a hierarchical clustering algorithm, it may also subdivide each group into smaller
groups.

16
Q

What are the uses of visualization algorithms?

A

Visualization algorithms are also good examples of unsupervised learning algorithms: you feed them a lot of complex and unlabeled data, and they output a 2D or 3D representation of your data that can easily be plotted.

17
Q

What are the tasks of unsupervised learning?

A
  1. Dimensionality reduction
  2. Anomaly detection
  3. Novelty detection
  4. Association rule learning
18
Q

What is dimensionality reduction?

A

A related task is dimensionality reduction, in which the goal is to simplify the data without losing too much information.

19
Q

What is feature extraction?

A

One way to reduce dimensionality is to merge several correlated features into one. This is called feature extraction.

20
Q

Why should you reduce the dimension of your training data?

A

It is often a good idea to try to reduce the dimension of your training data using a dimensionality reduction algorithm before you feed it to another Machine Learning algorithm (such as a supervised learning algorithm). It will run much faster, the data will take up less disk and memory space, and in some cases it may also perform better.

21
Q

What is anomaly detection? Give an example.

A

Yet another important unsupervised task is anomaly detection: for example, detecting unusual credit card transactions to prevent fraud, catching manufacturing defects, or automatically removing outliers from a dataset before feeding it to another learning algorithm. The system is shown mostly normal instances during training, so it learns to recognize them, and when it sees a new instance it can tell whether it looks like a normal one or whether it is likely an anomaly.
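The idea of learning what "normal" looks like and then flagging unusual instances can be sketched with a very simple statistical rule. This is only an illustration, not one of the dedicated algorithms such as One-class SVM or Isolation Forest: the transaction amounts are made up, and the threshold of 3 standard deviations is an assumed convention.

```python
import statistics

def fit_normal_profile(values):
    """Learn what 'normal' looks like: the mean and standard deviation."""
    return statistics.mean(values), statistics.stdev(values)

def is_anomaly(value, mean, std, threshold=3.0):
    """Flag a value lying more than `threshold` standard deviations from the mean."""
    return abs(value - mean) > threshold * std

# Made-up transaction amounts, all of them ordinary.
training_amounts = [20.0, 25.0, 22.0, 30.0, 27.0, 24.0, 26.0, 23.0]
mean, std = fit_normal_profile(training_amounts)

print(is_anomaly(28.0, mean, std))   # close to the usual amounts
print(is_anomaly(500.0, mean, std))  # far from anything seen in training
```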

22
Q

What is the difference between novelty detection and anomaly detection?

A

The difference is that novelty detection algorithms expect to see only normal data during training, while anomaly detection algorithms are usually more tolerant: they can often perform well even with a small percentage of outliers in the training set.

23
Q

What is association rule learning?

A

Another common unsupervised task is association rule learning, in which the goal is to dig into large amounts of data and discover interesting relations between attributes.

24
Q

What is semisupervised learning?

A

Some algorithms can deal with partially labeled training data, usually a lot of unlabeled data and a little bit of labeled data.

25
Q

What is an example of a semisupervised learning algorithm?

A

Deep belief networks (DBNs)

26
Q

What are DBNs based on?

A

Most semisupervised learning algorithms are combinations of unsupervised and supervised algorithms. For example, deep belief networks (DBNs) are based on unsupervised components called restricted Boltzmann machines (RBMs) stacked on top of one another.

27
Q

How are the RBMs trained?

A

RBMs are trained sequentially in an unsupervised manner, and then the whole system is fine-tuned using supervised learning techniques.

28
Q

What is Reinforcement Learning?

A

The learning system, called an agent in this context, can observe the environment, select and perform actions, and get rewards in return (or penalties in the form of negative rewards, as in Figure 1-12). It must then learn by itself what is the best strategy, called a policy, to get the most reward over time.

29
Q

What does the policy define?

A

A policy defines what action the agent should choose when it is in a given situation.

30
Q

What are examples of Reinforcement Learning?

A

For example, many robots implement Reinforcement Learning algorithms to learn how to walk. DeepMind’s AlphaGo program is also a good example of Reinforcement Learning: it learned its winning policy by analyzing millions of games, and then playing many games against itself. Note that learning was turned off during the games against the champion; AlphaGo was just applying the policy it had learned.

31
Q

What is another criterion to classify the ML systems?

A

Another criterion used to classify Machine Learning systems is whether or not the system can learn incrementally from a stream of incoming data.

32
Q

What is batch learning?

A

In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data.

33
Q

What is offline learning?

A

Batch training generally takes a lot of time and computing resources, so it is typically done offline. First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called offline learning.

34
Q

What are the disadvantages of offline learning?

A

If you want a batch learning system to know about new data, you need to train a new version of the system from scratch on the full dataset, then stop the old system and replace it with the new one.

This solution is simple and often works fine, but training using the full set of data can take many hours.

Also, training on the full set of data requires a lot of computing resources (CPU, memory space, disk space, disk I/O, network I/O, etc.). If you have a lot of data and you automate your system to train from scratch every day, it will end up costing you a lot of money. If the amount of data is huge, it may even be impossible to use a batch learning algorithm.

Finally, if your system needs to be able to learn autonomously and it has limited resources, then carrying around large amounts of training data and taking up a lot of resources to train for hours every day is a showstopper.

35
Q

What is online learning?

A

In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or by small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly, as it arrives.
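A minimal sketch of incremental training, assuming a one-feature linear model updated by mini-batch gradient descent on the squared error. The simulated stream, the batch size of 5, and the learning rate of 0.1 are assumptions made for the illustration; the point is only that each step uses a small batch and that the batch can be thrown away afterwards.

```python
def sgd_step(theta0, theta1, batch, learning_rate):
    """One online learning step on a mini-batch of (x, y) pairs."""
    n = len(batch)
    # Gradient of the mean squared error with respect to each parameter.
    grad0 = sum(2 * (theta0 + theta1 * x - y) for x, y in batch) / n
    grad1 = sum(2 * (theta0 + theta1 * x - y) * x for x, y in batch) / n
    return theta0 - learning_rate * grad0, theta1 - learning_rate * grad1

def mini_batches(stream, batch_size):
    """Group an incoming stream of instances into small batches."""
    batch = []
    for instance in stream:
        batch.append(instance)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def data_stream(n_instances):
    """Simulated stream following y = 1 + 2x (noise-free, for illustration)."""
    for i in range(n_instances):
        x = (i % 10) / 10.0
        yield x, 1.0 + 2.0 * x

theta0, theta1 = 0.0, 0.0
for batch in mini_batches(data_stream(20000), batch_size=5):
    theta0, theta1 = sgd_step(theta0, theta1, batch, learning_rate=0.1)
    # The batch is no longer needed here; only the parameters are kept.

print(round(theta0, 2), round(theta1, 2))  # should approach the true values 1 and 2
```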

36
Q

What are the advantages of online learning?

A

Online learning is great for systems that receive data as a continuous flow (e.g., stock prices) and need to adapt to change rapidly or autonomously. It is also a good option if you have limited computing resources: once an online learning system has learned about new data instances, it does not need them anymore, so you can discard them (unless you want to be able to roll back to a previous state and “replay” the data). This can save a huge amount of space.

37
Q

What is out-of-core learning?

A

Online learning algorithms can also be used to train systems on huge datasets that cannot fit in one machine’s main memory (this is called out-of-core learning). The algorithm loads part of the data, runs a training step on that data, and repeats the process until it has run on all of the data.

38
Q

How can online learning be a confusing name?

A

Out-of-core learning is usually done offline (i.e., not on the live system), so online learning can be a confusing name. Think of it as incremental learning.

39
Q

What is the learning rate?

A

One important parameter of online learning systems is how fast they should adapt to changing data: this is called the learning rate.

40
Q

What does a high learning rate mean?

A

If you set a high learning rate, then your system will rapidly adapt to new data, but it will also tend to quickly forget the old data.

41
Q

What does a low learning rate mean?

A

If you set a low learning rate, the system will have more inertia; that is, it will learn more slowly, but it will also be less sensitive to noise in the new data or to sequences of nonrepresentative data points (outliers).
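The trade-off between a high and a low learning rate can be seen in a toy example, assuming a deliberately simplified "model" that is just a running estimate nudged toward each new value. The stream and the two learning rates (0.9 and 0.05) are made up for the illustration.

```python
def online_estimates(stream, learning_rate, initial=0.0):
    """Track a running estimate, nudging it toward each new value."""
    estimate = initial
    history = []
    for value in stream:
        estimate += learning_rate * (value - estimate)
        history.append(estimate)
    return history

# Made-up stream: stable values around 10, with a single outlier (100).
stream = [10.0] * 20 + [100.0] + [10.0] * 5

fast = online_estimates(stream, learning_rate=0.9, initial=10.0)
slow = online_estimates(stream, learning_rate=0.05, initial=10.0)

# Estimates right after the outlier (index 20).
print(fast[20])  # high learning rate: jumps most of the way to the outlier
print(slow[20])  # low learning rate: moves only slightly (more inertia)
```

With the high rate the estimate is pulled to 91 by one bad value, while the low rate moves it only to 14.5, which is the "less sensitive to outliers" behavior described above.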

42
Q

What is a big challenge with online learning?

A

A big challenge with online learning is that if bad data is fed to the system, the system’s performance will gradually decline. If we are talking about a live system, your clients will notice.

43
Q

What do you need to do to reduce the risk of a performance decline?

A

To reduce this risk, you need to monitor your system closely and promptly switch learning off (and possibly revert to a previously working state) if you detect a drop in performance. You may also want to monitor the input data and react to abnormal data (e.g., using an anomaly detection algorithm).

44
Q

How can ML systems be categorized by the way they generalize?

A

Most Machine Learning tasks are about making predictions. This means that given a number of training examples, the system needs to be able to generalize to examples it has never seen before. Having a good performance measure on the training data is good, but insufficient; the true goal is to perform well on new instances. There are two main approaches to generalization: instance-based learning and model-based learning.

45
Q

What is instance-based learning?

A

In instance-based learning, the system learns the examples by heart, then generalizes to new cases by comparing them to the learned examples (or a subset of them), using a similarity measure.
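One common instance-based method is k-Nearest Neighbors. The sketch below assumes Euclidean distance as the (inverse) similarity measure, k = 3, and a made-up two-feature training set; it stores the examples as they are and classifies a new case by majority vote among the closest stored examples.

```python
import math
from collections import Counter

def euclidean_distance(a, b):
    """Smaller distance means more similar instances."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(training_set, new_instance, k=3):
    """Classify by majority vote among the k most similar stored examples."""
    neighbors = sorted(
        training_set,
        key=lambda example: euclidean_distance(example[0], new_instance),
    )[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# "Learning by heart": the training set is simply stored as-is (made-up data).
training_set = [
    ((1.0, 1.0), "ham"),
    ((1.2, 0.8), "ham"),
    ((0.9, 1.1), "ham"),
    ((5.0, 5.0), "spam"),
    ((5.2, 4.9), "spam"),
    ((4.8, 5.1), "spam"),
]

print(knn_predict(training_set, (1.1, 1.0)))
print(knn_predict(training_set, (5.0, 4.8)))
```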

46
Q

What is model-based learning?

A

Another way to generalize from a set of examples is to build a model of these examples, then use that model to make predictions. This is called model-based learning.

47
Q

What is model selection?

A

Model selection is the step of choosing which type of model to use. For example, you selected a linear model of life satisfaction with just one attribute.

48
Q

How can you know which values will make your model perform best?

A

To answer this question, you need to specify a performance measure. You can either define a utility function (or fitness function) that measures how good your model is, or you can define a cost function that measures how bad it is.

49
Q

What kind of functions do people use for linear regression problems?

A

For linear regression problems, people typically use a cost function that measures the distance between the linear model’s predictions and the training examples; the objective is to minimize this distance.
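One widely used choice of cost function is the mean squared error. The card does not name a specific function, so treat this as an assumed example; the tiny dataset is made up and follows y = 1 + 2x exactly.

```python
def mean_squared_error(theta0, theta1, examples):
    """Average squared distance between the line's predictions and the targets."""
    errors = [(theta0 + theta1 * x - y) ** 2 for x, y in examples]
    return sum(errors) / len(errors)

# Made-up training examples generated by y = 1 + 2x.
examples = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

print(mean_squared_error(1.0, 2.0, examples))  # perfect fit, so the cost is 0.0
print(mean_squared_error(0.0, 2.0, examples))  # every prediction is off by 1, so 1.0
```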

50
Q

What does training the model mean?

A

This is where the Linear Regression algorithm comes in: you feed it your training examples and it finds the parameters that make the linear model fit best to your data. This is called training the model.
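For a linear model with a single feature, the best-fitting parameters can be computed directly with the ordinary least-squares formulas. The sketch below assumes that setting and uses made-up training examples; the function names are illustrative.

```python
def fit_linear_model(examples):
    """Return (theta0, theta1) minimizing the mean squared error."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    s_xx = sum((x - mean_x) ** 2 for x, _ in examples)
    theta1 = s_xy / s_xx               # slope
    theta0 = mean_y - theta1 * mean_x  # intercept (height of the line)
    return theta0, theta1

def predict(theta0, theta1, x):
    """Apply the trained model to a new case."""
    return theta0 + theta1 * x

# Made-up training examples, roughly following y = 1 + 2x.
examples = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]
theta0, theta1 = fit_linear_model(examples)

print(theta0, theta1)                # learned parameters
print(predict(theta0, theta1, 5.0))  # prediction for an unseen x
```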

51
Q

What is inference?

A

Inference is applying the trained model to make predictions on new cases.

52
Q

What does a typical ML project look like?

A

In summary:

  • You studied the data.
  • You selected a model.
  • You trained it on the training data (i.e., the learning algorithm searched for the model parameter values that minimize a cost function).
  • Finally, you applied the model to make predictions on new cases (this is called inference), hoping that this model will generalize well.
53
Q

What are the two main things that could go wrong with ML?

A

In short, since your main task is to select a learning algorithm and train it on some data, the two things that can go wrong are “bad algorithm” and “bad data.”

54
Q

What are examples of bad data?

A
  1. Insufficient quantity of training data
  2. Nonrepresentative training data
  3. Poor-quality data
  4. Irrelevant features
55
Q

What are examples of a bad algorithm?

A
  1. Overfitting the training data
  2. Underfitting the training data
56
Q

What is the main message of the paper “The Unreasonable Effectiveness of Data”?

A

The idea that data matters more than algorithms for complex problems

57
Q

What is crucial for a model to generalize well?

A

In order to generalize well, it is crucial that your training data be representative of the new cases you want to generalize to. This is true whether you use instance-based learning or model-based learning.

58
Q

What is sampling noise?

A

If the sample is too small, you will have sampling noise (i.e., nonrepresentative data as a result of chance).

59
Q

What is sampling bias?

A

Even very large samples can be nonrepresentative if the sampling method is flawed. This is called sampling bias.

60
Q

What is nonresponse bias?

A

Nonresponse bias occurs when survey participants are unwilling or unable to respond to a survey question or an entire survey

61
Q

What is meant by poor-quality data?

A

Poor-quality data is training data that is full of errors, outliers, and noise (e.g., due to poor-quality measurements).

62
Q

What is feature engineering?

A

Feature engineering is the process of coming up with a good set of features to train on. It is a critical part of the success of a Machine Learning project.

63
Q

What are the steps involved in feature engineering?

A
  • Feature selection: selecting the most useful features to train on among existing features.
  • Feature extraction: combining existing features to produce a more useful one (as we saw earlier, dimensionality reduction algorithms can help).
  • Creating new features by gathering new data.
64
Q

What is overfitting?

A

Overfitting means that the model performs well on the training data, but it does not generalize well.

65
Q

When does overfitting happen?

A

Overfitting happens when the model is too complex relative to the amount and noisiness of the training data.

66
Q

What are possible solutions to overfitting?

A
  • To simplify the model by selecting one with fewer parameters (e.g., a linear model rather than a high-degree polynomial model), by reducing the number of attributes in the training data, or by constraining the model
  • To gather more training data
  • To reduce the noise in the training data (e.g., fix data errors and remove outliers)
67
Q

What is regularization?

A

Constraining a model to make it simpler and reduce the risk of overfitting is called regularization.

68
Q

What does it mean if you have two degrees of freedom?

A

For example, the linear model we defined earlier has two parameters, θ0 and θ1. This gives the learning algorithm two degrees of freedom to adapt the model to the training data: it can tweak both the height (θ0) and the slope (θ1) of the line.

69
Q

What balance needs to be maintained concerning the model?

A

You want to find the right balance between fitting the training data perfectly and keeping the model simple enough to ensure that it will generalize well.

70
Q

What is a hyperparameter?

A

A hyperparameter is a parameter of a learning algorithm (not of the model). As such, it is not affected by the learning algorithm itself; it must be set prior to training and remains constant during training.

71
Q

What does a hyperparameter do?

A

The amount of regularization to apply during learning can be controlled by a hyperparameter.

72
Q

What happens when you set the regularization hyperparameter to a very large value?

A

If you set the regularization hyperparameter to a very large value, you will get an almost flat model (a slope close to zero); the learning algorithm will almost certainly not overfit the training data, but it will be less likely to find a good solution.
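The flattening effect can be reproduced with a small sketch, assuming a one-feature linear model and an L2 (ridge-style) penalty on the slope only. The hyperparameter name `alpha`, the penalty form, and the data are assumptions made for the illustration; the card does not specify which regularization is used.

```python
def fit_ridge_line(examples, alpha):
    """Fit y = theta0 + theta1 * x while penalizing large slopes.

    Minimizes sum((theta0 + theta1 * x - y) ** 2) + alpha * theta1 ** 2.
    """
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    s_xx = sum((x - mean_x) ** 2 for x, _ in examples)
    theta1 = s_xy / (s_xx + alpha)  # a larger alpha shrinks the slope toward zero
    theta0 = mean_y - theta1 * mean_x
    return theta0, theta1

# Made-up training examples, roughly following y = 1 + 2x.
examples = [(1.0, 3.1), (2.0, 4.9), (3.0, 7.2), (4.0, 8.8)]

for alpha in (0.0, 10.0, 1e6):
    theta0, theta1 = fit_ridge_line(examples, alpha)
    print(alpha, theta0, theta1)
```

With `alpha = 0` the result is the unregularized fit; with a very large `alpha` the slope is nearly zero and the line sits at the mean of the targets.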

73
Q

What does underfitting mean?

A

Underfitting occurs when your model is too simple to learn the underlying structure of the data.

74
Q

What are the main options to fix underfitting?

A
  • Selecting a more powerful model, with more parameters
  • Feeding better features to the learning algorithm (feature engineering)
  • Reducing the constraints on the model (e.g., reducing the regularization hyperparameter)
75
Q

How can you evaluate how well your model generalizes?

A

The only way to know how well a model will generalize to new cases is to actually try it out on new cases. This is done by splitting the data into a training set and a test set.
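A minimal sketch of such a split, assuming the data is shuffled first and 20% is held out for testing. The ratio and the fixed seed are assumptions chosen for the example, not values stated on the card.

```python
import random

def train_test_split(instances, test_ratio=0.2, seed=42):
    """Shuffle the data, then set aside a fraction of it as the test set."""
    shuffled = list(instances)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(100))  # stand-in for 100 instances
train_set, test_set = train_test_split(data)
print(len(train_set), len(test_set))
```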

76
Q

What is the generalization error (or out-of-sample error)?

A

The error rate on new cases is called the generalization error (or out-of-sample error), and by evaluating your model on the test set, you get an estimate of this error. This value tells you how well your model will perform on instances it has never seen before.

77
Q

What does a high generalization error and a low training error mean?

A

If the training error is low (i.e., your model makes few mistakes on the training set) but the generalization error is high, it means that your model is overfitting the training data.

78
Q

How can you compare two different models?

A

One option is to train both and compare how well they generalize using the test set.

79
Q

How do you choose the value of the regularization hyperparameter?

A

One option is to train 100 different models using 100 different values for this hyperparameter. Suppose you find the best hyperparameter value that produces a model with the lowest generalization error, say just 5% error.

80
Q

What is a validation set (also called a development set or dev set)?

A

In holdout validation, you simply hold out part of the training set to evaluate several candidate models and select the best one. The held-out part is called the validation set (or development set, or dev set). More specifically, you train multiple models with various hyperparameters on the reduced training set (i.e., the full training set minus the validation set), and you select the model that performs best on the validation set. After this holdout validation process, you train the best model on the full training set (including the validation set), and this gives you the final model. Lastly, you evaluate this final model on the test set to get an estimate of the generalization error.
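The holdout procedure can be sketched end to end. The model here is a hypothetical one-feature line with an L2 penalty on the slope, controlled by an assumed hyperparameter `alpha`; the noise-free data and the candidate values are made up for the illustration.

```python
def fit_ridge_line(examples, alpha):
    """Hypothetical model: a line whose slope is shrunk by `alpha`."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    s_xx = sum((x - mean_x) ** 2 for x, _ in examples)
    theta1 = s_xy / (s_xx + alpha)
    return mean_y - theta1 * mean_x, theta1

def mse(params, examples):
    """Mean squared error of the line (theta0, theta1) on the examples."""
    theta0, theta1 = params
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in examples) / len(examples)

def holdout_validation(reduced_train, validation, test, alphas):
    # 1. Train one candidate per hyperparameter value on the reduced training set
    #    and keep the value whose model does best on the validation set.
    best_alpha = min(
        alphas, key=lambda a: mse(fit_ridge_line(reduced_train, a), validation)
    )
    # 2. Retrain on the full training set (reduced training set + validation set).
    final_model = fit_ridge_line(reduced_train + validation, best_alpha)
    # 3. Estimate the generalization error on the untouched test set.
    return best_alpha, final_model, mse(final_model, test)

# Made-up, noise-free data following y = 1 + 2x.
data = [(float(x), 1.0 + 2.0 * x) for x in range(10)]
reduced_train, validation, test = data[:6], data[6:8], data[8:]

best_alpha, final_model, test_error = holdout_validation(
    reduced_train, validation, test, alphas=[0.0, 1.0, 10.0, 100.0]
)
print(best_alpha, final_model, test_error)
```

Because the data is exactly linear, the unregularized candidate wins here; with noisy data a nonzero `alpha` could be selected instead.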

81
Q

What happens when the validation set is too small or too large?

A

If the validation set is too small, then model evaluations will be imprecise: you may end up selecting a suboptimal model by mistake. Conversely, if the validation set is too large, then the remaining training set will be much smaller than the full training set.

82
Q

What is cross-validation?

A

One way to solve this problem is to perform repeated cross-validation, using many small validation sets. Each model is evaluated once per validation set, after it is trained on the rest of the data. By averaging out all the evaluations of a model, we get a much more accurate measure of its performance.
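A common way to do this is k-fold cross-validation, sketched below under that assumption. The "model" is a deliberately trivial one (predict the training mean) and the data is made up; `train_fn` and `error_fn` are hypothetical parameter names for plugging in any model.

```python
def k_fold_splits(instances, k):
    """Yield (training part, validation part) pairs, one per fold."""
    folds = [instances[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield training, validation

def cross_validate(instances, k, train_fn, error_fn):
    """Average a model's validation error over k small validation sets."""
    errors = []
    for training, validation in k_fold_splits(instances, k):
        model = train_fn(training)  # trained on the rest of the data
        errors.append(error_fn(model, validation))
    return sum(errors) / len(errors)

# Trivial model for illustration: always predict the training mean.
def train_mean(values):
    return sum(values) / len(values)

def mean_abs_error(model, values):
    return sum(abs(v - model) for v in values) / len(values)

data = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
score = cross_validate(data, k=3, train_fn=train_mean, error_fn=mean_abs_error)
print(score)
```

The model is trained k times, which is where the extra training cost of cross-validation comes from.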

83
Q

What is a drawback of cross-validation?

A

The training time is multiplied by the number of validation sets.

84
Q

What is data mismatch?

A

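Data mismatch arises when the training data is not perfectly representative of the data that will be used in production. In this case, the most important rule to remember is that the validation set and the test set must be as representative as possible of the data you expect to use in production.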

85
Q

What is the train-dev set?

A

One solution is to hold out part of the training pictures (from the web) in yet another set that Andrew Ng calls the train-dev set. After the model is trained (on the training set, not on the train-dev set), you can evaluate it on the train-dev set: if it performs well, then the model is not overfitting the training set, so if it performs poorly on the validation set, the problem must come from the data mismatch.

Conversely, if the model performs poorly on the train-dev set, then the model must have overfit the training set, so you should try to simplify or regularize the model, get more training data, and clean up the training data.

86
Q

What is the No Free Lunch theorem?

A

If you make absolutely no assumption about the data, then there is no reason to prefer one model over any other.

There is no model that is a priori guaranteed to work better (hence the name of the theorem). The only way to know for sure which model is best is to evaluate them all.

Since this is not possible, in practice you make some reasonable assumptions about the data and you evaluate only a few reasonable models.