Google ML Vocabulary Flashcards

Question 1

Q

Google ML Vocabulary

Agent

Answer

A

In reinforcement learning, the entity that uses a policy to determine which action will maximize the expected return gained from transitioning between states of the environment.

agent

Question 2

Q

Google ML Vocabulary

Action

Answer

A

In reinforcement learning, the mechanism by which the agent transitions between states of the environment. The agent chooses the action by using a policy.

action

Question 3

Q

Google ML Vocabulary

Bias (math) or bias term

Answer

A

An intercept or offset from an origin. Bias is a parameter in machine learning models, which is symbolised by either of the following:

b
w₀

For example, bias is the b in the following formula:

y' = b + w₁x₁ + w₂x₂ + ... + wₙxₙ

In a simple two-dimensional line, bias just means “y-intercept”.

Bias exists because not all models start from the origin (0,0). For example, suppose an amusement park costs 2 Euros to enter and an additional 0.5 Euro for every hour a customer stays. Therefore, a model mapping the total cost has a bias of 2 because the lowest cost is 2 Euros.
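
The amusement park example can be written as a one-feature linear model. This is only a sketch; the function name total_cost and its parameter names are hypothetical and chosen for illustration:

```python
def total_cost(hours, bias=2.0, weight=0.5):
    """Predict the park cost in Euros using y' = b + w1 * x1."""
    # bias is the entry fee; weight is the cost per hour stayed.
    return bias + weight * hours


# With zero hours, the prediction equals the bias (the y-intercept).
print(total_cost(0))
print(total_cost(3))
```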

Bias is not to be confused with bias in ethics and fairness or prediction bias.

bias (math) or bias term

Question 4

Q

Google ML Vocabulary

Class

Answer

A

A category that a label can belong to. For example:

In a binary classification model that detects spam, the two classes might be spam (positive) and not spam (negative).
In a multi-class classification model that identifies dog breeds, the classes might be poodle, beagle, pug and so on.

A classification model predicts a class. In contrast, a regression model predicts a number.

class

Question 5

Q

Google ML Vocabulary

Classification Model

Answer

A

A model whose prediction is a class. For example, the following are all classification models:

A model that predicts an input sentence’s language (French? Spanish? Italian?).
A model that predicts tree species (Maple? Oak? Ash?).
A model that predicts the positive or negative class for a particular medical condition.

classification model

Question 6

Q

Google ML Vocabulary

Classification Threshold

Answer

A

In binary classification, a number between 0 and 1 that converts the raw output of a logistic regression model into a prediction of either the positive class or the negative class. Note that the classification threshold is a value that a human chooses, not a value chosen by model training.

A logistic regression model outputs a raw value between 0 and 1. Then:

If this raw value is greater than the classification threshold, then the positive class is predicted.
If this raw value is less than the classification threshold, then the negative class is predicted.

For example, suppose the classification threshold is 0.8. If the raw value is 0.9, then the model predicts the positive class. If the raw value is 0.7, then the model predicts the negative class.
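
The thresholding rule can be sketched in a few lines. The function name classify is hypothetical. The card does not say what happens when the raw value exactly equals the threshold, so this sketch assumes ties go to the negative class:

```python
def classify(raw_value, threshold=0.8):
    """Map a raw logistic regression output (0 to 1) to a class name."""
    # Assumption: a raw value exactly equal to the threshold is treated
    # as negative, because the definition leaves that case unspecified.
    return "positive" if raw_value > threshold else "negative"


print(classify(0.9))  # raw value above the 0.8 threshold
print(classify(0.7))  # raw value below the 0.8 threshold
```

Raising the threshold makes positive predictions rarer, which tends to trade false positives for false negatives.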

The choice of classification threshold strongly influences the number of false positives and false negatives.

classification threshold

Question 7

Q

Google ML Vocabulary

Clustering

Answer

A

Grouping related examples, particularly during unsupervised learning. Once all the examples are grouped, a human can optionally supply meaning to each cluster.

clustering

Question 8

Q

Google ML Vocabulary

Convergence

Answer

A

A state reached when loss values change very little or not at all with each iteration.

A model converges when additional training won’t improve the model.

In deep learning, loss values sometimes stay constant or nearly so for many iterations before finally descending. During a long period of constant loss values, you may temporarily get a false sense of convergence.

See also early stopping

convergence

Question 9

Q

Google ML Vocabulary

Empirical risk minimization (ERM)

Answer

A

Choosing the function that minimizes loss on the training set.

Contrast with structural risk minimization.

empirical risk minimization (ERM)

Question 10

Q

Google ML Vocabulary

Example

Answer

A

The values of one row of features and possibly a label. Examples in supervised learning fall into two general categories:

A labeled example consists of one or more features and a label. Labeled examples are used during training.
An unlabeled example consists of one or more features but no label. Unlabeled examples are used during inference.

For instance, suppose you are training a model to determine the influence of weather conditions on student test scores. This first data set contains three examples, each with three features (Temperature, Humidity and Pressure) and one label (Test Score):

Temperature, Humidity, Pressure, Test Score
15, 47, 998, 92
19, 34, 1020, 84
18, 92, 1012, 87

Here are the same three examples in unlabeled form, that is, without the Test Score label:

Temperature, Humidity, Pressure
15, 47, 998
19, 34, 1020
18, 92, 1012
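
As a small sketch, the unlabeled rows can be derived from the labeled ones by dropping the label column. Representing each example as a dict, and the helper name strip_label, are assumptions made for illustration:

```python
labeled_examples = [
    {"Temperature": 15, "Humidity": 47, "Pressure": 998, "Test Score": 92},
    {"Temperature": 19, "Humidity": 34, "Pressure": 1020, "Test Score": 84},
    {"Temperature": 18, "Humidity": 92, "Pressure": 1012, "Test Score": 87},
]


def strip_label(example, label_name="Test Score"):
    """Return a copy of the example that keeps only its features."""
    return {name: value for name, value in example.items() if name != label_name}


unlabeled_examples = [strip_label(example) for example in labeled_examples]
print(unlabeled_examples[0])
```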

The row of a dataset is typically the raw source for an example. That is, an example typically consists of a subset of the columns in the dataset. Furthermore, the features in an example can also include synthetic features, such as feature crosses.

example

Question 11

Q

Google ML Vocabulary

False Negative

Also referred to as FN

Answer

A

When a binary classification model mistakenly predicts the negative class. For example, the model predicts that a particular email message is not spam (the negative class), but that email message actually is spam.

false negative (FN)

Question 12

Q

Google ML Vocabulary

False Negative Rate

Answer

A

The proportion of actual positive examples for which the model mistakenly predicted the negative class. The following formula calculates the false negative rate:

false negative rate = (false negatives / (false negatives + true positives))
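
A direct translation of the formula, as a sketch. The function name and the example counts are made up for illustration:

```python
def false_negative_rate(false_negatives, true_positives):
    """Fraction of actual positives that the model predicted as negative."""
    actual_positives = false_negatives + true_positives
    if actual_positives == 0:
        raise ValueError("the rate is undefined without actual positive examples")
    return false_negatives / actual_positives


# Hypothetical counts: 10 spam messages missed, 30 spam messages caught.
print(false_negative_rate(false_negatives=10, true_positives=30))
```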

false negative rate

Question 13

Q

Google ML Vocabulary

False Positive

Answer

A

An example in which the model mistakenly predicts the positive class. For example, the model predicts that a particular email message is spam (the positive class), but that email message is actually not spam.

false positive (FP)

Question 14

Q

Google ML Vocabulary

False Positive Rate

Answer

A

The proportion of actual negative examples for which the model mistakenly predicted the positive class. The following formula calculates the false positive rate:

false positive rate = (false positives / (false positives + true negatives))

The false positive rate is the x-axis in an ROC curve.
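
The sketch below counts false positives and true negatives from paired labels and predictions, then applies the formula. Encoding the positive class as True is an assumption of this example, as are the function name and the sample data:

```python
def false_positive_rate(labels, predictions):
    """labels and predictions are sequences of booleans (True = positive class)."""
    false_positives = sum(1 for y, p in zip(labels, predictions) if not y and p)
    true_negatives = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    actual_negatives = false_positives + true_negatives
    if actual_negatives == 0:
        raise ValueError("the rate is undefined without actual negative examples")
    return false_positives / actual_negatives


# Four actual negatives (one wrongly flagged) and two actual positives.
labels = [False, False, False, False, True, True]
predictions = [True, False, False, False, True, False]
fpr = false_positive_rate(labels, predictions)
print(fpr)
```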

false positive rate

Question 15

Q

Google ML Vocabulary

Feature

Answer

A

An input variable to a machine learning model. An example consists of one or more features. For instance, suppose you are training a model to determine the influence of weather conditions on student test scores. The following table shows three examples, each of which contains three features (Temperature, Humidity and Pressure) and one label (Test Score).

Temperature, Humidity, Pressure, Test Score
15, 47, 998, 92
19, 34, 1020, 84
18, 92, 1012, 87

Contrast with label

feature

Question 16

Q

Google ML Vocabulary

Feature Cross

Answer

A

A synthetic feature formed by “crossing” categorical or bucketed features.

For example, consider a “mood forecasting” model that represents temperature in one of the following four buckets: freezing, chilly, temperate, warm.

The model represents wind speed in one of the following three buckets: still, light, windy.

Without feature crosses, the linear model trains independently on each of the preceding seven buckets. So, the model trains on, for instance, freezing independently of its training on, for instance, windy.

Alternatively, you could create a feature cross of temperature and wind speed. This synthetic feature would have the following 12 possible values:

freezing-still
freezing-light
freezing-windy
chilly-still
chilly-light
chilly-windy
temperate-still
temperate-light
temperate-windy
warm-still
warm-light
warm-windy

Thanks to feature crosses, the model can learn mood differences between a freezing-windy day and a freezing-still day.

Formally, a feature cross is a cartesian product.
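
The Cartesian product of the two bucket lists can be generated with the standard library. This is a minimal sketch; the variable names and the cross helper are hypothetical:

```python
from itertools import product

temperature_buckets = ["freezing", "chilly", "temperate", "warm"]
wind_buckets = ["still", "light", "windy"]

# Every (temperature, wind) pair becomes one value of the crossed feature.
crossed_values = [f"{t}-{w}" for t, w in product(temperature_buckets, wind_buckets)]


def cross(temperature, wind):
    """Return the crossed feature value for a single example."""
    return f"{temperature}-{wind}"


print(len(crossed_values))
print(cross("freezing", "windy"))
```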

Feature crosses are mostly used with linear models and are rarely used with neural networks.

feature cross

Question 17

Q

Google ML Vocabulary

Gradient Descent

Answer

A

A mathematical technique to minimize loss. Gradient descent iteratively adjusts weights and biases, gradually finding the best combination to minimize loss.
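
As an illustration, the sketch below fits a one-feature linear model by gradient descent. The choice of mean squared error as the loss, the toy data (generated from y = 2 + 0.5x), the learning rate, and the iteration count are all assumptions made for this example:

```python
def gradient_descent(xs, ys, learning_rate=0.05, iterations=2000):
    """Fit y' = b + w * x by repeatedly stepping against the loss gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Prediction errors under the current weight and bias.
        errors = [(b + w * x) - y for x, y in zip(xs, ys)]
        # Gradients of mean squared error with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Adjust both parameters a little in the downhill direction.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0, 2.5, 3.0, 3.5, 4.0]
w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))
```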

Gradient descent is older - much, much older - than machine learning.

gradient descent

Question 18

Q

Google ML Vocabulary

Hyperparameter

Answer

A

The variables that you or a hyperparameter tuning service adjust during successive runs of training a model. For example, learning rate is a hyperparameter. You could set the learning rate to 0.01 before one training session. If you determine that 0.01 is too high, you could perhaps set the learning rate to 0.003 for the next training session.

In contrast, parameters are the various weights and biases that the model learns during training.

hyperparameter

Question 19

Q

Google ML Vocabulary

Inference

Answer

A

In machine learning, the process of making predictions by applying a trained model to unlabeled examples.

Inference has a somewhat different meaning in statistics.

inference

Wikipedia - Statistical inference

Question 20

Q

Google ML Vocabulary

Iteration

Answer

A

A single update of a model’s parameters - the model’s weights and biases - during training. The batch size determines how many examples the model processes in a single iteration. For instance, if the batch size is 20, then the model processes 20 examples before adjusting the parameters.

When training a neural network, a single iteration involves the following two passes:

A forward pass to evaluate loss on a single batch.
A backward pass (backpropagation) to adjust the model’s parameters based on the loss and the learning rate.
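
The relationship between batch size and iterations can be sketched as a loop over batches. The dataset of 100 placeholder examples and the batches helper are hypothetical; the forward and backward passes are only marked by a comment:

```python
def batches(examples, batch_size):
    """Yield consecutive slices of at most batch_size examples."""
    for start in range(0, len(examples), batch_size):
        yield examples[start:start + batch_size]


examples = list(range(100))  # stand-ins for 100 training examples
iteration_count = 0
for batch in batches(examples, batch_size=20):
    # A real training loop would run the forward pass and the backward
    # pass on this batch here, then update the parameters once.
    iteration_count += 1

print(iteration_count)
```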

iteration

Question 21

Q

Google ML Vocabulary

Negative Class

Answer

A

In binary classification, one class is termed positive and the other is termed negative. The positive class is the thing or event that the model is testing for and the negative class is the other possibility. For example:

The negative class in a medical test might be “not tumor”.
The negative class in an email classifier might be “not spam”.

Contrast with positive class.

negative class

Question 22

Q

Google ML Vocabulary

Positive Class

Answer

A

The class you are testing for.

For example, the positive class in a cancer model might be “tumor”. The positive class in an email classifier might be “spam”.

Contrast with negative class

positive class

Question 23

Q

Google ML Vocabulary

Label

Answer

A

In supervised machine learning, the “answer” or “result” portion of an example.

Each labeled example consists of one or more features and a label. For instance, in a spam detection dataset, the label would probably be either “spam” or “not spam”. In a rainfall dataset, the label might be the amount of rain that fell during a certain period.

label

Question 24

Q

Google ML Vocabulary

Labeled Example

Answer

A

An example that contains one or more features and a label. For example, the following table shows three labeled examples from a house valuation model, each with three features (Bedrooms, Bathrooms, Age) and one label (Price):

Bedrooms, Bathrooms, Age, Price
3, 2, 15, 345000
2, 1, 72, 179000
4, 2, 34, 392000

labeled example

Question 25

Q

Google ML Vocabulary

Learning Rate

Answer

A

A floating-point number that tells the gradient descent algorithm how strongly to adjust weights and biases on each iteration. For example, a learning rate of 0.3 would adjust weights and biases three times more powerfully than a learning rate of 0.1.

Learning rate is a key hyperparameter. If you set the learning rate too low, training will take too long. If you set the learning rate too high, gradient descent often has trouble reaching convergence.

During each iteration, the gradient descent algorithm multiplies the learning rate by the gradient. The resulting product is called the gradient step.
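
A sketch of the gradient step for a single weight follows. The function names and the gradient value of 4.0 are hypothetical. Subtracting the step, so that the weight moves against the gradient, is the usual gradient descent update rather than something this card states:

```python
def gradient_step(learning_rate, gradient):
    """The product of the learning rate and the gradient."""
    return learning_rate * gradient


def update_weight(weight, learning_rate, gradient):
    """Move the weight against the gradient by one gradient step."""
    return weight - gradient_step(learning_rate, gradient)


gradient = 4.0
small_step = gradient_step(0.1, gradient)
large_step = gradient_step(0.3, gradient)
# The larger learning rate produces a step about three times as large.
print(small_step, large_step)
print(update_weight(1.0, 0.1, gradient))
```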

learning rate

Question 26

Q

Google ML Vocabulary

Linear

Answer

A

A relationship between two or more variables that can be represented solely through addition and multiplication.

The plot of a linear relationship is a line.

Contrast with nonlinear.

linear

Question 27

Q

Google ML Vocabulary

Linear Model

Answer

A

A model that assigns one weight per feature to make predictions. (Linear models also incorporate a bias.) In contrast, the relationship of features to predictions in deep models is generally nonlinear.

Linear models are usually easier to train and more interpretable than deep models. However, deep models can learn complex relationships between features.

Linear regression and logistic regression are two types of linear models.

linear model

Question 28

Q

Google ML Vocabulary

Logistic Regression

Answer

A

A type of regression model that predicts a probability. Logistic regression models have the following characteristics:

The label is categorical. The term logistic regression usually refers to binary logistic regression, that is, to a model that calculates probabilities for labels with two possible values. A less common variant, multinomial logistic regression, calculates probabilities for labels with more than two possible values.
The loss function during training is Log Loss. (Multiple Log Loss units can be placed in parallel for labels with more than two possible values.)
The model has a linear architecture, not a deep neural network. However, the remainder of this definition also applies to deep models that predict probabilities for categorical labels.

For example, consider a logistic regression model that calculates the probability of an input email being either spam or not spam. During inference, suppose the model predicts 0.72. Therefore, the model is estimating:

A 72% chance of the email being spam.
A 28% chance of the email not being spam.

A logistic regression model uses the following two-step architecture:

The model generates a raw prediction (y’) by applying a linear function of input features.
The model uses that raw prediction as input to a sigmoid function, which converts the raw prediction to a value between 0 and 1, exclusive.
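
The two steps can be sketched as follows. This is an illustration, not a trained model; the function names, weights, and feature values are hypothetical:

```python
import math


def sigmoid(z):
    """Logistic sigmoid. This simple form can overflow for very negative z."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, bias):
    # Step 1: raw prediction from a linear function of the input features.
    raw_prediction = bias + sum(w * x for w, x in zip(weights, features))
    # Step 2: the sigmoid converts the raw prediction to a probability.
    return sigmoid(raw_prediction)


probability = predict_probability(features=[1.0, 2.0], weights=[0.5, -0.25], bias=0.1)
print(probability)
```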

Like any regression model, a logistic regression model predicts a number. However, this number typically becomes part of a binary classification model as follows:

If the predicted number is greater than the classification threshold, the binary classification model predicts the positive class.
If the predicted number is less than the classification threshold, the binary classification model predicts the negative class.

logistic regression

Question 29

Q

Google ML Vocabulary

Linear Regression

Answer

A

A type of machine learning model in which both of the following are true:

The model is a linear model
The prediction is a floating-point value. (This is the regression part of linear regression.)

Contrast linear regression with logistic regression. Also, contrast regression with classification.

linear regression

Question 30

Q

Google ML Vocabulary

Loss

Answer

A

During the training of a supervised model, a measure of how far a model’s prediction is from its label.

A loss function calculates the loss.

loss

Question 31

Q

Google ML Vocabulary

Structural risk minimization (SRM)

Answer

A

An algorithm that balances two goals:

The desire to build the most predictive model (for example, lowest loss).
The desire to keep the model as simple as possible (for example, strong regularization).

For example, a function that minimizes loss + regularization on the training set is a structural risk minimization algorithm.
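
The "loss + regularization" objective can be sketched as below. Mean squared error as the loss, an L2 penalty as the regularizer, and the regularization_rate parameter are assumptions chosen for illustration; the card does not prescribe them:

```python
def mean_squared_error(predictions, labels):
    """Average squared difference between predictions and labels."""
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)


def l2_penalty(weights):
    """Sum of squared weights; larger weights count as a more complex model."""
    return sum(w ** 2 for w in weights)


def structural_risk(predictions, labels, weights, regularization_rate=0.1):
    """Training loss plus a complexity penalty scaled by the regularization rate."""
    loss = mean_squared_error(predictions, labels)
    return loss + regularization_rate * l2_penalty(weights)


# With a rate of 0, only the training loss remains.
print(structural_risk([1.0, 2.0], [1.0, 4.0], [2.0], regularization_rate=0.0))
print(structural_risk([1.0, 2.0], [1.0, 4.0], [2.0], regularization_rate=0.5))
```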

Contrast with empirical risk minimization.

structural risk minimization (SRM)

Question 32

Q

Google ML Vocabulary

Model

Answer

A

In general, any mathematical construct that processes input data and returns output.

Phrased differently, a model is the set of parameters and structure needed for a system to make predictions.

model

Question 33

Q

Google ML Vocabulary

Synthetic Feature

Answer

A

A feature not present among the input features, but assembled from one or more of them. Methods for creating synthetic features include the following:

Bucketing a continuous feature into range bins.
Creating a feature cross.
Multiplying (or dividing) one feature value by other feature value(s) or by itself. For example, if a and b are input features, then ab and a² are examples of synthetic features.
Applying a transcendental function to a feature value. For example, if c is an input feature, then sin(c) and ln(c) are examples of synthetic features.

Features created by normalizing or scaling alone are not considered synthetic features.
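
A few of these methods are sketched below. The input values a, b, and c, the bucket boundaries, and the bucket helper are hypothetical:

```python
import math


def bucket(value, boundaries):
    """Return the index of the range bin that value falls into."""
    # Counts how many lower boundaries the value has reached.
    return sum(1 for edge in boundaries if value >= edge)


a, b, c = 3.0, 4.0, 2.0

synthetic = {
    "a_bucket": bucket(a, boundaries=[0.0, 2.5, 5.0]),  # bucketing
    "ab": a * b,                                        # product of two features
    "a_squared": a ** 2,                                # a feature times itself
    "sin_c": math.sin(c),                               # transcendental function
    "ln_c": math.log(c),                                # natural logarithm
}
print(synthetic)
```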

synthetic feature

Question 34

Q

Google ML Vocabulary

Nonlinear

Answer

A

A relationship between two or more variables that can’t be represented solely through addition and multiplication. A linear relationship can be represented as a line; a nonlinear relationship can’t be represented as a line.

nonlinear

Question 35

Q

Google ML Vocabulary

Parameter

Answer

A

The weights and biases that a model learns during training. For example, in a linear regression model, the parameters consist of the bias (b) and all the weights (w₁, w₂, …) in the following formula:

y' = b + w₁x₁ + w₂x₂ + ... + wₙxₙ

In contrast, hyperparameters are the values that you (or a hyperparameter tuning service) supply to the model. For example, a learning rate is a hyperparameter.
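
To make the role of the parameters concrete, here is a sketch of the linear regression formula as a function. The name predict and the numeric values are hypothetical; in practice, training would determine the weights and bias:

```python
def predict(features, weights, bias):
    """Compute y' = b + w1*x1 + w2*x2 + ... + wn*xn."""
    return bias + sum(w * x for w, x in zip(weights, features))


# The parameters here are the bias and the two weights.
print(predict(features=[1.0, 2.0], weights=[0.5, 0.25], bias=2.0))
```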

parameter

Question 36

Q

Google ML Vocabulary

Policy

Answer

A

In reinforcement learning, an agent’s probabilistic mapping from states to actions.
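
One simple way to picture a probabilistic mapping is a table of action probabilities per state. The states, actions, and probabilities below are invented for illustration, and a table is only one possible representation of a policy:

```python
import random

# Hypothetical policy: each state maps to a probability for each action.
policy = {
    "low_battery": {"recharge": 0.9, "explore": 0.1},
    "full_battery": {"recharge": 0.05, "explore": 0.95},
}


def choose_action(state, rng=random):
    """Sample an action for the given state according to the policy."""
    actions = list(policy[state].keys())
    probabilities = list(policy[state].values())
    return rng.choices(actions, weights=probabilities, k=1)[0]


print(choose_action("low_battery"))
```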

policy

Question 37

Q

Google ML Vocabulary

Prediction

Answer

A

A model’s output. For example:

The prediction of a binary classification model is either the positive class or the negative class.
The prediction of a multi-class classification model is one class.
The prediction of a linear regression model is a number.

prediction

Question 38

Q

Google ML Vocabulary

Reinforcement Learning

Also referred to as RL

Answer

A

A family of algorithms that learn an optimal policy, whose goal is to maximize return when interacting with an environment.

For example, the ultimate reward of most games is victory. Reinforcement learning systems can become expert at playing complex games by evaluating sequences of previous game moves that ultimately led to wins and sequences that ultimately led to losses.

reinforcement learning (RL)

Question 39

Q

Google ML Vocabulary

Regression Model

Answer

A

Informally, a model that generates a numerical prediction. Examples:

A model that predicts a certain house’s value, such as 423,000 Euros.
A model that predicts a certain tree’s life expectancy, such as 23.2 years.
A model that predicts the amount of rain that will fall in a certain city over the next six hours, such as 0.18 inches.

Two common types of regression models are:

Linear regression, which finds the line that best fits label values to features.
Logistic regression, which generates a probability between 0.0 and 1.0 that a system typically then maps to a class prediction.

Not every model that outputs numerical predictions is a regression model. For example, a model that predicts postal codes is a classification model, not a regression model.

regression model

Question 40

Q

Google ML Vocabulary

Return

Answer

A

In reinforcement learning, given a certain policy and a certain state, the return is the sum of all rewards that the agent expects to receive when following the policy from the state to the end of the episode.

The agent accounts for the delayed nature of expected rewards by discounting rewards according to the state transitions required to obtain the reward.
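
A common way to express this discounting, sketched here as an assumption rather than as the card's own formula, multiplies each reward by a discount factor raised to the number of state transitions needed to reach it. The function name, the discount values, and the reward sequences are hypothetical:

```python
def discounted_return(rewards, discount=0.9):
    """Sum the rewards, shrinking each one by how many steps away it is."""
    total = 0.0
    for steps_away, reward in enumerate(rewards):
        total += (discount ** steps_away) * reward
    return total


# Three rewards of 1.0, each later one worth half as much as the one before.
print(discounted_return([1.0, 1.0, 1.0], discount=0.5))
```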

return

Question 41

Q

Google ML Vocabulary

Reward

Answer

A

In reinforcement learning, the numerical result of taking an action in a state, as defined by the environment.

reward

Question 42

Q

Google ML Vocabulary

Sigmoid Function

Answer

A

A mathematical function that “squishes” an input value into a constrained range, typically 0 to 1 or -1 to +1. That is, you can pass any number (two, a million, negative billion, whatever) to a sigmoid and the output will still be in the constrained range.

Plots of sigmoid functions are “S” shaped.
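
Two widely used sigmoid functions illustrate the two ranges: the logistic function (0 to 1) and the hyperbolic tangent (-1 to +1). The function names are hypothetical, and the branch on the sign of x is an implementation choice to avoid floating-point overflow:

```python
import math


def logistic(x):
    """Squish any real number into the range 0 to 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is small, so this form cannot overflow.
    e = math.exp(x)
    return e / (1.0 + e)


def tanh_sigmoid(x):
    """Squish any real number into the range -1 to +1."""
    return math.tanh(x)


# Extreme inputs round to the ends of the range in floating point.
for value in (2.0, 1e6, -1e9):
    print(value, logistic(value), tanh_sigmoid(value))
```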

sigmoid function

Question 43

Q

Google ML Vocabulary

State

Answer

A

In reinforcement learning, the parameter values that describe the current configuration of the environment, which the agent uses to choose an action.

state

Question 44

Q

Google ML Vocabulary

Supervised Machine Learning

Answer

A

Training a model from features and their corresponding labels. Supervised machine learning is analogous to learning a subject by studying a set of questions and their corresponding correct answers. After mastering the mapping between questions and answers, a student can then provide answers to new (never-before-seen) questions on the same topic.

Compare with unsupervised machine learning

supervised machine learning

Question 45

Q

Google ML Vocabulary

Training

Answer

A

The process of determining the ideal parameters (weights and biases) comprising a model. During training, a system reads in examples and gradually adjusts parameters. Training uses each example anywhere from a few times to billions of times.

training

Question 46

Q

Google ML Vocabulary

Unlabeled Example

Answer

A

An example that contains features but no label. For example, the following table shows three unlabeled examples from a house valuation model, each with three features (Bedrooms, Bathrooms, and Age) but no house value:

Bedrooms, Bathrooms, Age
3, 2, 15
2, 1, 72
4, 2, 34

In supervised machine learning, models train on labeled examples and make predictions on unlabeled examples.

In semi-supervised and unsupervised learning, unlabeled examples are used during training.

Contrast unlabeled example with labeled example.

unlabeled example

Question 47

Q

Google ML Vocabulary

Weight

Answer

A

A value that a model multiplies by another value. Training is the process of determining a model’s ideal weights. Inference is the process of using those learned weights to make predictions.

weight

Question 48

Q

Google ML Vocabulary

Unsupervised Machine Learning

Answer

A

Training a model to find patterns in a dataset, typically an unlabeled dataset.

The most common use of unsupervised machine learning is to cluster data into groups of similar examples. For example, an unsupervised machine learning algorithm can cluster songs based on various properties of the music. The resulting clusters can become an input to other machine learning algorithms (for example, to a music recommendation service). Clustering can help when useful labels are scarce or absent. For example, in domains such as anti-abuse and fraud, clusters can help humans better understand the data.

Contrast with supervised machine learning.

unsupervised machine learning

Vocabulary Terms from the Google Machine Learning Glossary