01-Introduction-Terminology Flashcards
What is (supervised) machine learning?
ML systems learn how to combine input to produce useful predictions on never-before-seen data.
Label
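A label is the thing being predicted: the y variable in simple linear regression. In a medical classifier, for example, the label is the diagnostic category.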
Feature
A feature is an input variable: the x variable in simple linear regression. A simple machine learning project might use a single feature, while a more sophisticated machine learning project could use millions of features, specified as: {x_1, x_2, ..., x_n}
In the spam detector example, the features could include the following:
- words in the email text
- sender’s address
- time of day the email was sent
- whether the email contains the phrase “one weird trick”
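As a minimal sketch of the spam-detector features listed above, the hypothetical function below turns one email into a feature dictionary. The name `extract_features`, the dictionary keys, and the sample email are assumptions made for illustration; they do not come from any library.

```python
from datetime import datetime


def extract_features(text, sender, sent_at):
    """Turn one email into a dict of features (hypothetical names)."""
    lowered = text.lower()
    return {
        "words": lowered.split(),                        # words in the email text
        "sender": sender,                                # sender's address
        "hour_sent": sent_at.hour,                       # time of day the email was sent
        "has_weird_trick": "one weird trick" in lowered, # phrase present?
    }


# Made-up email used only to exercise the function.
features = extract_features(
    "Try this one weird trick today",
    "promo@example.com",
    datetime(2024, 1, 1, 3, 30),
)
print(features["hour_sent"], features["has_weird_trick"])  # 3 True
```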
Example
An example is a particular instance of data, x. (x is a vector of feature values; it is conventionally written in boldface.) We break examples into two categories:
- labeled examples
- unlabeled examples
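The two categories can be sketched as plain Python dictionaries. This is only an illustration: the `features` and `label` keys and the helper `is_labeled` are hypothetical names, and the numbers are made up.

```python
# A labeled example pairs a feature vector x with its label y.
labeled_example = {"features": [3.0, 1.0], "label": "spam"}

# An unlabeled example has features only; the label is what we want to predict.
unlabeled_example = {"features": [14.0, 0.0], "label": None}


def is_labeled(example):
    """Return True when the example carries a known label."""
    return example["label"] is not None


print(is_labeled(labeled_example), is_labeled(unlabeled_example))  # True False
```

Labeled examples are what training consumes; unlabeled examples are what the trained model is later asked to predict.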
Model
A model defines the relationship between features and label.
For example, a spam detection model might associate certain features strongly with “spam”.
Model Lifecycle
- Training means creating or learning the model. That is, you show the model labeled examples and enable the model to gradually learn the relationships between features and label.
- Inference means applying the trained model to unlabeled examples. That is, you use the trained model to make useful predictions (y’). For example, during inference, you can predict medianHouseValue for new unlabeled examples.
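The two lifecycle stages above can be sketched with a one-feature linear model. This sketch assumes a closed-form least-squares fit as a stand-in for gradual learning, and the `rooms` and `median_house_value` numbers are made up for illustration.

```python
def train(xs, ys):
    """Training: learn weight w and bias b from labeled examples (least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    w = covariance / variance
    b = mean_y - w * mean_x
    return w, b


def infer(model, x):
    """Inference: apply the trained model to an unlabeled example."""
    w, b = model
    return b + w * x


# Made-up labeled examples: one feature (rooms) and a label (value in thousands).
rooms = [2.0, 3.0, 4.0, 5.0]
median_house_value = [150.0, 200.0, 250.0, 300.0]

model = train(rooms, median_house_value)
prediction = infer(model, 6.0)  # predict for a new, unlabeled example
print(prediction)  # 350.0
```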
Regression vs. classification
A regression model predicts continuous values. For example, regression models make predictions that answer questions like the following:
- What is the value of a house in California?
- What is the probability that a user will click on this ad?
A classification model predicts discrete values. For example, classification models make predictions that answer questions like the following:
- Is a given email message spam or not spam?
- Is this an image of a dog, a cat, or a hamster?
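The difference in output type can be illustrated with two toy functions. Both are hypothetical: the weights, the logistic squashing used to produce a probability, and the class scores are assumptions chosen only to show a continuous output next to a discrete one.

```python
import math


def regress_click_probability(x, w=0.8, b=-1.0):
    """Regression-style output: a continuous value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(b + w * x)))


def classify_pet(scores):
    """Classification output: one discrete label from a fixed set."""
    return max(scores, key=scores.get)


p = regress_click_probability(2.0)
label = classify_pet({"dog": 0.2, "cat": 0.7, "hamster": 0.1})
print(round(p, 3), label)  # 0.646 cat
```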
Terminology
- label
- feature
- example
- training
- model
- classification model
- inference
- regression model
Regression model: equation
$y' = b + \sum_{i=1}^{n} w_i x_i$
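The equation above translates directly into code. In this sketch, `predict` is a hypothetical helper and the weights, bias, and feature values are made up.

```python
def predict(features, weights, bias):
    """Compute y' = b + sum over i of w_i * x_i."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    return bias + sum(w * x for w, x in zip(weights, features))


y_pred = predict([1.0, 2.0, 3.0], [0.5, -1.0, 2.0], bias=4.0)
print(y_pred)  # 4.0 + 0.5 - 2.0 + 6.0 = 8.5
```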
Linear regression: terminology
- bias
- inference
- linear regression
- weight
- L2 loss
L2 Loss
The L2 loss function (squared loss) measures error as the sum of the squared differences between the true values y and the predicted values y’. Training seeks to minimize this error.
$L_2 = \sum (y - y')^2$
$\text{MeanSquaredError} = L_2 / n$, where n is the number of examples.
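Both quantities can be computed in a few lines. The function names and sample values below are illustrative assumptions, not taken from a particular library.

```python
def l2_loss(y_true, y_pred):
    """Sum of squared differences between true and predicted values."""
    return sum((y - yp) ** 2 for y, yp in zip(y_true, y_pred))


def mean_squared_error(y_true, y_pred):
    """L2 loss averaged over the number of examples."""
    return l2_loss(y_true, y_pred) / len(y_true)


y_true = [3.0, 5.0, 7.0]
y_pred = [2.0, 5.0, 9.0]

total = l2_loss(y_true, y_pred)           # 1 + 0 + 4 = 5.0
mse = mean_squared_error(y_true, y_pred)  # 5.0 / 3
print(total, mse)
```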
Empirical risk minimization
The process by which a machine learning algorithm builds a model by examining many examples and attempting to find a model that minimizes loss.
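As a rough illustration of the idea, the sketch below searches a small grid of candidate models and keeps the one with the lowest mean squared loss on the training examples. The brute-force grid search, the fixed bias of 0, and the data are assumptions made for clarity; practical algorithms use methods such as gradient descent instead.

```python
def empirical_risk(w, b, xs, ys):
    """Mean squared loss of the model y' = b + w * x over the training examples."""
    return sum((y - (b + w * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Made-up training examples that follow y = 2x exactly.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# Candidate models: w from 0.0 to 4.0 in steps of 0.1, bias fixed at 0.0.
candidates = [(w / 10, 0.0) for w in range(0, 41)]

best_w, best_b = min(candidates, key=lambda m: empirical_risk(m[0], m[1], xs, ys))
print(best_w, empirical_risk(best_w, best_b, xs, ys))  # 2.0 0.0
```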
Loss: terminology
- empirical risk minimization
- loss
- mean squared error
- squared loss
- training
Gradient descent
Gradient descent is a first-order iterative optimization algorithm for finding the minimum of a function. To find a local minimum of a function using gradient descent, one takes steps proportional to the negative of the gradient (or approximate gradient) of the function at the current point.
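A minimal sketch of the procedure, applied to f(x) = (x - 3)^2, whose gradient is 2(x - 3). The `learning_rate` and `steps` parameters and their default values are assumptions chosen so this toy problem converges.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x -= learning_rate * grad(x)  # step proportional to the negative gradient
    return x


# f(x) = (x - 3)^2 has its minimum at x = 3.
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(minimum)  # very close to 3.0
```

With a step size this small the distance to the minimum shrinks by a constant factor each iteration; a step size that is too large can overshoot and diverge.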
Initial value of w1
The first stage in gradient descent is to pick a starting value (a starting point) for w1. For convex problems such as linear regression with squared loss, the starting point doesn’t matter much; therefore, many algorithms simply set w1 to 0 or pick a random value.
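To illustrate, the sketch below fits a single weight w1 for the model y' = w1 * x with mean squared loss, starting from three different values. The data, learning rate, step count, and the name `fit_w1` are assumptions; the loss here is convex, which is why every start reaches the same answer.

```python
import random


def fit_w1(w1_start, xs, ys, learning_rate=0.05, steps=200):
    """Gradient descent on mean squared loss for the model y' = w1 * x."""
    w1 = w1_start
    n = len(xs)
    for _ in range(steps):
        # Derivative of mean((y - w1 * x)^2) with respect to w1.
        grad = sum(-2 * x * (y - w1 * x) for x, y in zip(xs, ys)) / n
        w1 -= learning_rate * grad
    return w1


# Made-up data that follows y = 2x exactly.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

random.seed(0)
starts = [0.0, random.uniform(-10, 10), 25.0]  # zero, random, and far away
results = [fit_w1(s, xs, ys) for s in starts]
print(results)  # each value is very close to 2.0
```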