ML-01 - Introduction and linear regression Flashcards
ML-01 - Introduction and linear regression
When did Arthur Samuel come up with his definition of machine learning?
The 1950s.
ML-01 - Introduction and linear regression
What was Arthur Samuel’s definition of machine learning?
“[…] the field of study that gives computers the ability to learn without being explicitly programmed.”
ML-01 - Introduction and linear regression
How did Tom Mitchell define machine learning?
A computer program is said to learn from experience 𝑬 with respect to some task 𝑻
and performance measure 𝑷 if its performance at 𝑻, as measured by 𝑷,
improves with experience 𝑬.
ML-01 - Introduction and linear regression
What are the 3 broad types of machine learning?
- Supervised learning
- Unsupervised learning
- Reinforcement learning
ML-01 - Introduction and linear regression
What are the two big types of supervised learning?
- Classification
- Regression
ML-01 - Introduction and linear regression
What are the two big types of unsupervised learning?
- Clustering
- Dimensionality reduction
ML-01 - Introduction and linear regression
Describe the difference between regression and classification.
Regression predicts continuous values, while classification predicts discrete categories.
ML-01 - Introduction and linear regression
What is Semi-supervised learning?
A type of ML approach where you have some labeled data, but lots of unlabeled data.
ML-01 - Introduction and linear regression
What is reinforcement learning?
Learning by interacting with the environment.
ML-01 - Introduction and linear regression
What are the 5 steps for a supervised learning workflow?
1) Get data
2) Clean, prepare, manipulate
3) Train the model
4) Test the model
5) Improve
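The five steps above can be sketched end to end. This is a minimal illustration with synthetic data and a least-squares model; all names and numbers are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# 1) Get data: synthetic house sizes (x, in m^2) and prices (y).
x = rng.uniform(30, 200, size=100)
y = 1000 * x + 5000 + rng.normal(0, 2000, size=100)

# 2) Clean / prepare: add a bias column so the model can learn an intercept.
X = np.column_stack([np.ones_like(x), x])

# Hold out the last 20% as test data.
train, test = slice(0, 80), slice(80, None)

# 3) Train: ordinary least squares on the training portion.
theta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)

# 4) Test: mean squared error on the held-out data.
mse = np.mean((X[test] @ theta - y[test]) ** 2)

# 5) Improve: tweak features / model and repeat until the test error is acceptable.
```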
ML-01 - Introduction and linear regression
What are the two most common optimization methods?
- Iterative methods, like gradient descent.
- Non-iterative methods, like the least squares method.
ML-01 - Introduction and linear regression
Describe gradient descent.
Gradient descent works by following the gradient of a function to reach a minimum.
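A minimal sketch of that idea, minimizing the toy function f(w) = (w − 3)², whose gradient is 2(w − 3) (function and hyperparameters chosen only for illustration):

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    """Repeatedly step against the gradient to walk downhill toward a minimum."""
    w = w0
    for _ in range(steps):
        w -= lr * grad(w)   # update rule: w := w - lr * gradient
    return w

# f(w) = (w - 3)^2 has gradient 2 * (w - 3); the minimum is at w = 3.
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
```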
ML-01 - Introduction and linear regression
What is the formula for gradient descent?
(See image)
ML-01 - Introduction and linear regression
What are the 3 typical variants of gradient descent?
- (Batch) gradient descent
- Mini-batch gradient descent
- Stochastic gradient descent
ML-01 - Introduction and linear regression
Describe (batch) gradient descent.
Use the entire training set in each iteration (called an epoch) of gradient descent.
ML-01 - Introduction and linear regression
Describe mini-batch gradient descent.
Instead of learning on all data at once, learning happens on subsets of the data. During each epoch, the training set is split into N / batch_size batches, drawn without replacement. The model is updated once per batch.
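A sketch of the mini-batch loop for linear regression with a mean-squared-error gradient; the data, learning rate, and batch size are arbitrary choices for the example. Setting `batch_size = 1` turns this into stochastic gradient descent.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic linear data: y = 2x + 1 plus noise.
X = rng.uniform(-1, 1, size=(200, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 0.1, size=200)

Xb = np.column_stack([np.ones(len(X)), X])   # bias column
theta = np.zeros(2)
lr, batch_size, epochs = 0.1, 20, 200

for _ in range(epochs):
    # Shuffle once per epoch, then walk through N / batch_size disjoint batches.
    order = rng.permutation(len(Xb))
    for start in range(0, len(Xb), batch_size):
        idx = order[start:start + batch_size]
        Xi, yi = Xb[idx], y[idx]
        grad = 2 / len(idx) * Xi.T @ (Xi @ theta - yi)  # MSE gradient on the batch
        theta -= lr * grad                              # one update per batch
```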
ML-01 - Introduction and linear regression
Describe stochastic gradient descent.
During each epoch, set the batch size to 1 and update the model on each individual training example.
ML-01 - Introduction and linear regression
How do you make sure you set the learning rate correctly?
Plot loss vs. the number of epochs and make sure the loss converges after some number of iterations.
ML-01 - Introduction and linear regression
What happens if the learning rate is too high?
Loss might not decrease on every iteration and the training won’t converge.
ML-01 - Introduction and linear regression
What happens if the learning rate is too low?
The learning takes a long time to converge.
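The three learning-rate regimes above can be demonstrated on the toy function f(w) = w². The specific rates are arbitrary picks for illustration: a sensible rate drives the loss near zero, a tiny rate barely moves, and a too-large rate overshoots and diverges.

```python
def loss_after(lr, steps=50, w0=1.0):
    """Run gradient descent on f(w) = w^2 and return the final loss."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w          # gradient of w^2 is 2w
    return w * w

good = loss_after(0.4)    # converges quickly
slow = loss_after(0.001)  # still converging after 50 steps
bad = loss_after(1.1)     # overshoots: the loss grows every step
```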
ML-01 - Introduction and linear regression
Describe the training workflow steps.
(See image)
ML-01 - Introduction and linear regression
What is feature scaling?
A transformation of some data to minimize the effects of different scales.
Learning rates are sensitive to unnormalized data.
E.g. house prices are a lot higher than the number of square meters, so the corresponding coefficients might be disproportionate.
ML-01 - Introduction and linear regression
Describe visually what happens in feature scaling.
(See image)
The unnormalized data makes the learning curve jump around and it doesn’t converge nicely.
ML-01 - Introduction and linear regression
What are the two most commonly used normalization methods?
- Min-max normalization
- Standardization (z-score)
ML-01 - Introduction and linear regression
What’s the formula for min-max normalization?
(See image)
ML-01 - Introduction and linear regression
What’s the range of data after applying min-max normalization?
Between 0 and 1.
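A minimal implementation of min-max normalization, (x − min) / (max − min), showing that the result lands in [0, 1] (example values are made up):

```python
import numpy as np

def min_max_normalize(x):
    """Rescale to [0, 1] via (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

scaled = min_max_normalize([50, 80, 120, 200])
```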
ML-01 - Introduction and linear regression
What’s the formula for standardization normalization?
z = (x − μ) / σ, where μ is the mean and σ is the standard deviation.
The result is not bounded and might contain outliers.
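The corresponding sketch for standardization; the result has mean 0 and standard deviation 1, but no fixed range (example values are made up):

```python
import numpy as np

def standardize(x):
    """z-score: (x - mean) / std."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

z = standardize([50, 80, 120, 200])
```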
ML-01 - Introduction and linear regression
What are some reasons to apply min-max normalization rather than standardization?
- Data doesn’t follow a normal distribution
- Data must be in a specific range
- Need to keep the original shape of the data, just scaled down
- Outliers are not a significant concern
ML-01 - Introduction and linear regression
What are some reasons to apply standardization rather than min-max normalization? (4)
- When the data follows a normal distribution
- When the data has (extreme) outliers
- When your algorithm assumes standardized data
- When the range of your data is unknown or changing over time
ML-01 - Introduction and linear regression
What is polynomial regression?
Using linear regression (i.e. a model that is linear in its weights) where you feature engineer the data through polynomial transformations (e.g. x, x², x³).
(See image)
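A sketch of the idea: expand x into polynomial features, then fit an ordinary linear model in those features. The quadratic ground truth and noise level are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(2)

# Quadratic ground truth: y = 0.5 x^2 - x + 2 plus a little noise.
x = rng.uniform(-3, 3, size=100)
y = 0.5 * x**2 - x + 2 + rng.normal(0, 0.05, size=100)

# Feature engineering: expand x into [1, x, x^2], then fit a *linear* model
# in those features -- that is all polynomial regression is.
X = np.column_stack([np.ones_like(x), x, x**2])
theta, *_ = np.linalg.lstsq(X, y, rcond=None)
```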
ML-01 - Introduction and linear regression
What’s the image an example of? (See image)
Polynomial regression.
ML-01 - Introduction and linear regression
What is the “normal equation”?
A non-iterative optimization method for solving the parameters for linear regression directly.
ML-01 - Introduction and linear regression
What is the formula for the “normal equation”?
(See image)
ML-01 - Introduction and linear regression
What’s the solution to the “normal equation”?
(See image)
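A sketch of that solution, θ = (XᵀX)⁻¹Xᵀy, on synthetic linear data (the true parameters 4.0 and 2.5 are arbitrary for the example):

```python
import numpy as np

rng = np.random.default_rng(3)

# Design matrix with a bias column; y = 4.0 + 2.5 x plus noise.
X = np.column_stack([np.ones(50), rng.uniform(0, 10, size=50)])
y = X @ np.array([4.0, 2.5]) + rng.normal(0, 0.1, size=50)

# Normal equation: solve for theta directly, no iteration.
theta = np.linalg.inv(X.T @ X) @ X.T @ y
# In practice np.linalg.solve(X.T @ X, X.T @ y) is preferred over an explicit inverse.
```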
ML-01 - Introduction and linear regression
What are the advantages of gradient descent over the normal equation?
- It works well when the number of features is large.
ML-01 - Introduction and linear regression
What are the advantages of the normal equation over gradient descent?
- No need to choose learning rate.
- Doesn’t iterate; solves directly via the inverse (X^T X)^-1.
ML-01 - Introduction and linear regression
What are the disadvantages of gradient descent over the normal equation?
- You need to choose the learning rate.
- GD might run for many iterations.
ML-01 - Introduction and linear regression
What are the disadvantages of the normal equation over gradient descent?
- Can be slow if the number of features is high; the time complexity of calculating matrix inverses is O(n^3).