Week 1 - Welcome to Machine Learning Flashcards

1
Q

What is machine learning?

A

The science of getting computers to learn without being explicitly programmed.

We often don’t know how to program a task explicitly from the start, so instead we have programs learn through experience, much the same way we do.

Examples: spam filters, web search ranking, Netflix recommendations, Amazon recommendations.

Tom Mitchell provides a more modern definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.”

Example: playing checkers.

E = the experience of playing many games of checkers

T = the task of playing checkers.

P = the probability that the program will win the next game.

2
Q

What is supervised learning?

A

The “right answers” (labels) are given for each training example, and the algorithm learns a mapping from inputs to those outputs.

3
Q

What is a regression problem?

A

When you are trying to predict a continuous-valued output (e.g., the price of a house).

4
Q

What is a classification problem?

A

When you are trying to predict a discrete-valued output (e.g., whether a tumor is malignant or benign).

5
Q

What is unsupervised learning?

A

Ask an algorithm to find structure in unlabeled data on its own. There is no feedback based on prediction results (e.g., clustering).

6
Q

What is a cost function?

What is a loss function?

A

The technique of evaluating the performance of our algorithm/model.

It takes both the model’s predicted outputs and the actual outputs and calculates how wrong the model was in its predictions. It outputs a higher number if our predictions differ a lot from the actual values. As we tune our model to improve the predictions, the cost function acts as an indicator of how the model has improved. This is essentially an optimization problem: the optimization strategies aim at minimizing the cost function.

Loss function: Used when we refer to the error for a single training example.
Cost function: Used to refer to an average of the loss functions over an entire training dataset.

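A minimal sketch of this distinction in Python (the predictions and labels below are made up for illustration): the loss is computed per example, the cost averages those losses over the whole training set.

```python
# Minimal sketch: squared-error loss for one example vs. cost over a dataset.
# The data values here are made up for illustration.

def squared_error_loss(y_pred, y_true):
    """Loss: error for a single training example."""
    return (y_pred - y_true) ** 2

def cost(predictions, targets):
    """Cost: average of the per-example losses over the training set."""
    losses = [squared_error_loss(p, t) for p, t in zip(predictions, targets)]
    return sum(losses) / len(losses)

predictions = [2.5, 0.0, 2.1]   # model outputs (hypothetical)
targets     = [3.0, -0.5, 2.0]  # actual "right answers"
print(cost(predictions, targets))  # larger value => predictions are further off
```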
7
Q

What is the cost function for linear regression?

A

The squared-error cost, essentially the mean squared error, which is how you assess model fit:

J(theta_0, theta_1) = (sum of squared errors) / (2m), where the error for training example i is h_theta(x^(i)) - y^(i)

Dividing by 2m instead of m gives half the mean squared error; this has the same minimizer but makes the derivative cleaner.

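A minimal sketch of this cost for a single-feature linear hypothesis h_theta(x) = theta0 + theta1 * x (the data points are made up):

```python
# Squared-error cost J(theta0, theta1) for single-variable linear regression.
# Uses the 1/(2m) scaling described above; the data below is made up.

def cost(theta0, theta1, xs, ys):
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        h = theta0 + theta1 * x      # hypothesis / prediction
        total += (h - y) ** 2        # squared error for this example
    return total / (2 * m)

xs = [1.0, 2.0, 3.0]
ys = [1.0, 2.5, 3.5]
print(cost(0.0, 1.0, xs, ys))  # cost of the line y = x on this toy data
```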
8
Q

What is gradient descent?

A

An iterative first-order optimization algorithm used to find a local minimum (or maximum) of a given function. It repeats its update step until convergence.

All parameters are updated simultaneously on each iteration:

theta_j := theta_j - alpha * (partial derivative of the cost function J with respect to theta_j)

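A minimal sketch of this update for single-variable linear regression (the toy data and alpha are made up for illustration), showing the simultaneous update of theta0 and theta1:

```python
# Batch gradient descent for h_theta(x) = theta0 + theta1 * x,
# minimizing the squared-error cost J. Toy data and alpha chosen for illustration.

def gradient_descent(xs, ys, alpha=0.1, iterations=1000):
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        errors = [(theta0 + theta1 * x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                              # dJ/dtheta0
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m   # dJ/dtheta1
        # Simultaneous update: both gradients are computed from the old thetas
        # before either parameter is changed.
        theta0 = theta0 - alpha * grad0
        theta1 = theta1 - alpha * grad1
    return theta0, theta1

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]        # roughly y = 2x
print(gradient_descent(xs, ys))  # theta1 should end up near 2
```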
9
Q

What is the difference between

a := b

a = b

?

A

a := b uses the assignment operator: take the value of b and store it in a (replace a’s value with b’s).

a = b is a truth assertion: a claim that a and b are equal.

10
Q

What is a derivative in calculus?

How does a derivative play into the gradient descent algorithm?

A

The derivative of a function at a point is the slope of the line tangent to the curve at that point.

Gradient descent uses the derivative (slope) of the cost function to decide how to change theta (the coefficients) so that the cost decreases. When the derivative is zero, the update theta := theta - alpha * 0 leaves theta unchanged, so gradient descent has converged at a local minimum, which is what you want.

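For a concrete (made-up) one-parameter example: if J(theta) = theta^2, the derivative is 2 * theta. Starting at theta = 3 with alpha = 0.1, one step gives theta := 3 - 0.1 * (2 * 3) = 2.4, moving theta toward the minimum at theta = 0, where the derivative is zero and the updates stop changing theta.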
11
Q

What is the concern about setting the proper alpha (learning rate) value when running gradient descent?

A

If alpha is too small, gradient descent takes tiny steps and can take a long time to converge.

If alpha is too large, gradient descent can overshoot the minimum on each step, so the algorithm may fail to converge or even diverge as it keeps leaping past the local minimum.

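A minimal sketch of both failure modes, using the made-up cost J(theta) = theta^2 (its minimum is at theta = 0):

```python
# Effect of the learning rate alpha on gradient descent for J(theta) = theta**2.
# The derivative is dJ/dtheta = 2 * theta; the minimum is at theta = 0.

def run(alpha, steps=20, theta=1.0):
    for _ in range(steps):
        theta = theta - alpha * (2 * theta)
    return theta

print(run(alpha=0.001))  # too small: after 20 steps theta is still far from 0
print(run(alpha=0.4))    # reasonable: theta ends up very close to 0
print(run(alpha=1.1))    # too large: theta keeps leaping past 0 and diverges
```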
12
Q

What does the term “batch” refer to when using gradient descent?

A

Each step of gradient descent uses the entire training set, the whole “batch”

13
Q

Do you have to constantly make alpha smaller in order to converge with gradient descent?

A

No. As the algorithm converges toward a local minimum, the derivative becomes smaller, so the update term alpha * (derivative), and therefore the step size, automatically shrinks as you approach convergence, even if alpha is fixed.

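A minimal sketch of this on the same made-up cost J(theta) = theta^2: the printed step sizes shrink even though alpha never changes.

```python
# With a fixed alpha, the step size alpha * dJ/dtheta shrinks on its own,
# because the derivative shrinks as theta approaches the minimum of J(theta) = theta**2.

alpha, theta = 0.1, 1.0
for i in range(5):
    step = alpha * (2 * theta)   # alpha times the derivative at the current theta
    theta = theta - step
    print(f"step {i}: moved by {step:.4f}, theta is now {theta:.4f}")
# The printed step sizes decrease each iteration even though alpha is constant.
```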