Machine Learning Flashcards
List ML algorithm categories.
- Supervised learning
- Unsupervised learning
- Reinforcement learning
- Recommender systems
Examples of supervised learning
- Regression: Predicting continuous value output
- Classification (Logistic regression): Predicting discrete value output
Examples of unsupervised learning
- Clustering: Google News story grouping, computer cluster analysis, market segmentation, social network analysis
- Non-clustering: cocktail party problem (separating overlapping audio sources)
Hypothesis (model) and cost function for linear regression with a single variable
hθ(x) = θ0 + θ1*x
J(θ) = 1/(2m) * Σ_{i=1~m} (hθ(x^(i)) - y^(i))^2
- m: number of training examples.
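As a quick sketch, the same hypothesis and cost can be written in NumPy (the function names `hypothesis` and `cost` are illustrative, not from the card):

```python
import numpy as np

def hypothesis(x, theta0, theta1):
    # h_theta(x) = theta0 + theta1 * x
    return theta0 + theta1 * x

def cost(x, y, theta0, theta1):
    # J(theta) = 1/(2m) * sum_i (h_theta(x^(i)) - y^(i))^2
    m = len(y)
    errors = hypothesis(x, theta0, theta1) - y
    return np.sum(errors ** 2) / (2 * m)
```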
How to find the parameter set for a linear regression problem?
Find a parameter set that minimizes the cost function, i.e.,
min_θ J(θ)
One way of solving this optimization problem is the gradient descent algorithm.
Describe the gradient descent algorithm.
repeat until convergence {
  for all j (simultaneously) {
    θj := θj - α * (∂/∂θj) J(θ)
  }
}
- α: learning rate
- Note that all θj are updated simultaneously.
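A minimal sketch of this update rule, assuming a caller-supplied `grad(theta)` that returns the gradient vector of J (the function name and the `alpha`/`n_iters` defaults are illustrative):

```python
import numpy as np

def gradient_descent(grad, theta, alpha=0.01, n_iters=1000):
    # theta_j := theta_j - alpha * dJ/dtheta_j for all j at once
    for _ in range(n_iters):
        theta = theta - alpha * grad(theta)  # simultaneous update of all theta_j
    return theta
```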
Discuss the learning rate of the gradient descent algorithm.
- α too small –> convergence is too slow
- α too big –> may fail to converge, or even diverge
Gradient descent algorithm for a linear regression with a single variable.
repeat until convergence {
  θ0 := θ0 - α * (1/m) * Σ_{i=1~m} (hθ(x^(i)) - y^(i))
  θ1 := θ1 - α * (1/m) * Σ_{i=1~m} (hθ(x^(i)) - y^(i)) * x^(i)
  (update θ0 and θ1 simultaneously)
}
* Note: This is batch gradient descent.
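A hedged NumPy sketch of this batch update (initial values and the `alpha`/`n_iters` defaults are arbitrary choices, not from the card):

```python
import numpy as np

def batch_gd_single_variable(x, y, alpha=0.01, n_iters=1000):
    # Batch gradient descent for h(x) = theta0 + theta1 * x;
    # each step uses all m training examples.
    m = len(y)
    theta0, theta1 = 0.0, 0.0
    for _ in range(n_iters):
        errors = (theta0 + theta1 * x) - y
        # compute both updates before assigning -> simultaneous update
        new_theta0 = theta0 - alpha * np.sum(errors) / m
        new_theta1 = theta1 - alpha * np.sum(errors * x) / m
        theta0, theta1 = new_theta0, new_theta1
    return theta0, theta1
```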
What is “batch” gradient descent?
Each step of the gradient descent uses all the training samples.
Hypothesis and cost function of a linear regression with multi-variables.
hθ(x) = θ^T·x
- θ^T = [θ0, …, θn]
- x^T = [1, x1, …, xn]  (x0 = 1)
J(θ) = 1/(2m) * Σ_{i=1~m} (hθ(x^(i)) - y^(i))^2
- m: number of training examples.
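A vectorized NumPy sketch of this cost, assuming `X` is the m×(n+1) design matrix with a leading column of ones (names are illustrative):

```python
import numpy as np

def cost_multivariate(X, y, theta):
    # X: (m, n+1) design matrix whose first column is all ones (x0 = 1)
    # theta: (n+1,) parameter vector; h_theta(x) = theta^T x  ->  X @ theta
    m = len(y)
    errors = X @ theta - y
    return (errors @ errors) / (2 * m)
```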
Gradient descent of a linear regression with multi-variables.
repeat until convergence {
  for all j in {0, 1, …, n} (simultaneously)
    θj := θj - α * (1/m) * Σ_{i=1~m} (hθ(x^(i)) - y^(i)) * xj^(i)
}
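A vectorized sketch of the same update, under the same design-matrix assumption as above:

```python
import numpy as np

def batch_gd_multivariate(X, y, alpha=0.01, n_iters=1000):
    # Vectorized batch gradient descent; X is the (m, n+1) design matrix
    # with a leading column of ones, and all theta_j are updated at once.
    m, n_plus_1 = X.shape
    theta = np.zeros(n_plus_1)
    for _ in range(n_iters):
        gradient = X.T @ (X @ theta - y) / m
        theta = theta - alpha * gradient
    return theta
```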
Feature scaling and GD
For GD to converge quickly, features should be on a similar scale. Mean normalization can be used:
x := (x - μ)/s
- μ: mean of the feature
- s: standard deviation (or max - min) of the feature
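A minimal sketch of mean normalization with NumPy, assuming `X` holds one feature per column:

```python
import numpy as np

def mean_normalize(X):
    # x := (x - mu) / s, applied column-wise; s can be the std or (max - min)
    mu = X.mean(axis=0)
    s = X.std(axis=0)
    return (X - mu) / s, mu, s  # keep mu and s to scale future inputs the same way
```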
How do you make sure GD is working?
Plot J(θ) against the number of iterations and check that it decreases at every iteration.
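One possible way to plot the cost history (a sketch assuming `cost_history` holds one J(θ) value per GD iteration):

```python
import matplotlib.pyplot as plt

def plot_cost_history(cost_history):
    # cost_history: list of J(theta) values recorded once per GD iteration
    plt.plot(range(len(cost_history)), cost_history)
    plt.xlabel("Iteration")
    plt.ylabel("J(theta)")
    plt.show()
```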
How to extend a linear regression to Polynomial regression for non-linear function?
Create new features from the existing ones.
For example,
x1 = x1
x2 = x1^2
x3 = x1^3
Then, fit the new feature set using the standard linear regression technique.
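A small sketch of building such polynomial features (the helper name `polynomial_features` is illustrative):

```python
import numpy as np

def polynomial_features(x1, degree=3):
    # Builds [x1, x1^2, ..., x1^degree] as new features for linear regression.
    # Feature scaling matters here, since the powers differ widely in range.
    return np.column_stack([x1 ** d for d in range(1, degree + 1)])
```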
Normal equation for linear regression.
θ = (X^T·X)^(-1)·X^T·y
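A NumPy sketch of the normal equation; using `pinv` here is one reasonable choice when X^T·X is near-singular:

```python
import numpy as np

def normal_equation(X, y):
    # theta = (X^T X)^(-1) X^T y; pinv adds numerical robustness
    return np.linalg.pinv(X.T @ X) @ X.T @ y
```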
Explain Logistic Regression
In solving a {0, 1} classification problem, we want the hypothesis (model) output to be in the [0, 1] range.
- For linear regression, hθ(x) = θ^T·x
- For logistic regression, hθ(x) = g(θ^T·x) = 1/(1 + exp(-θ^T·x))
- g(t) = 1/(1 + exp(-t)): sigmoid (logistic) function
- Interpretation: hθ(x) = p(y=1 | x; θ) –> probability that y = 1, given x, parameterized by θ
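A minimal NumPy sketch of the sigmoid and the logistic hypothesis (function names are illustrative; `X` is again assumed to be a design matrix with a leading column of ones):

```python
import numpy as np

def sigmoid(z):
    # g(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def logistic_hypothesis(X, theta):
    # h_theta(x) = g(theta^T x), interpreted as P(y = 1 | x; theta)
    return sigmoid(X @ theta)
```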
Decision boundary for logistic regression
Suppose
- Predict “y=1” if hθ(x) >= 0.5
- Predict “y=0” if hθ(x) < 0.5
Since g(z) >= 0.5 exactly when z >= 0, the decision boundary is θ^T·x = 0.
Cost function for Logistic Regression
J(θ) = (1/m) * Σ_{i=1~m} cost(hθ(x^(i)), y^(i))
where cost(hθ(x^(i)), y^(i)) is
- -log(hθ(x^(i))) if y^(i) = 1
- -log(1 - hθ(x^(i))) if y^(i) = 0
If you combine the above two terms, then
cost(•) = -y^(i)*log(hθ(x^(i))) - (1 - y^(i))*log(1 - hθ(x^(i)))
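A sketch of the combined cost in NumPy (the small `eps` is an added guard against log(0), not part of the formula above):

```python
import numpy as np

def logistic_cost(X, y, theta, eps=1e-12):
    # J(theta) = -(1/m) * sum(y*log(h) + (1-y)*log(1-h))
    m = len(y)
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))
    return -np.sum(y * np.log(h + eps) + (1 - y) * np.log(1 - h + eps)) / m
```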
After training, you found that your ML algorithm produces a high prediction error on test data. What can you do?
- Get more training examples –> helps fix high variance
  - Not helpful if you have high bias (underfitting)
- Try a smaller set of features –> fixes high variance (overfitting)
  - Not helpful if you have high bias
- Try adding additional features –> fixes high bias (the hypothesis is too simple; extra features make it more expressive)
- Add polynomial features –> fixes high bias
- Decrease λ –> fixes high bias
- Increase λ –> fixes high variance