Machine Learning Flashcards

1
Q

List ML algorithm categories.

A
  • Supervised learning
  • Unsupervised learning
  • Reinforcement learning
  • Recommender systems
2
Q

Examples of supervised learning

A
  • Regression: predicting a continuous-valued output
  • Classification (e.g., logistic regression): predicting a discrete-valued output
3
Q

Examples of unsupervised learning

A
  • Clustering: Google News story grouping, computer cluster analysis, market segmentation, social network analysis
  • Cocktail party problem (audio source separation)
4
Q

Hypothesis (model) and cost function for linear regression with a single variable

A

$h_\theta(x) = \theta_0 + \theta_1 x$

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

  • m: number of training examples
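
A minimal NumPy sketch of this cost (function and variable names are illustrative, not from the card):

import numpy as np

def compute_cost(x, y, theta0, theta1):
    # h_theta(x) = theta0 + theta1 * x, evaluated on all samples at once
    predictions = theta0 + theta1 * x
    m = len(y)  # number of training examples
    return np.sum((predictions - y) ** 2) / (2 * m)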
5
Q

How to find the parameter set for a linear regression problem?

A

Find a parameter set that minimizes the cost function, i.e.,
$\min_\theta J(\theta)$

One way of solving this optimization problem is the gradient descent algorithm.

6
Q

Describe the gradient descent algorithm.

A

repeat until convergence {
  for all j (update simultaneously):
    θj := θj − α · ∂J(θ)/∂θj
}

α: learning rate. Note that all θj are updated simultaneously: compute every partial derivative from the current θ before overwriting any component.
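A generic sketch of this update, approximating the partial derivatives by central finite differences (the function names, eps, and default hyperparameters are my assumptions, not from the card):

import numpy as np

def gradient_descent(J, theta, alpha=0.01, iters=1000, eps=1e-6):
    theta = np.asarray(theta, dtype=float)
    for _ in range(iters):
        grad = np.zeros_like(theta)
        for j in range(theta.size):
            step = np.zeros_like(theta)
            step[j] = eps
            # central finite-difference estimate of dJ/dtheta_j
            grad[j] = (J(theta + step) - J(theta - step)) / (2 * eps)
        theta = theta - alpha * grad  # all theta_j updated simultaneously
    return theta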

7
Q

Discuss the learning rate of the gradient descent algorithm.

A

  • α too small: convergence is too slow.
  • α too large: gradient descent may fail to converge, or even diverge.

8
Q

Gradient descent algorithm for a linear regression with a single variable.

A

repeat until convergence {
  θ0 := θ0 − α · (1/m) · Σ{i=1~m} (hθ(x(i)) − y(i))
  θ1 := θ1 − α · (1/m) · Σ{i=1~m} (hθ(x(i)) − y(i)) · x(i)
}

(update θ0 and θ1 simultaneously)

* Note: This is batch gradient descent.
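
A runnable sketch of this batch update (variable names and hyperparameter defaults are illustrative):

import numpy as np

def batch_gd_univariate(x, y, alpha=0.01, iters=1000):
    theta0, theta1 = 0.0, 0.0
    m = len(y)
    for _ in range(iters):
        h = theta0 + theta1 * x            # predictions on ALL m samples
        grad0 = np.sum(h - y) / m
        grad1 = np.sum((h - y) * x) / m
        theta0 -= alpha * grad0            # simultaneous update
        theta1 -= alpha * grad1
    return theta0, theta1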

9
Q

What is “batch” gradient descent?

A

Each step of gradient descent uses all m training samples.

10
Q

Hypothesis and cost function of linear regression with multiple variables.

A

$h_\theta(x) = \theta^T x$

  • $\theta = [\theta_0, \theta_1, \dots, \theta_n]^T$
  • $x = [1, x_1, \dots, x_n]^T$

$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

  • m: number of training examples
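
Vectorized, this cost is a few lines of NumPy; X is assumed to be the m×(n+1) design matrix whose first column is all ones (names are illustrative):

import numpy as np

def compute_cost(X, theta, y):
    # h_theta(x) = X @ theta gives one prediction per row of X
    residuals = X @ theta - y
    m = len(y)
    return residuals @ residuals / (2 * m)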
11
Q

Gradient descent for linear regression with multiple variables.

A

repeat until convergence {
  for all j in {0, 1, …, n} (update simultaneously):
    θj := θj − α · (1/m) · Σ{i=1~m} (hθ(x(i)) − y(i)) · xj(i)
}

  • xj(i): value of feature j in the i-th training example (x0(i) = 1)
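
Vectorized, the double loop collapses into one matrix expression per iteration (a sketch; names and defaults are illustrative):

import numpy as np

def gradient_descent(X, y, alpha=0.01, iters=1000):
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(iters):
        grad = X.T @ (X @ theta - y) / m   # all partial derivatives at once
        theta = theta - alpha * grad       # simultaneous update of every theta_j
    return theta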

12
Q

Feature scaling and GD

A

For GD to work well, features should be on a similar scale. Mean normalization can be used:

$x := \frac{x - \mu}{s}$

  • μ: mean of the feature
  • s: standard deviation of the feature (or max − min)
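
A minimal sketch, assuming X is an m×n NumPy array of raw feature values:

import numpy as np

def mean_normalize(X):
    mu = X.mean(axis=0)             # per-feature mean
    s = X.std(axis=0)               # per-feature std (or use max - min)
    return (X - mu) / s, mu, s      # keep mu, s to scale future inputs the same way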

13
Q

How do you make sure GD is working?

A

Plot J(θ) against the number of iterations and check that it decreases on every iteration.
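
For example, if the cost at each iteration is recorded in a list (J_history is an illustrative name, not from the card), the check is a single plot:

import matplotlib.pyplot as plt

def plot_convergence(J_history):
    # J(theta) should decrease on every iteration if GD is working
    plt.plot(range(len(J_history)), J_history)
    plt.xlabel("Iteration")
    plt.ylabel(r"J($\theta$)")
    plt.show()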

14
Q

How do you extend linear regression to polynomial regression for non-linear functions?

A

Create new features from the existing ones. For example,

$x_1 = x_1, \quad x_2 = x_1^2, \quad x_3 = x_1^3$

Then fit the new feature set using the linear regression technique (see the sketch below).
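
A sketch of constructing the new features before fitting (x1 is assumed to be a 1-D NumPy array):

import numpy as np

def polynomial_features(x1):
    # Stack x1, x1^2, x1^3 as columns; linear regression then fits
    # theta0 + theta1*x1 + theta2*x1^2 + theta3*x1^3.
    return np.column_stack([x1, x1 ** 2, x1 ** 3])

Since the new features have very different ranges, feature scaling (card 12) becomes important.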

15
Q

Normal equation for linear regression.

A

$\theta = (X^T X)^{-1} X^T y$
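
A direct NumPy sketch; using np.linalg.pinv (rather than a plain inverse) to tolerate a singular XᵀX is my own choice, not part of the card:

import numpy as np

def normal_equation(X, y):
    # theta = (X^T X)^{-1} X^T y: closed form, no iteration, no learning rate
    return np.linalg.pinv(X.T @ X) @ X.T @ y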

16
Q

Explain Logistic Regression

A

In solving a {0, 1} classification problem, we want the hypothesis (model) output to lie in the range [0, 1].

  • For linear regression, $h_\theta(x) = \theta^T x$
  • For logistic regression, $h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$
    • $g(z) = \frac{1}{1 + e^{-z}}$: sigmoid (logistic) function
  • Interpretation: $h_\theta(x) = p(y = 1 \mid x; \theta)$ –> the probability that y = 1, given x, parameterized by θ
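
A sketch of the sigmoid hypothesis (names are illustrative; X is assumed to include the leading column of ones):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, X):
    # Returns p(y=1 | x; theta) for each row of X
    return sigmoid(X @ theta)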
17
Q

Decision boundary for logistic regression

A

Suppose we

  • predict “y=1” if $h_\theta(x) \ge 0.5$
  • predict “y=0” if $h_\theta(x) < 0.5$

Since $g(z) \ge 0.5$ exactly when $z \ge 0$, the decision boundary is $\theta^T x = 0$.
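
A sketch of prediction with this boundary (names are illustrative):

import numpy as np

def predict(theta, X):
    # h_theta(x) >= 0.5 exactly when theta^T x >= 0,
    # so thresholding the sigmoid is the same as thresholding theta^T x at 0
    return (np.asarray(X) @ theta >= 0).astype(int)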

18
Q

Cost function for Logistic Regression

A

$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \mathrm{cost}\big( h_\theta(x^{(i)}), y^{(i)} \big)$

where $\mathrm{cost}(h_\theta(x), y)$ is

  • $-\log(h_\theta(x))$ if y = 1
  • $-\log(1 - h_\theta(x))$ if y = 0

Combining the two cases into a single expression:

$\mathrm{cost}(h_\theta(x), y) = -y \log(h_\theta(x)) - (1 - y) \log(1 - h_\theta(x))$
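
A vectorized sketch of this cost (names are illustrative; the small eps added inside the logs is my own numerical-safety assumption, not part of the formula):

import numpy as np

def logistic_cost(theta, X, y, eps=1e-12):
    h = 1.0 / (1.0 + np.exp(-(X @ theta)))   # sigmoid hypothesis
    m = len(y)
    # J = (1/m) * sum of -y*log(h) - (1-y)*log(1-h)
    return np.sum(-y * np.log(h + eps) - (1 - y) * np.log(1 - h + eps)) / m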

19
Q

After training, you find that your ML algorithm produces high prediction error on test data. What can you do?

A
  • Get more training examples –> helps to fix high variance
    • Not helpful if you have high bias (underfitting)
  • Try a smaller set of features –> fixes high variance (overfitting)
    • Not helpful if you have high bias
  • Try adding features –> fixes high bias (the hypothesis is too simple; extra features make it more expressive)
  • Add polynomial terms –> fixes high bias
  • Decrease λ –> fixes high bias
  • Increase λ –> fixes high variance