ML exam 1 Flashcards

1
Q

What is supervised learning?

A

to learn a model from labeled training data that allows us to make predictions about unseen or future data

2
Q

Rosenblatt perceptron

A
  • binary classification task
  • positive class (1) vs negative class (-1)
  • the net input is the dot product of the input vector and the weight vector (see the sketch below)
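A minimal sketch of that net-input computation with NumPy; the arrays `x` and `w` and the bias `b` are made-up illustrative values, not from the cards:

```python
import numpy as np

# hypothetical example: 3 input features, 3 weights, and a bias term
x = np.array([0.5, -1.0, 2.0])   # one training sample
w = np.array([0.1, 0.3, -0.2])   # weight vector
b = 0.0                          # bias (negative threshold)

z = np.dot(w, x) + b             # net input: dot product of weights and input
print(z)
```
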
3
Q

step function

A

1 if z >= θ
-1 otherwise
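A small sketch of this unit step (threshold) function, assuming NumPy; the default threshold value here is an arbitrary choice:

```python
import numpy as np

def unit_step(z, theta=0.0):
    """Return the class label: 1 if the net input reaches the threshold, else -1."""
    return np.where(z >= theta, 1, -1)

print(unit_step(np.array([-0.3, 0.0, 0.7])))  # -> [-1  1  1]
```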

4
Q

what does z equal

A

the net input, i.e. the linear combination of weights and inputs: z = wᵀx = w1x1 + w2x2 + … + wmxm

5
Q

Rosenblatt perceptron algorithm

A
  1. initialize the weights to 0 or small random numbers
  2. for each training sample x(i):
    a. compute ŷ(i), the output value
    b. update the weights (sketched below)
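A compact sketch of the whole training loop under these steps, assuming NumPy, a learning rate `eta`, and a tiny made-up dataset; it is meant as an illustration, not a reference implementation:

```python
import numpy as np

def perceptron_fit(X, y, eta=0.1, n_epochs=10):
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])         # 1. small random weights
    b = 0.0
    for _ in range(n_epochs):                           # stop after a fixed number of passes
        for xi, yi in zip(X, y):                        # 2. for each training sample
            y_hat = 1 if np.dot(w, xi) + b >= 0 else -1 # a. compute the output value
            delta = eta * (yi - y_hat)                  # perceptron learning rule
            w += delta * xi                             # b. update weights
            b += delta
    return w, b

# toy, linearly separable data (hypothetical)
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -1.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = perceptron_fit(X, y)
```
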
6
Q

weight update rule

A

wj := wj + ∆wj

7
Q

perceptron learning rule

A

∆wj = η(y(i) − ŷ(i))xj(i), where η is the learning rate
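A tiny worked example of this rule (the numbers are made up): if a sample with xj(i) = 2 is misclassified as ŷ(i) = −1 while y(i) = +1, and η = 0.1, then ∆wj = 0.1 · (1 − (−1)) · 2 = 0.4; if the prediction is correct, y(i) − ŷ(i) = 0 and the weight is unchanged.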

8
Q

linear separability

A

the positive and negative classes can be separated by a straight line (more generally, a hyperplane)

9
Q

convergence

A

convergence is guaranteed if the two classes are linearly separable and the learning rate is sufficiently small

10
Q

if classes cannot be separated,

A

Set a maximum number of passes over the training dataset (epochs)
Set a threshold for the number of tolerated misclassifications
Otherwise, the perceptron will never stop updating the weights (never converge)

11
Q

diagram of the Rosenblatt perceptron

A

see pic

12
Q

Adaline

A

Weights are updated based on a linear activation function

Remember that the perceptron used a unit step function

φ(z) is simply the identity function of the net input: φ(z) = wᵀx = z

13
Q

Adaline diagram

A

see pic

14
Q

Adaline vs the Rosenblatt perceptron

A

In Adaline, the weight update is done based on all samples in the training set
The perceptron, in contrast, updates weights incrementally after each sample
Adaline's approach is known as “batch” gradient descent

15
Q

cost function and equation

A

ML algorithms often define an objective function
This function is optimized during learning
It is often a cost function we want to minimize
Adaline's cost function J(·) is the sum of squared errors (SSE) between the outputs and the true class labels:
J(w) = ½ Σi (y(i) − φ(z(i)))²  (sketched in code below)
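A short sketch of this SSE cost in NumPy, assuming a weight vector `w`, bias `b`, data matrix `X`, and labels `y` like those in the earlier sketches (Adaline's activation is the identity, so φ(z) = z):

```python
import numpy as np

def sse_cost(w, b, X, y):
    """Adaline cost J(w): half the sum of squared errors over all samples."""
    output = X.dot(w) + b            # φ(z) = z, the linear activation
    errors = y - output
    return 0.5 * np.sum(errors ** 2)
```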

16
Q

advantages of the Adaline cost function

A

The linear activation function is differentiable
Unlike the unit step function
Why derivatives?
We need to know how much each variable affects the output!
The SSE cost function is convex
So we can use gradient descent to learn the weights

17
Q

gradient descent

A

The gradient points in the direction of the greatest rate of increase of the function, and its magnitude is the slope of the graph in that direction.
Gradient descent steps in the opposite direction (the negative gradient) to find a local minimum of a given function (see the sketch below).
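A minimal one-dimensional illustration on the made-up convex function f(w) = (w − 3)², repeatedly stepping against the gradient:

```python
def gradient_descent_1d(eta=0.1, n_steps=50):
    w = 0.0                      # arbitrary starting point
    for _ in range(n_steps):
        grad = 2 * (w - 3)       # derivative of f(w) = (w - 3)^2
        w -= eta * grad          # step in the direction of steepest decrease
    return w                     # approaches the minimum at w = 3

print(gradient_descent_1d())
```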

18
Q

gradient computation

A

To compute the gradient of the cost function, we need to compute
the partial derivative of the cost function with respect to each
weight wj
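For the SSE cost, the standard partial derivative is ∂J/∂wj = −Σi (y(i) − φ(z(i))) xj(i). A sketch of computing the full gradient with NumPy, reusing the hypothetical `w`, `b`, `X`, `y` from the earlier sketches:

```python
import numpy as np

def sse_gradient(w, b, X, y):
    """Partial derivatives of the SSE cost with respect to each weight and the bias."""
    output = X.dot(w) + b                 # φ(z) = z
    errors = y - output
    grad_w = -X.T.dot(errors)             # ∂J/∂w_j = -Σ_i (y(i) - φ(z(i))) x_j(i)
    grad_b = -errors.sum()
    return grad_w, grad_b
```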

19
Q

We update all weights simultaneously, so the Adaline learning rule becomes

A

w := w + ∆w, where ∆w = −η∇J(w)
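A hedged sketch of one batch gradient-descent update for Adaline, where ∆w = −η∇J(w); it reuses the hypothetical `sse_gradient` helper from the previous card's sketch:

```python
def adaline_epoch(w, b, X, y, eta=0.01):
    """One 'batch' update: all samples contribute to a single weight change."""
    grad_w, grad_b = sse_gradient(w, b, X, y)
    w = w + (-eta * grad_w)    # w := w + ∆w, with ∆w = -η ∇J(w)
    b = b + (-eta * grad_b)
    return w, b
```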

20
Q

Adaline vs the Rosenblatt perceptron

A

Looks (almost) identical. What is the difference?
In Adaline, φ(z(i)), with z(i) = wᵀx(i), is a real number
And not an integer class label as in the perceptron
The weight update is done based on all samples in the training set
The perceptron updates weights incrementally after each sample
Adaline's approach is known as “batch” gradient descent

21
Q

if the learning rate is too high

A

the error becomes larger because the updates overshoot the global minimum

22
Q

if the learning rate is too low

A

it takes too many epochs to converge

23
Q

stochastic gradient descent

A

an optimization algorithm, often used in machine learning, that finds the model parameters giving the best fit between predicted and actual outputs; instead of computing the update from the whole training set (“batch”), it updates the weights incrementally for each training sample
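A brief sketch of the stochastic variant for Adaline: rather than one batch update per epoch, the weights are nudged after every (shuffled) training sample; the function name and data layout are illustrative assumptions:

```python
import numpy as np

def adaline_sgd_epoch(w, b, X, y, eta=0.01, seed=0):
    """One epoch of stochastic gradient descent: update weights after each sample."""
    rng = np.random.default_rng(seed)
    for i in rng.permutation(len(y)):          # shuffle to avoid repetitive cycles
        xi, yi = X[i], y[i]
        error = yi - (np.dot(w, xi) + b)       # φ(z) = z for Adaline
        w = w + eta * error * xi               # per-sample weight update
        b = b + eta * error
    return w, b
```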