3 - The Bottom of the Bowl Flashcards
Who was Bernard Widrow?
A young academic at Stanford University in the autumn of 1959.
What was the focus of Widrow’s work?
Adaptive filters and the use of calculus to optimize them.
Who is Marcian ‘Ted’ Hoff?
A graduate student who approached Widrow for discussion.
What significant algorithm did Widrow and Hoff invent?
The least mean squares (LMS) algorithm.
What is the LMS algorithm foundational for?
Training artificial neural networks.
Where did Widrow grow up?
A small town in Connecticut.
What did Widrow’s father do for a living?
Ran an ice-manufacturing plant.
What did Widrow initially want to be when he grew up?
An electrician.
What subtle course correction did Widrow’s father suggest?
To become an electrical engineer instead of an electrician.
Where did Widrow obtain his degrees?
MIT.
What workshop did Widrow attend in the summer of 1956?
A workshop on artificial intelligence at Dartmouth College.
Who is credited with coining the term ‘artificial intelligence’?
John McCarthy.
What was the main goal of the Dartmouth Summer Research Project?
To explore how machines can simulate aspects of learning and intelligence.
What did Widrow conclude after six months of thinking about thinking?
It would take twenty-five years to build a thinking machine with the technology of that time.
What did Widrow turn his attention to after abandoning plans for a thinking machine?
Adaptive filters that could learn to remove noise from signals.
Who developed the theory that Widrow was particularly interested in?
Norbert Wiener.
What is the goal of an adaptive filter?
To adjust its parameters based on its own errors, so that its output gets steadily closer to the desired signal.
What does the mean squared error (MSE) measure?
The average of the squares of the errors made by the filter.
What mathematical method is used to minimize the mean squared error?
The method of steepest descent.
What does the term ‘gradient’ refer to in calculus?
The slope of a function at a given point.
What is the derivative of the function y = x^2?
2x.
What is the purpose of differential calculus?
To calculate the slope (rate of change) of a function at any given point.
At what point is the slope of a function typically zero?
At a minimum of the function (the slope is also zero at maxima and saddle points).
What is the method of steepest descent also known as?
The method of gradient descent.
What must be calculated to take a step toward the minimum of a curve?
The slope or gradient at the current location.
What is the significance of the step size in the gradient descent method?
It must be small to avoid overshooting the minimum.
True or False: The steps in gradient descent become larger as you approach the minimum.
False.
What happens to the step size during gradient descent as you approach the minimum?
The jumps along the curve become smaller as you near the bottom.
This is because the gradient is getting smaller.
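A minimal sketch of this in Python (the starting point and step size are illustrative choices, not from the text): gradient descent on y = x^2, whose gradient is 2x, with the steps shrinking automatically as the bottom is approached.

```python
# Gradient descent on y = x^2. The gradient (slope) at x is 2x.
x = 4.0    # illustrative starting point
mu = 0.1   # illustrative step size
for step in range(10):
    gradient = 2 * x          # slope of y = x^2 at the current x
    x = x - mu * gradient     # step against the gradient
    print(f"step {step}: moved by {mu * gradient:.4f}, now x = {x:.4f}")
# Each printed move is smaller than the last: the gradient shrinks
# as x nears the minimum at x = 0, so the steps shrink too.
```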
What type of functions have a single, well-defined minimum?
Convex functions.
The global minimum is the bottom of the bowl-shaped graph.
What is a saddle point in the context of optimization?
An unstable point where the gradient is zero but which is not a minimum; the surface curves up in one direction and down in another.
The function does not have a global or local minimum at a saddle point.
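A quick illustration in Python (the surface z = x^2 - y^2 is a standard textbook saddle, used here as an assumed example): the gradient vanishes at the origin even though the origin is not a minimum.

```python
# z = x^2 - y^2 has a saddle point at the origin: its gradient
# (2x, -2y) is zero there, yet z rises along the x-axis and falls
# along the y-axis, so the origin is neither a minimum nor a maximum.
def grad(x, y):
    return (2 * x, -2 * y)

print(grad(0.0, 0.0))                      # (0.0, -0.0): zero gradient
print(0.1**2 - 0.0**2, 0.0**2 - 0.1**2)    # +0.01 one way, -0.01 the other
```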
What is the gradient in a multi-variable function?
A vector composed of partial derivatives with respect to each variable.
The gradient points in the direction of steepest ascent, so for a bowl-shaped function it points away from the minimum.
How do you calculate the gradient for a function with multiple variables?
By taking partial derivatives of the function with respect to each variable.
The notation used includes ∂ for partial derivatives.
What is the significance of the gradient vector in optimization?
It indicates the direction of steepest ascent; its negative gives the direction of steepest descent.
To move toward the minimum, one must follow the negative of the gradient.
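A short sketch of this with the two-variable surface z = x^2 + y^2 (which appears again below; starting point and step size are illustrative): the gradient is the vector of partial derivatives (∂z/∂x, ∂z/∂y) = (2x, 2y), and following its negative drives both coordinates toward the minimum at the origin.

```python
# Gradient descent on z = x^2 + y^2. The gradient is the vector of
# partial derivatives (dz/dx, dz/dy) = (2x, 2y).
x, y = 3.0, -2.0   # illustrative starting point
mu = 0.1           # illustrative step size
for _ in range(25):
    gx, gy = 2 * x, 2 * y            # components of the gradient vector
    x, y = x - mu * gx, y - mu * gy  # follow the negative of the gradient
print(x, y)  # both coordinates are now close to the minimum at (0, 0)
```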
What is an adaptive filter in signal processing?
A filter that adjusts its parameters to minimize the error between the desired and actual output signals.
It is essential in applications like digital communications.
What equation describes the error in an adaptive filter?
e_n = d_n - y_n.
Here, d_n is the desired signal and y_n is the output signal.
What is the function of the adaptive filter during a modem handshake?
It learns the characteristics of noise to create an error-free communication channel.
This is crucial for digital devices transmitting over noisy analog lines.
What does it mean for a function to be differentiable?
It means the function has a well-defined derivative at every point.
Differentiability allows for the calculation of gradients.
What does the notation ‘z = x^2 + y^2’ represent in optimization?
An elliptic paraboloid surface in three-dimensional space.
This represents a function with two variables.
What is the role of partial derivatives in finding the gradient?
They provide the components of the gradient vector.
Each component corresponds to a variable’s contribution to the slope.
How can you express the output of an adaptive filter mathematically?
y_n = w · x_n, where w is the weight vector and x_n is the input vector.
This represents the linear combination of inputs adjusted by weights.
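A minimal sketch of this computation (NumPy, with made-up weights and samples): the filter's output at step n is the dot product of the weight vector with a window of current and past inputs, and the error is the gap to the desired signal.

```python
import numpy as np

# Adaptive filter output y_n = w . x_n, where x_n holds the current
# and past input samples (a tapped delay line).
w = np.array([0.5, -0.2, 0.1])      # illustrative filter weights
x_n = np.array([1.0, 0.8, -0.3])    # [x_n, x_{n-1}, x_{n-2}]
d_n = 0.7                           # illustrative desired output
y_n = np.dot(w, x_n)                # linear combination of inputs
e_n = d_n - y_n                     # error e_n = d_n - y_n
print(y_n, e_n)
```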
What is multivariate calculus?
The branch of calculus dealing with functions of multiple variables.
It is essential for understanding gradient descent in machine learning.
What is the primary purpose of an adaptive filter?
To adapt to varying noise conditions and minimize output error.
This is crucial for maintaining signal integrity in communication systems.
What happens when you start from a different location while descending a gradient?
You may veer away from the saddle point.
The starting point can dictate the convergence path in optimization.
What is the relationship between functions and vectors in optimization?
The gradient is a vector derived from the function’s partial derivatives.
This illustrates the interplay between different mathematical domains.
What is the formula for the output of an adaptive filter?
y_n = w · x_n
Where x_n = [x_n, x_{n-1}, …] is the vector of current and past input samples and w = [w_0, w_1, …] is the weight vector.
What is the expression for the error made by the filter at the n-th time step?
e_n = d_n - y_n
This can be rewritten as e_n = d_n - w · x_n
What is the goal of an adaptive filter?
To minimize the error between the generated output and the desired signal
How do we calculate the average error in adaptive filtering?
Using mean absolute error (MAE) or mean squared error (MSE)
MSE is preferred due to its statistical properties and differentiability
What is the mathematical representation of the value to be minimized in adaptive filtering?
J = E[(d_n - y_n)^2]
This represents the expected value of the squared errors.
What type of function is formed when relating J to the filter parameter w?
A quadratic function
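To see why, here is a small numeric sketch (synthetic data, all values illustrative): with a single weight w, the sample MSE J(w) = mean((d - w·x)^2) traces a parabola with a single minimum near the weight that generated the data.

```python
import numpy as np

# With one weight, J(w) = E[(d - w*x)^2] is a quadratic (parabola) in w.
rng = np.random.default_rng(0)
x = rng.normal(size=1000)                   # synthetic input samples
d = 2.0 * x + 0.1 * rng.normal(size=1000)   # desired signal; "true" weight is 2.0
for w in [0.0, 1.0, 2.0, 3.0, 4.0]:
    J = np.mean((d - w * x) ** 2)           # sample estimate of the MSE
    print(f"w = {w}: J = {J:.4f}")          # J is smallest near w = 2.0
```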
What method can be used to minimize J if the correlation between inputs and outputs is unknown?
Method of steepest descent
What does stochastic gradient descent (SGD) refer to?
A method in which the direction of each step of descent is slightly random, because the gradient is estimated from a single data point rather than computed over all the data.
What is the output of an adaptive neuron designed by Widrow and Hoff?
y = w_0 x_0 + w_1 x_1 + w_2 x_2
What does the term ‘bias’ refer to in the context of adaptive neurons?
w_0, the weight associated with the constant input x_0, which is set to 1.
What is the update rule for the weights in the LMS algorithm?
w_new = w_old + 2μεx
Where μ is the step size, ε is the error, and x is the input vector of a single data point.
What is the error in the context of an adaptive neuron?
ε = d - w^T x
Where d is the desired output.
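A minimal sketch of the LMS rule in Python (NumPy; the target weights, step size, and data are illustrative assumptions): for each data point, compute ε = d - w^T x, then nudge the weights by 2μεx.

```python
import numpy as np

# LMS (Widrow-Hoff) rule: w_new = w_old + 2 * mu * error * x,
# using one data point (x, d) per update.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])   # illustrative weights to be recovered
w = np.zeros(3)                       # start from zero
mu = 0.01                             # illustrative step size

for _ in range(2000):
    x = rng.normal(size=3)            # a single data point
    d = true_w @ x                    # its desired output
    error = d - w @ x                 # epsilon = d - w^T x
    w = w + 2 * mu * error * x        # LMS weight update
print(w)  # close to true_w after many small, noisy steps
```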
What is the significance of the LMS algorithm?
It is widely used in adaptive filters and is the first algorithm for training artificial neurons using gradient descent principles
What was the original context in which Widrow and Hoff discovered their algorithm?
They were working on adaptive filters and neural elements at Stanford
What was the result of running the algorithm on the analog computer?
It verified that the algorithm worked
What was the first task after confirming the algorithm worked?
Building a single adaptive neuron
What is the problem with calculating the optimal values for filter parameters?
Computing them exactly requires many samples of the input and desired output, which makes the calculation time-consuming.
What does the gradient represent in the context of optimization?
A vector of partial derivatives of the mean squared error with respect to each weight
What is the nature of the function representing the expectation value of squared errors?
A bowl-shaped (quadratic) function in higher-dimensional space.
What is the method used to estimate the gradient without full calculations?
Using an estimate based on just one data point
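A sketch of that estimate (synthetic data, illustrative values): the exact gradient of the MSE averages -2·e_n·x_n over all samples, while the one-point estimate used by LMS drops the average and keeps a single term; it is noisy, but points the right way on average.

```python
import numpy as np

# Exact MSE gradient vs. the single-point estimate used by LMS.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))          # 1000 input vectors
d = X @ np.array([1.0, -2.0, 0.5])      # illustrative desired outputs
w = np.zeros(3)                         # current weights

errors = d - X @ w
full_grad = -2 * X.T @ errors / len(d)  # average over all data points
one_point = -2 * errors[0] * X[0]       # estimate from one data point
print(full_grad)   # exact gradient of the MSE at w
print(one_point)   # noisy one-sample estimate of the same gradient
```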
Who were the key figures behind the development of the LMS algorithm?
Widrow and Hoff
What is ADALINE?
ADALINE stands for 'adaptive linear neuron': a single artificial neuron that adapts its weights as it learns from data.
What algorithm does ADALINE use?
ADALINE uses the LMS algorithm.
What does the LMS algorithm do in the context of ADALINE?
It separates an input space into two regions, helping to find the weights that represent the linearly separating hyperplane.
What are the dimensions of the input space used for representing letters in ADALINE?
The input space is a 16-dimensional space defined by 4×4 pixels.
How are letters represented in the 4×4 pixel space?
Each letter is represented by 16 binary digits (one per pixel), each either 0 or 1.
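A sketch of this setup (the two 4×4 pixel patterns below are invented for illustration, not taken from Widrow's experiments): each letter is flattened into a 16-dimensional 0/1 vector, and an ADALINE trained with LMS learns weights whose output sign separates the two classes.

```python
import numpy as np

# Two made-up 4x4 binary "letters", flattened to 16-dimensional vectors.
T = np.array([1,1,1,1, 0,1,1,0, 0,1,1,0, 0,1,1,0], dtype=float)  # label +1
L = np.array([1,0,0,0, 1,0,0,0, 1,0,0,0, 1,1,1,1], dtype=float)  # label -1
data = [(T, 1.0), (L, -1.0)]

w = np.zeros(16)
mu = 0.05                           # illustrative step size
for _ in range(100):
    for x, d in data:
        error = d - w @ x           # epsilon = d - w^T x
        w = w + 2 * mu * error * x  # LMS update
for x, d in data:
    print(np.sign(w @ x), d)        # output signs match the labels
```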
What is the main difference between ADALINE and Rosenblatt’s perceptron?
ADALINE uses the LMS algorithm, while the perceptron uses a different update rule (the perceptron learning algorithm) to find the linearly separating hyperplane.
What did Widrow discover about the LMS algorithm while waiting for a flight?
He realized that the single-point gradient used by LMS is an unbiased estimate of the true gradient, so taking extremely small steps leads, on average, to the optimal values for the weights.
What are the two types of neural architectures mentioned in the text?
- ADALINE (single layer of adaptive neurons)
- MADALINE (multiple layers: input, hidden, output)
What was the challenge with training MADALINE?
It was hard to train MADALINE.
What is the significance of the LMS algorithm in relation to backpropagation?
The LMS algorithm is the foundation of backpropagation, which is essential for modern AI.
What key role did Hoff play in the development of Intel?
Hoff was one of the key people behind the development of the Intel 4004, the company’s first general-purpose microprocessor.
What was the title of the 1963 episode of Science in Action featuring MADALINE?
‘Computers that Learn.’
What was the public perception of MADALINE as described in the Science in Action episode?
It was described as a machine that can learn to balance a broom, which was presented as a remarkable feat.
Who were the two key figures mentioned in laying the foundation for modern deep neural networks?
- Frank Rosenblatt
- Bernard Widrow
True or False: The assessment of neural network limitations by Minsky and Papert greatly affected research in the field.
True
Fill in the blank: The LMS algorithm helps ADALINE find the weights representing the _______.
linearly separating hyperplane
What was Widrow’s response to the question about the name ‘ADALINE’?
He stated that it stands for 'Adaptive Linear Neuron.'