2 - We Are All Just Numbers Here... Flashcards
Who was William Rowan Hamilton?
An Irish mathematician known for his work on quaternions.
What significant event happened on October 16, 1843?
Hamilton had a flash of inspiration for the quaternion formula while walking along the Royal Canal.
What is the fundamental formula for quaternion multiplication?
i² = j² = k² = ijk = -1.
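The formula can be checked numerically. The sketch below assumes quaternions are stored as (w, x, y, z) tuples; the helper name qmul is hypothetical, chosen only for this illustration.

```python
def qmul(p, q):
    """Hamilton product of two quaternions stored as (w, x, y, z) tuples."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

# The three imaginary units and the real number -1 as quaternions.
I = (0, 1, 0, 0)
J = (0, 0, 1, 0)
K = (0, 0, 0, 1)
MINUS_ONE = (-1, 0, 0, 0)

ii = qmul(I, I)
jj = qmul(J, J)
kk = qmul(K, K)
ijk = qmul(qmul(I, J), K)
```

All four products equal MINUS_ONE. Note that qmul(I, J) and qmul(J, I) differ in sign, so quaternion multiplication is not commutative.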
What did Hamilton etch on the stone of Brougham Bridge?
The fundamental formula for quaternion multiplication.
Define a scalar quantity.
A stand-alone number that represents magnitude only.
Define a vector.
A quantity that has both magnitude and direction.
What are the components of a vector?
The x-component and y-component.
How can the magnitude of a vector be calculated?
Using the Pythagorean theorem: √(x² + y²).
What does Newton’s Second Law of Motion state?
The acceleration of an object is proportional to the net force acting on it and points in the same direction as that force.
What geometrical shape is used to represent vector addition?
A parallelogram.
What is the resultant vector in the example of a man walking from (0,0) to (6,9)?
The vector from (0,0) to (6,9).
What is the net distance in the xy coordinate space from the origin to (6,9)?
10.82 miles.
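The 10.82 figure follows from the Pythagorean formula applied to the point (6, 9). A minimal sketch, with the function name magnitude chosen for illustration:

```python
import math

def magnitude(x, y):
    """Length of the 2D vector (x, y) by the Pythagorean theorem."""
    return math.sqrt(x ** 2 + y ** 2)

distance = magnitude(6, 9)   # sqrt(36 + 81) = sqrt(117)
print(round(distance, 2))    # 10.82
```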
What happens when you subtract vectors?
Subtracting b from a is the same as adding the reverse of b to a; it models one force acting against another.
What is the effect of multiplying a vector by a scalar?
It scales the vector’s magnitude.
Define a unit vector.
A vector with a magnitude of 1.
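Scaling and unit vectors fit together: dividing a vector by its own magnitude scales it to length 1 without changing its direction. A small sketch, with hypothetical helper names scale and unit:

```python
import math

def scale(v, s):
    """Multiply every component of vector v by the scalar s."""
    return tuple(s * c for c in v)

def unit(v):
    """Return the vector of magnitude 1 pointing in the direction of v."""
    length = math.sqrt(sum(c * c for c in v))
    if length == 0:
        raise ValueError("the zero vector has no direction")
    return scale(v, 1 / length)

doubled = scale((3, 4), 2)   # magnitude goes from 5 to 10
u = unit((3, 4))             # approximately (0.6, 0.8)
u_length = math.sqrt(sum(c * c for c in u))
```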
What is the dot product of two vectors?
The magnitude of one vector multiplied by the projection of another onto it.
What does a dot product of zero indicate?
The two vectors are orthogonal (at right angles).
How is the dot product calculated using vector components?
a.b = a1b1 + a2b2.
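The component formula translates directly into code. The sketch below also checks the orthogonality rule, using example vectors picked only for illustration:

```python
def dot(a, b):
    """Dot product from components: a1*b1 + a2*b2 (+ ... for more dimensions)."""
    return sum(x * y for x, y in zip(a, b))

general = dot((2, 3), (4, 5))        # 2*4 + 3*5 = 23
orthogonal = dot((1, 0), (0, 1))     # unit vectors i and j are at right angles
with_itself = dot((1, 0), (1, 0))    # i.i equals 1
```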
What is the significance of Hamilton’s work on quaternions for machine learning?
It laid foundational mathematical concepts important for vector analysis.
Fill in the blank: A ______ is a mathematical entity composed of four elements.
quaternion.
True or False: The magnitude of a vector can be negative.
False.
What does the projection of one vector onto another represent?
The ‘shadow cast’ by one vector onto another.
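The length of the shadow can be computed by dividing the dot product by the magnitude of the vector being projected onto. A minimal sketch; the name scalar_projection is an assumption of this example:

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def scalar_projection(a, b):
    """Signed length of the shadow that a casts onto the direction of b."""
    return dot(a, b) / math.sqrt(dot(b, b))

a = (3, 4)
b = (2, 0)                        # b lies along the x-axis
shadow = scalar_projection(a, b)  # only the x-component of a survives: 3.0
b_length = math.sqrt(dot(b, b))   # 2.0
```

Multiplying the shadow by the magnitude of b recovers the dot product (2 × 3 = 6), which matches the geometric definition of the dot product.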
What is the equation for the dot product, a scalar quantity, of vectors a and b?
a.b = a1b1 + a2b2
What do the vectors i and j represent in the context of dot products?
Orthogonal vectors, where i.j and j.i are zero, and both i.i and j.j equal 1
What does a perceptron output if the weighted sum of its inputs plus the bias term is greater than 0?
1
What is the output of a perceptron if the weighted sum is less than or equal to 0?
-1
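The two output rules can be sketched as one function. The weights, bias, and inputs below are made-up values for illustration only:

```python
def perceptron_output(w, x, b):
    """Return 1 if the weighted sum plus bias is greater than 0, else -1."""
    weighted_sum = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if weighted_sum > 0 else -1

w = (1.0, 1.0)   # hypothetical weights
b = -1.5         # hypothetical bias

above = perceptron_output(w, (1.0, 1.0), b)     # sum is 0.5
below = perceptron_output(w, (1.0, 0.0), b)     # sum is -0.5
boundary = perceptron_output(w, (1.0, 0.5), b)  # sum is exactly 0
```

A weighted sum of exactly 0 falls under the "less than or equal to 0" rule, so the boundary case outputs -1.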
In the perceptron model, how can the weights be represented?
As a vector w = (w1, w2)
What geometrical concept does the perceptron use to separate data points into clusters?
A linearly separating hyperplane
What is the relationship between the weight vector w and the separating hyperplane?
The vector w is orthogonal to the hyperplane
What does the dot product of a data point vector and the weight vector indicate?
The signed distance of the data point from the hyperplane (exact when w is a unit vector; otherwise scaled by the magnitude of w)
What happens when a data point lies on the hyperplane?
The dot product with the weight vector equals zero
What is the significance of the bias term in a perceptron?
It moves the hyperplane away from the origin without changing its orientation
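The roles of the dot product and the bias can be seen in a small sketch. It assumes the hyperplane is the set of points where w.x + b = 0; the function name signed_distance and the sample points are illustrative only.

```python
import math

def signed_distance(w, x, b=0.0):
    """Signed distance of point x from the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

w = (0.0, 1.0)  # orthogonal to the x-axis, so with b = 0 the hyperplane is the x-axis

on_plane = signed_distance(w, (3.0, 0.0))          # point on the x-axis
off_plane = signed_distance(w, (3.0, 2.0))         # two units above it
shifted = signed_distance(w, (3.0, 2.0), b=-2.0)   # bias moves the line up to y = 2
```

With b = -2 the same weight vector describes the line y = 2, so the point (3, 2) now sits on the hyperplane: the bias changed the position of the hyperplane but not its orientation.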
Fill in the blank: The perceptron learning algorithm guarantees to find one separating hyperplane, but not necessarily the _____ one.
best
What is the mathematical representation of a one-column matrix with two elements?
A column matrix indexed by numbers 1 and 2
What is the process of flipping a column matrix on its side called?
Taking the transpose of a matrix
What is the notation for the transpose of matrix A?
A^T
In the context of matrices, what is a vector?
A particular form of matrix with either one row or one column
What is the relationship between the number of columns in the first matrix and the number of rows in the second for taking a dot product?
They must be equal
How can the weighted sum of inputs in a perceptron be concisely written?
As the dot product w^T x
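Treating vectors as one-column matrices, the weighted sum is the product of a transposed column with another column. The sketch below uses plain nested lists rather than a matrix library; transpose and matmul are hypothetical helpers written for this example.

```python
def transpose(m):
    """Flip a matrix (list of rows) so that its rows become columns."""
    return [list(row) for row in zip(*m)]

def matmul(a, b):
    """Matrix product; columns of a must equal rows of b."""
    if len(a[0]) != len(b):
        raise ValueError("columns of the first must equal rows of the second")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

w = [[2], [3]]   # column matrix of weights (made-up values)
x = [[4], [5]]   # column matrix of inputs (made-up values)

w_t = transpose(w)            # [[2, 3]], a one-row matrix
weighted_sum = matmul(w_t, x) # [[2*4 + 3*5]] = [[23]]
```

Multiplying w by x without the transpose raises an error, because a 2x1 matrix has one column while x has two rows.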
What does the perceptron learn from a set of input data vectors?
The weight vector that represents a hyperplane separating the data into two clusters
What is the significance of the hyperplane in the context of classification?
It determines the classification of new data points based on their position relative to it
True or False: The perceptron can classify data points as ‘obese’ or ‘not-obese’ based on their position relative to the hyperplane.
True
What is the role of modern deep neural networks in relation to the perceptron?
They build upon the foundational concepts established by the perceptron
What is a perceptron learning algorithm?
A computationally viable algorithm for binary classification that involves finding a hyperplane to separate data into two groups.
What defines a ‘solution’ in the context of perceptrons?
A hyperplane that linearly separates the data into two groups.
Who developed a significant proof regarding the perceptron learning algorithm in 1962?
Henry David Block.
What did Block’s proof establish?
Upper bounds for the number of mistakes made by the perceptron learning algorithm.
What is the focus of Minsky and Papert’s book ‘Perceptrons’?
A class of computations that make decisions by weighing evidence.
What was a notable criticism made by Block in his review of ‘Perceptrons’?
He objected to Minsky and Papert’s implication that cyberneticists should have known about earlier convergence proofs.
What is the significance of the term ‘cybernetics’?
The study of control and communication in the animal and the machine.
What are the six variables used to categorize patients in the discussed pandemic scenario?
- x1 = age
- x2 = body mass index
- x3 = has difficulty breathing (yes = 1/no = 0)
- x4 = has fever (yes/no)
- x5 = has diabetes (yes/no)
- x6 = chest CT scan (0 = clear, 1 = mild infection, 2 = severe infection)
What does the outcome ‘y’ represent for each patient?
y = -1 (did not need ventilator support) or y = 1 (needed ventilator support).
What is the goal of training a perceptron in this context?
To find a separating hyperplane for the data points.
What is the first step in the perceptron training algorithm?
Initialize the weight vector to zero: set w = 0.
What condition necessitates updating the weight vector in the perceptron algorithm?
If y w^T x ≤ 0.
How does the perceptron determine if the weights are correct?
If the expression y w^T x is positive.
What does the convergence proof by Minsky and Papert establish?
The perceptron will converge to a solution in a finite number of steps if one exists.
What is the significance of the dot product of weight vectors during training?
It indicates how closely the weight vector aligns with the desired weight vector.
What does the term ‘XOR problem’ refer to in perceptrons?
A problem that cannot be solved by a single layer of perceptrons, as it cannot be linearly separated.
What is the relationship between lower and upper bounds in computational complexity?
A lower bound states the minimum resources any solution must use, and so shows what is impossible with less; an upper bound states the most resources a given solution will need.
What does the weight vector w represent in the perceptron model?
The parameters that define the hyperplane separating the data.
Fill in the blank: The perceptron learning algorithm updates the weight vector by adding _______.
y x.
True or False: The perceptron algorithm guarantees a solution for all types of data.
False.
What major assumption is made about the data in the context of perceptrons?
The data are linearly separable.
What is the bias term in the perceptron model denoted as?
w0.
What is a perceptron?
A simple type of artificial neuron used in machine learning
What problem did Minsky and Papert prove that a single layer of perceptrons could not solve?
The XOR problem
What are the four data points involved in the XOR problem?
- (0, 0)
- (1, 0)
- (1, 1)
- (0, 1)
What must a perceptron output for the points (0, 0) and (1, 1)?
y = 1
What must a perceptron output for the points (1, 0) and (0, 1)?
y = -1
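The failure can be observed by running the perceptron update rule on these four points. In the sketch below, each point gets a leading 1 so that the first weight acts as the bias; the function name converges, the epoch cap, and the AND dataset used for contrast are assumptions of this example.

```python
def converges(data, max_epochs=1000):
    """Run the perceptron rule; True if a full pass makes no update."""
    w = [0.0] * len(data[0][0])
    for _ in range(max_epochs):
        updated = False
        for x, y in data:
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:
            return True
    return False

# Each input is (1, x1, x2); the leading 1 multiplies the bias weight.
xor_data = [((1, 0, 0), 1), ((1, 1, 1), 1), ((1, 1, 0), -1), ((1, 0, 1), -1)]
and_data = [((1, 0, 0), -1), ((1, 1, 0), -1), ((1, 0, 1), -1), ((1, 1, 1), 1)]

xor_ok = converges(xor_data)   # False: no line separates the two groups
and_ok = converges(and_data)   # True: AND is linearly separable
```

No pass over the XOR points can finish without an update, because a pass without updates would mean the current weights define a separating line, and none exists.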
What does the term ‘multi-layer perceptrons’ refer to?
Perceptrons stacked such that the output of one feeds into the input of another
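Stacking is what makes XOR-style problems solvable. The sketch below wires three perceptrons by hand so that the outputs match the labels used in this set (1 for (0, 0) and (1, 1), -1 for the other two points). The weights are picked manually for illustration, not learned.

```python
def step(s):
    """Perceptron activation: 1 if s > 0, else -1."""
    return 1 if s > 0 else -1

def two_layer(x1, x2):
    # First layer: two perceptrons looking at the raw inputs.
    h_or = step(x1 + x2 - 0.5)    # 1 when at least one input is 1
    h_and = step(x1 + x2 - 1.5)   # 1 only when both inputs are 1
    # Second layer: one perceptron fed by the outputs of the first layer.
    return step(-h_or + h_and + 0.5)

outputs = {point: two_layer(*point) for point in [(0, 0), (1, 0), (1, 1), (0, 1)]}
```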
What algorithm was published by Rumelhart, Hinton, and Williams in 1986?
Backpropagation
What does the backpropagation algorithm rely on?
Calculus and optimization theory
What is the significance of the year 1982 in neural network research?
A physicist’s unique solution to a biological problem re-energized the field
What is the goal of the perceptron algorithm?
To find a linearly separating hyperplane
What is the initial step in the perceptron algorithm?
Initialize the weight vector to zero: set w = 0
When does the weight vector get updated in the perceptron algorithm?
If y w^T x ≤ 0
What is the equation for updating the weight vector in the perceptron?
w_new = w_old + y x
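The steps above (initialize w to zero, test y w^T x ≤ 0, add y x on a mistake) can be sketched as a short training loop. The toy dataset is invented for this example, the leading 1 in each input stands in for the bias weight w0, and the epoch cap is a safety measure that is not part of the algorithm as described.

```python
def train_perceptron(data, max_epochs=1000):
    """data: list of (x, y) pairs, x a tuple of numbers, y either -1 or 1."""
    w = [0.0] * len(data[0][0])          # step 1: set w = 0
    for _ in range(max_epochs):
        updated = False
        for x, y in data:
            # Mistake (or point on the hyperplane) when y * w^T x <= 0.
            if y * sum(wi * xi for wi, xi in zip(w, x)) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]   # w_new = w_old + y x
                updated = True
        if not updated:
            return w                     # a full pass with no mistakes
    return None                          # gave up; data may not be separable

# Hypothetical, linearly separable toy data; inputs are (1, x1, x2).
toy = [((1, 2, 2), 1), ((1, 3, 1), 1), ((1, -1, -2), -1), ((1, -2, -1), -1)]
w = train_perceptron(toy)
```

The returned weights satisfy y w^T x > 0 for every training point, which is the condition for the weights being correct.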
What does γ (gamma) represent in the perceptron algorithm?
The distance between the linear separating hyperplane and the closest data point
What is the dot product of a vector with itself always greater than or equal to?
0
What happens to the dot product w^T w* after each update?
It grows by at least γ
What happens to the dot product w^T w after each update?
It grows by at most 1
How can the number of updates M required for the perceptron to converge be described?
M is always a finite quantity
What is the maximum number of updates required for convergence in the perceptron algorithm?
1/γ²
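The bound can be checked on a small example. The sketch assumes a hyperplane through the origin, inputs with magnitude at most 1, and a known unit-length separating vector w_star; the dataset and w_star are invented for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def count_updates(data, max_epochs=1000):
    """Run the perceptron rule and return (weights, number of updates M)."""
    w = [0.0] * len(data[0][0])
    updates = 0
    for _ in range(max_epochs):
        changed = False
        for x, y in data:
            if y * dot(w, x) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                updates += 1
                changed = True
        if not changed:
            break
    return w, updates

# Hypothetical data, every point with magnitude below 1.
data = [((0.5, 0.5), 1), ((0.8, -0.2), 1), ((-0.6, 0.3), -1), ((-0.3, -0.7), -1)]
w_star = (1.0, 0.0)   # assumed unit-length separating vector

gamma = min(y * dot(w_star, x) for x, y in data)   # distance to the closest point
w, m = count_updates(data)
bound = 1 / gamma ** 2
```

Here γ is 0.3, so the bound allows about 11 updates; the run needs far fewer. The bound is a worst-case ceiling, not a prediction of the actual count.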
True or False: For linearly separable data, the perceptron algorithm guarantees convergence in a finite number of steps.
True
In the context of the perceptron, what does the term ‘linearly separating hyperplane’ refer to?
A hyperplane that separates different classes of data points
What is a key limitation of a single layer perceptron?
It cannot solve problems like XOR that are not linearly separable
What is the significance of normalization in the perceptron algorithm?
It ensures all input data points have magnitudes less than or equal to 1
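One simple way to get that property, shown here as an assumption rather than as the method the source uses, is to divide every point by the largest magnitude in the dataset:

```python
import math

def normalize_dataset(points):
    """Scale all points by one factor so the largest magnitude becomes 1."""
    largest = max(math.sqrt(sum(c * c for c in p)) for p in points)
    if largest == 0:
        return [tuple(p) for p in points]   # all zeros: nothing to scale
    return [tuple(c / largest for c in p) for p in points]

points = [(3.0, 4.0), (6.0, 8.0), (-1.0, 2.0)]   # made-up inputs
scaled = normalize_dataset(points)
lengths = [math.sqrt(sum(c * c for c in p)) for p in scaled]
```

Because every point is divided by the same positive number, directions are unchanged and each point stays on the same side of any hyperplane through the origin.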
Fill in the blank: If the data are linearly separable, the perceptron will converge without fail in a finite number of _______.
steps