L2_NCC_Perceptron Flashcards
On which 3 levels do cognitive functions have to be investigated?
• Computational Level
What does a cognitive function do?
• Algorithmic Level
What is the functional organization within a cognitive module?
• Implementational Level
What is the physical/physiological realization of this algorithm?
Explain the 3 steps of the Nearest Centroid Classifier (NCC)
- Calculate the class means (prototypes) w1 and w2
- Find the linear classification boundary: w.T*x − β = 0,
with w = w1 − w2 and β = 1/2 * (w1.T*w1 − w2.T*w2)
- Classify by the sign of f(x) = w.T*x − β: +1 / −1
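A minimal NumPy sketch of these three steps; the names ncc_fit, ncc_predict, X1, X2 are illustrative assumptions, not from the lecture:

import numpy as np

def ncc_fit(X1, X2):
    # Step 1: class means (prototypes); X1, X2 are (n_samples, D) arrays
    w1 = X1.mean(axis=0)
    w2 = X2.mean(axis=0)
    # Step 2: linear boundary w.T*x - beta = 0
    w = w1 - w2
    beta = 0.5 * (w1 @ w1 - w2 @ w2)
    return w, beta

def ncc_predict(X, w, beta):
    # Step 3: class label = sign of f(x) = w.T*x - beta
    return np.where(X @ w - beta >= 0, 1, -1)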
Basic Artificial Neural Network model (4 points)
- Input nodes xi receive information
- Inputs are multiplied with a weighting factor wi and summed up
- Integrated input is mapped through some (non-linear) function f (·)
- f(x) = +1 if x is the preferred stimulus, −1 if x is any other stimulus
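A hedged sketch of this neuron model in NumPy (the sign non-linearity and the name neuron are illustrative assumptions):

import numpy as np

def neuron(x, w):
    # multiply inputs xi with weights wi and sum up
    s = np.dot(w, x)
    # map the integrated input through a (non-linear) function f
    return 1 if s >= 0 else -1   # +1: preferred stimulus, -1: any other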
Goal of the Perceptron Learning Algorithm
Binary classification of multivariate data x ∈ R^D
Input of the Perceptron Learning Algorithm
Learning rate η and N tuples (xn, yn), where
- xn ∈ R^D is the D-dimensional data
- yn ∈ {−1, +1} is the corresponding label
Output of the Perceptron Learning Algorithm
Weight vector w ∈ R^D such that
w.T*xn ≥ 0 if yn = +1, and w.T*xn < 0 if yn = −1
What is a good w?
We need an error function
that tells us how good w is.
Then we choose w such that the error function is minimized.
Perceptron error: EP(w) = −Σ(m∈M) w.T*xm*ym, a function of the weights w, where M denotes the index set of all misclassified data points xm
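A small sketch computing this error for a label vector y in {−1, +1}, assuming the standard form EP(w) = −Σ(m∈M) w.T*xm*ym:

import numpy as np

def perceptron_error(w, X, y):
    # misclassified: sign(w.T*x) disagrees with the label
    pred = np.where(X @ w >= 0, 1, -1)
    M = pred != y
    # EP(w) = -sum over m in M of w.T*xm*ym (non-negative by construction)
    return -np.sum((X[M] @ w) * y[M])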
Explain the Perceptron Learning Algorithm (2 steps)
- Initialize wOLD (randomly, 1/n, …)
- While there are misclassified data points
Pick a random misclassified data point xm
Descend along the negative gradient at the single data point xm:
Em(w) = −w.T*xm*ym
∇Em(w) = −xm*ym
wNEW ← wOLD − η∇Em(wOLD) = wOLD + η*xm*ym
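A runnable sketch of the full loop, assuming NumPy, a random initialization, and illustrative defaults for eta and max_iter:

import numpy as np

def perceptron_train(X, y, eta=0.1, max_iter=1000):
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])          # initialize wOLD randomly
    for _ in range(max_iter):
        pred = np.where(X @ w >= 0, 1, -1)
        mis = np.flatnonzero(pred != y)      # index set M of misclassified points
        if mis.size == 0:                    # no misclassified data: done
            break
        m = rng.choice(mis)                  # pick a random misclassified xm
        w = w + eta * X[m] * y[m]            # wNEW = wOLD + η*xm*ym
    return w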
What do [Novikoff, 1962; Rosenblatt, 1962] state about the Perceptron Learning Algorithm?
If there is a solution, the perceptron algorithm will find it in a finite number of steps
Convergence on non-linearly separable sets:
wNEW ← wOLD + η(t)*xm*ym
• Proven for a variable learning rate η(t) with η(t) → 0 as t → ∞
• Best convergence speed is achieved for η(t) ∼ 1/t
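A hedged sketch of this variant; the concrete schedule η(t) = eta0/t is an assumption consistent with η(t) → 0, not necessarily the lecture's exact choice:

import numpy as np

def perceptron_train_decay(X, y, eta0=1.0, max_iter=1000):
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1])
    for t in range(1, max_iter + 1):
        pred = np.where(X @ w >= 0, 1, -1)
        mis = np.flatnonzero(pred != y)
        if mis.size == 0:
            break
        m = rng.choice(mis)
        w = w + (eta0 / t) * X[m] * y[m]     # η(t) → 0 as t → ∞
    return w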
Problems with Nearest Centroid Classification
- Non-linear data
- Correlated data
High accuracy but "lousy" weight vector - why?
Misconception: signal channels (here: pixels) with large classifier weights are strongly related to the class label.
The purpose of the weight vector is twofold: to amplify the signal of interest while at the same time suppressing signals of no interest.