L4: Early Connectionism & Perceptrons, First AI Winter Flashcards
What is Hebbian Learning?
Published by Donald Hebb in 1949 as an attempt to explain associative learning; now called Hebbian learning
Cells that fire together, wire together
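A minimal sketch of the standard Hebbian update, delta_w = eta · (pre-synaptic activity) · (post-synaptic activity); the function name, variable names, and learning rate are illustrative, not from the lecture:

```python
import numpy as np

def hebbian_update(w, x, y, eta=0.1):
    """One Hebbian step: a weight grows when its input x_j and the
    unit's output y are active together ("fire together, wire together")."""
    return w + eta * y * x

# Toy usage: only connections whose inputs co-occur with the output strengthen
w = np.zeros(4)
x = np.array([1.0, 0.0, 0.0, 1.0])  # pre-synaptic activity
y = 1.0                             # post-synaptic activity
print(hebbian_update(w, x, y))      # [0.1 0.  0.  0.1]
```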
What was Minsky’s SNARC?
Built in 1951 with Dean Edmonds, first ever artificial neural network
Learned parameters through Hebbian mechanisms
What did Rosenblatt invent in 1958?
The perceptron (implemented in hardware in the same year)
Published a book about it in 1961
Uses numerical inputs and weights; uses a bias term instead of a threshold value
Neuron outputs 1 if w · x ≥ 0 (note the ≥: w · x = 0 also counts), otherwise 0
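A minimal sketch of this neuron, assuming the bias is folded in as w_0 on a constant input of 1; the function name and the AND example weights are my illustration:

```python
import numpy as np

def perceptron_output(w, x):
    """Output 1 iff w · (1, x) >= 0; note that w · x = 0 also fires."""
    return 1 if np.dot(w, np.concatenate(([1.0], x))) >= 0 else 0

# Example: w_0 = -1.5 (bias), w_1 = w_2 = 1 implements logical AND on {0,1} inputs
w = np.array([-1.5, 1.0, 1.0])
print([perceptron_output(w, np.array(x)) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]])
# -> [0, 0, 0, 1]
```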
What is the perceptron learning rule?
Iterative method that generates a sequence of weights w_0, w_1, w_2, …
Requires a learning rate r ∈ R such that r > 0.
Algorithm sketch:
1. Initialize w_0 however you want;
2. Iterate over dataset, i = 1, …, m.
With datapoint (x_i, y_i) do:
2.1 Compute output ŷ_i = f(w_t, x_i) using current weights w_t
2.2 Compute new weights w_{t+1} = w_t + r · (y_i − ŷ_i) · x_i
3. Stop if total average error over dataset is low; otherwise go to (2)
We need to define the error threshold at which to stop (see the code sketch below)
The learning algorithm is only for single neurons; there is no efficient learning rule for multilayer networks
The perceptron convergence theorem states that:
* The learning algorithm converges in finitely many iterations;
* But this is only guaranteed if a separating solution exists! (If no such solution exists, convergence is not guaranteed)
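A runnable sketch of the loop above; the function and parameter names, the learning rate, the stopping threshold, and the dataset (logical AND, which is linearly separable, so the convergence theorem applies) are my own choices:

```python
import numpy as np

def train_perceptron(X, y, r=0.1, max_epochs=100, error_threshold=0.0):
    """Perceptron learning rule: w_{t+1} = w_t + r * (y_i - yhat_i) * x_i."""
    X = np.hstack([np.ones((len(X), 1)), X])          # prepend 1s so w[0] is the bias
    w = np.zeros(X.shape[1])                          # step 1: initialize w_0 (here: zeros)
    for _ in range(max_epochs):
        errors = 0
        for x_i, y_i in zip(X, y):                    # step 2: iterate over the dataset
            y_hat = 1 if np.dot(w, x_i) >= 0 else 0   # 2.1: output with current weights
            w = w + r * (y_i - y_hat) * x_i           # 2.2: weight update
            errors += abs(y_i - y_hat)
        if errors / len(y) <= error_threshold:        # step 3: stop if average error is low
            return w
    return w

# Linearly separable toy data (logical AND), so convergence is guaranteed
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w = train_perceptron(X, y)
print([1 if np.dot(w, np.concatenate(([1.0], x))) >= 0 else 0 for x in X])  # [0, 0, 0, 1]
```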
When are two sets linearly separable?
Two sets A, B ⊆ R^n are called linearly separable if there is a vector v ∈ R^n
and scalar c ∈ R such that, for all a ∈ A and b ∈ B:
v · a ≥ c and v · b < c.
* There is a separating hyperplane such that A and B lie on different sides.
* This is encoded in a neuron with weights w = (−c, v_1, …, v_n) (see the sketch below)
Finite sets A, B ⊆ R^n are linearly separable iff their closed convex hulls are disjoint.
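A small sketch of that encoding: given a separating pair (v, c), the neuron with weights w = (−c, v_1, …, v_n) outputs 1 on A and 0 on B. The sets and the particular (v, c) below are made up for illustration:

```python
import numpy as np

def separating_neuron(v, c):
    """Neuron weights w = (-c, v_1, ..., v_n): fires iff -c + v·x >= 0, i.e. v·x >= c."""
    w = np.concatenate(([-c], v))
    return lambda x: 1 if np.dot(w, np.concatenate(([1.0], x))) >= 0 else 0

# Illustrative separable sets in R^2, separated by v = (1, 1), c = 1
A = [np.array([1.0, 1.0]), np.array([2.0, 0.0])]   # v·a >= c
B = [np.array([0.0, 0.0]), np.array([0.5, 0.2])]   # v·b <  c
neuron = separating_neuron(np.array([1.0, 1.0]), 1.0)
print([neuron(a) for a in A], [neuron(b) for b in B])  # [1, 1] [0, 0]
```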
What was Rosenblatt’s Universal Representation Result?
Proved back in 1961 that MLPs are universal function representers
A single hidden layer is sufficient
For any function F : {0, 1}^n → {0, 1}, there is an MLP with a single hidden layer, whose input-output relation is exactly F.
Basically, make a neuron that only activates on one specific x ∈ {0, 1}^n (a pattern-matching neuron): its hyperplane cuts off that ‘corner’ of the hypercube
The hidden layer has exactly one pattern-matching neuron for every combination, i.e., 2^n hidden neurons
Let i_x denote the index of the neuron “matching” x.
Let h(x) denote the hidden outputs on x; then h(x) = (0, …, 0, 1, 0, …, 0), where the 1 is only at index i_x (exactly one hidden neuron ever fires)
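A sketch of that hidden layer. One way to build a pattern-matching neuron (the specific ±1 weighting is my choice, not necessarily the lecture's): weight +1 where x_j = 1, −1 where x_j = 0, and bias −(number of ones), so the firing threshold is reached only at that corner:

```python
import itertools
import numpy as np

def pattern_matcher(pattern):
    """Threshold neuron that fires (>= 0) only on `pattern`."""
    p = np.array(pattern, dtype=float)
    w = 2 * p - 1                     # +1 where p_j = 1, -1 where p_j = 0
    bias = -p.sum()                   # threshold = number of ones in the pattern
    return lambda x: 1 if bias + np.dot(w, np.array(x, dtype=float)) >= 0 else 0

n = 3
patterns = list(itertools.product([0, 1], repeat=n))   # all 2^n corners of the cube
hidden = [pattern_matcher(p) for p in patterns]        # 2^n hidden neurons
for x in patterns:
    h = [neuron(x) for neuron in hidden]               # hidden outputs h(x)
    assert sum(h) == 1 and h[patterns.index(x)] == 1   # exactly one fires, at index i_x
print("each input activates exactly one hidden neuron")
```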
What is the Universal Approximation of Continuous Functions?
Modern results are typically stated as approximation of functions F : R^n → R.
* Roughly: a single-hidden-layer MLP can approximate these arbitrarily well
* Typically uses a different activation function in the hidden layer
* E.g. logistic sigmoid, tanh, . . .
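A sketch of the function class these results are about: a single sigmoid hidden layer with a linear output, i.e. sums of the form Σ_i c_i · σ(w_i · x + b_i). The weights below are random placeholders, just to show the shape:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def one_hidden_layer_mlp(W, b, c, x):
    """F_hat(x) = sum_i c_i * sigmoid(w_i · x + b_i) -- the form the
    universal approximation results refer to."""
    return c @ sigmoid(W @ x + b)

# Shape-only illustration: n = 2 inputs, 5 hidden units, scalar output
rng = np.random.default_rng(0)
W, b, c = rng.normal(size=(5, 2)), rng.normal(size=5), rng.normal(size=5)
print(one_hidden_layer_mlp(W, b, c, np.array([0.3, -1.2])))
```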
What weights does the output layer in Rosenblatt's construction have?
−1 for w_0 (on the bias term), and for hidden neuron i_x a weight of 1 if F(x) = 1 and 0 if F(x) = 0
So it produces 1 when the matched pattern has F(x) = 1 (weighted sum = 0 ≥ 0), otherwise 0 (weighted sum = −1)
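Putting the last two cards together, a sketch of the full construction for an example F (the ±1 pattern-matcher weights and the parity example are my illustration; the output layer follows the card: bias weight −1, weight 1 on hidden neuron i_x when F(x) = 1, weight 0 otherwise):

```python
import itertools
import numpy as np

def pattern_matcher_weights(pattern):
    """Hidden neuron that fires (>= 0) only on `pattern`: weights ±1, bias -#ones."""
    p = np.array(pattern, dtype=float)
    return 2 * p - 1, -p.sum()

def rosenblatt_mlp(F, n):
    patterns = list(itertools.product([0, 1], repeat=n))
    hidden = [pattern_matcher_weights(p) for p in patterns]     # 2^n pattern matchers
    out_w = np.array([1.0 if F(p) else 0.0 for p in patterns])  # weight F(x) on neuron i_x
    out_bias = -1.0                                             # w_0 = -1 on the bias term

    def net(x):
        x = np.array(x, dtype=float)
        h = np.array([1.0 if b + w @ x >= 0 else 0.0 for w, b in hidden])  # hidden layer
        return 1 if out_bias + out_w @ h >= 0 else 0            # sum is 0 if F(x)=1, else -1
    return net

# Example F: 3-input parity (not linearly separable, but representable with a hidden layer)
F = lambda bits: sum(bits) % 2 == 1
net = rosenblatt_mlp(F, 3)
assert all(net(x) == int(F(x)) for x in itertools.product([0, 1], repeat=3))
print("MLP reproduces F exactly")
```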