IDL Flashcards
How do neurons learn?
Learning happens by changing the topology and the strength (weights) of connections
In “Vanilla” Recurrent Neural Network, what is the activation fn in output layer?
The output layer is activated by a softmax function, so it can represent a probability distribution over words
In vanilla recurrent network, what is s(t) and y(t)?
s(t) = f(U w(t) + W s(t-1)), where f is the sigmoid activation function
y(t) = g(V s(t)), where g is the softmax activation function
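The two update equations above can be sketched as a single time step in numpy; the dimensions and random weights here are hypothetical toy values, not anything from the card:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

def rnn_step(w_t, s_prev, U, W, V):
    """One time step of a vanilla RNN:
    s(t) = f(U w(t) + W s(t-1)),  y(t) = g(V s(t))."""
    s_t = sigmoid(U @ w_t + W @ s_prev)
    y_t = softmax(V @ s_t)
    return s_t, y_t

# Toy sizes: input 4, hidden 3, output 5 (hypothetical).
rng = np.random.default_rng(0)
U = rng.normal(size=(3, 4))
W = rng.normal(size=(3, 3))
V = rng.normal(size=(5, 3))
s, y = rnn_step(rng.normal(size=4), np.zeros(3), U, W, V)
```

Because g is a softmax, y always sums to 1, which is what lets y(t) be read as a distribution over the vocabulary.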
Define the pocket convergence theorem
The pocket algorithm converges with probability 1 to the optimal weights (the weight vector with the fewest training errors) even if the classes are not linearly separable
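A minimal sketch of the pocket algorithm: run ordinary perceptron updates, but keep ("pocket") the best weight vector seen so far. The XOR-style data below is a hypothetical example chosen because it is not linearly separable:

```python
import numpy as np

def pocket_algorithm(X, y, epochs=200, seed=0):
    """Perceptron updates, but remember the weights with the fewest
    training errors seen so far. X: (N, d) with a bias column;
    y: labels in {-1, +1}."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    pocket_w, pocket_errors = w.copy(), np.inf
    for _ in range(epochs):
        preds = np.sign(X @ w)
        preds[preds == 0] = -1
        wrong = np.flatnonzero(preds != y)
        if wrong.size < pocket_errors:        # better than pocketed weights?
            pocket_w, pocket_errors = w.copy(), wrong.size
        if wrong.size == 0:
            break
        i = rng.choice(wrong)                 # perceptron update on one mistake
        w = w + y[i] * X[i]
    return pocket_w, pocket_errors

# XOR labels are not linearly separable; pocket still returns the best w found.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], float)  # bias + 2 inputs
y = np.array([-1, 1, 1, -1])
w, errs = pocket_algorithm(X, y)
```

The plain perceptron would cycle forever on this data; the pocket step is what guarantees a best-so-far answer.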
Define Cover’s theorem
It gives the probability that a randomly labeled set of N points in general position in d dimensions is linearly separable: P = C(N, d) / 2^N, where C(N, d) counts the linearly separable labelings (dichotomies)
Apply cover’s theorem in higher dimensional space
If the number of points in d dimensions is less than 2*d, they are almost always linearly separable (roughly two points per dimension is the capacity of a linear classifier)
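The two Cover's-theorem cards above can be checked numerically. A sketch of the function-counting form, assuming points in general position and affine separating hyperplanes (the function names here are my own):

```python
from math import comb

def cover_count(N, d):
    """Cover's function-counting theorem: the number of linearly
    separable dichotomies of N points in general position in d
    dimensions is 2 * sum_{k=0}^{d} C(N-1, k)."""
    return 2 * sum(comb(N - 1, k) for k in range(d + 1))

def p_separable(N, d):
    """Probability that a random labeling of N points is separable."""
    return cover_count(N, d) / 2 ** N
```

Two sanity checks consistent with the cards: for N <= d + 1 every labeling is separable (P = 1), and exactly at the capacity N = 2(d + 1) the probability is 1/2, so below roughly 2d points separability is almost certain for large d.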
What is Adaline? What is the activation function in Adaline?
Adaptive linear element.
The difference between Adaline and the standard (McCulloch–Pitts) perceptron is that in the learning phase, the weights are adjusted according to the weighted sum of the inputs (the net). In the standard perceptron, the net is passed to the activation (transfer) function and the function’s output is used for adjusting the weights.
The activation function is the identity function (the net is used directly during learning; thresholding happens only at classification time)
Why is LSTM better than a vanilla recurrent network?
Its gates let it learn which remote and recent information is relevant to the given task, and to use that information when generating the output
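The gating idea behind that answer can be sketched as one LSTM time step; the sizes and random weights are hypothetical, and biases are zero for brevity:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step (sketch). The forget gate f chooses which parts of
    the cell state (long-term memory) to keep, the input gate i chooses what
    new information to write, and the output gate o chooses what to expose."""
    Wf, Wi, Wc, Wo = params
    z = np.concatenate([h_prev, x])   # stacked [hidden, input] vector
    f = sigmoid(Wf @ z)               # forget gate: keep remote information?
    i = sigmoid(Wi @ z)               # input gate: admit recent information?
    c_tilde = np.tanh(Wc @ z)         # candidate cell update
    c = f * c_prev + i * c_tilde      # new cell state
    o = sigmoid(Wo @ z)               # output gate
    h = o * np.tanh(c)                # new hidden state
    return h, c

# Toy sizes: input 3, hidden 2 (hypothetical).
rng = np.random.default_rng(1)
H, D = 2, 3
params = tuple(rng.normal(size=(H, H + D)) for _ in range(4))
h, c = lstm_step(rng.normal(size=D), np.zeros(H), np.zeros(H), params)
```

The additive update `c = f * c_prev + i * c_tilde` is what lets gradients flow across many time steps, which a vanilla RNN's repeated squashing of s(t) does not allow.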