ML-Final Flashcards
What is a threshold logic unit (TLU)?
A simple model of a neuron.
Each input value is multiplied by the corresponding weight value, and these weighted values are then summed.
If the weighted summed input is larger than a certain threshold value, the output is set to one, and to zero otherwise.
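A minimal Python sketch of a TLU; the weights and threshold below are illustrative assumptions, chosen so the unit computes logical AND:

```python
import numpy as np

def tlu(x, w, theta):
    """Threshold logic unit: output 1 if the weighted sum exceeds theta, else 0."""
    return 1 if np.dot(w, x) > theta else 0

# Illustrative weights/threshold making the unit compute logical AND
w = np.array([1.0, 1.0])
print(tlu(np.array([1, 1]), w, theta=1.5))  # -> 1
print(tlu(np.array([1, 0]), w, theta=1.5))  # -> 0
```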
What is a weight parameter?
A parameter representing the 'strength' of a connection between units.
What is a perceptron?
A neuron model where the output is calculated from the weighted summed input with an activation function (also called a gain function, transfer function, or output function).
Give examples of activation functions (gain, transfer, or output functions).
Sigmoid and tanh.
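A minimal sketch of the two functions in Python (the input values are illustrative):

```python
import numpy as np

def sigmoid(a):
    """Logistic sigmoid: squashes its input into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

# tanh squashes into (-1, 1); NumPy provides it directly
a = np.array([-2.0, 0.0, 2.0])
print(sigmoid(a))   # ~ [0.12, 0.50, 0.88]
print(np.tanh(a))   # ~ [-0.96, 0.00, 0.96]
```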
Why do we add a bias term to a perceptron?
A bias allows the perceptron to shift its decision boundary away from the origin, so the prediction can better fit the data.
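A small sketch of how the bias shifts the output, assuming a sigmoid perceptron (the weight and bias values are illustrative):

```python
import numpy as np

def perceptron(x, w, b):
    """Sigmoid perceptron; the bias b shifts the decision boundary off the origin."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

x = np.array([0.0, 0.0])
w = np.array([1.0, 1.0])
print(perceptron(x, w, b=0.0))   # 0.5: boundary passes through the origin
print(perceptron(x, w, b=-2.0))  # ~0.12: the bias has shifted the boundary
```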
What are the similarities between an SVM and a perceptron?
A linear SVM is a special case of a perceptron: both compute a linear decision function of the weighted summed input.
What is the difference between deep learning and SVMs?
SVMs solve the optimization problem with specific, fixed transformations of the feature space.
Deep learning aims at learning the appropriate transformations from the data.
What is the delta term (delta rule)?
δ = (y(i) − y) y(1 − y), where y(i) is the target label and y(1 − y) is the derivative of the sigmoid output.
The delta rule is a gradient descent learning rule for updating the weights of the inputs: Δw_j = η δ x_j.
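A minimal sketch of delta-rule updates in Python, assuming a single sigmoid unit (the learning rate, input, and target are illustrative):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def delta_rule_step(w, x, target, lr=0.1):
    """One gradient descent step of the delta rule for a single sigmoid unit."""
    y = sigmoid(np.dot(w, x))
    delta = (target - y) * y * (1 - y)  # the delta term from this card
    return w + lr * delta * x           # w_j <- w_j + lr * delta * x_j

w = np.zeros(2)
for _ in range(1000):
    w = delta_rule_step(w, np.array([1.0, 1.0]), target=1.0)
print(sigmoid(w @ np.array([1.0, 1.0])))  # output has moved toward the target 1.0
```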
Why is a multilayer feedforward network a universal function approximator?
For every continuous function f, there is guaranteed to be a neural network such that for every possible input x, the network outputs (approximately) the value f(x).
Given enough hidden nodes, any such function can be approximated with arbitrary precision by these networks.
What is error backpropagation (backpropagation)?
The calculation of the gradient proceeds backwards through the network via the chain rule: the gradient of the final layer of weights is calculated first, and the gradient of the first layer of weights is calculated last.
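A minimal backpropagation sketch for a tiny two-layer sigmoid network with squared-error loss (the weights, input, target, and learning rate are illustrative assumptions):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Tiny 2-layer sigmoid network; all values below are illustrative
x = np.array([1.0, 0.5])
t = 1.0                                   # target
W1 = np.array([[0.1, -0.2], [0.3, 0.4]])  # hidden weights
w2 = np.array([0.2, -0.1])                # output weights
lr = 0.5

# Forward pass
h = sigmoid(W1 @ x)
y = sigmoid(w2 @ h)

# Backward pass: the final layer's deltas are computed first, the first layer's last
delta2 = (t - y) * y * (1 - y)        # output delta (squared error + sigmoid)
delta1 = (w2 * delta2) * h * (1 - h)  # hidden deltas via the chain rule
w2 = w2 + lr * delta2 * h             # weight update: Δw = lr * δ * input
W1 = W1 + lr * np.outer(delta1, x)
```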
Batch, mini-batch, and online learning?
Online learning updates after every sample and helps avoid local minima; mini-batch learning suits large datasets; batch learning computes one update from the whole dataset and requires high memory space.
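A sketch of the three update schemes, assuming a linear model with squared-error loss (the dataset, batch size, and learning rate are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.normal(size=(100, 3)), rng.normal(size=100)
w, lr = np.zeros(3), 0.01

def grad(w, Xb, yb):
    """Gradient of mean squared error for a linear model (illustrative loss)."""
    return Xb.T @ (Xb @ w - yb) / len(yb)

# Online: one update per single example
for xi, yi in zip(X, y):
    w -= lr * grad(w, xi[None, :], np.array([yi]))

# Mini-batch: one update per small subset (scales to large datasets)
for i in range(0, len(X), 10):
    w -= lr * grad(w, X[i:i + 10], y[i:i + 10])

# Batch: one update from the whole dataset (all data held in memory)
w -= lr * grad(w, X, y)
```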
“No free lunch” theorem
There is no one model that works best for every problem.
The assumptions of a great model for one problem may not hold for another problem.
What is cross entropy (the negative log probability)? What is it used for?
The negative log probability of the given label under the current model (a probability distribution), taken in expectation over the true distribution. It is used as the loss function for training classifiers.
H(p,q) = − sum[ p(y) log q(y) ]
p: the true distribution of the data (labels)
q: the model distribution; the neural network represents the probability q(y|x; w)
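A minimal sketch of the cross entropy between a one-hot true label and a model's predicted distribution (the distributions are illustrative):

```python
import numpy as np

def cross_entropy(p, q):
    """H(p, q) = -sum p(y) log q(y); p = true distribution, q = model."""
    return -np.sum(p * np.log(q))

p = np.array([0.0, 1.0, 0.0])  # one-hot true label
q = np.array([0.1, 0.7, 0.2])  # model's predicted distribution
print(cross_entropy(p, q))     # = -log 0.7 ~ 0.357
```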
Derive the learning rule.
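One possible sketch, assuming a single sigmoid unit y = σ(w·x) trained with the binary cross-entropy loss, writing t for the target label y(i); the y(1 − y) factor from the sigmoid derivative cancels:

```latex
\begin{align}
E &= -\bigl[t\log y + (1-t)\log(1-y)\bigr],
  \qquad y = \sigma(a),\quad a = \mathbf{w}\cdot\mathbf{x}\\
\frac{\partial E}{\partial a}
  &= \frac{\partial E}{\partial y}\,\frac{\partial y}{\partial a}
   = \frac{y-t}{y(1-y)}\; y(1-y) = y - t\\
\Delta w_j &= -\eta\,\frac{\partial E}{\partial w_j}
   = \eta\,(t-y)\,x_j
\end{align}
```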
KL-divergence: what is it equivalent to, what is it related to, and how is it used in neural networks?
It is closely related to cross entropy:
H(p, q) = H(p) + KL(p||q)
Minimizing the cross entropy is equivalent to minimizing the KL-divergence, since H(p) does not depend on the model.
Both are closely related to the maximum (log) likelihood principle.
They are used to derive the learning rule: the training loss and its gradient.
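A numeric check of the identity H(p, q) = H(p) + KL(p||q) in Python (the two distributions are illustrative):

```python
import numpy as np

p = np.array([0.6, 0.3, 0.1])    # "true" distribution (illustrative)
q = np.array([0.5, 0.25, 0.25])  # model distribution (illustrative)

H_pq = -np.sum(p * np.log(q))    # cross entropy H(p, q)
H_p = -np.sum(p * np.log(p))     # entropy H(p)
KL = np.sum(p * np.log(p / q))   # KL(p || q)
print(np.isclose(H_pq, H_p + KL))  # True: H(p, q) = H(p) + KL(p||q)
```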
What is the softmax function, and why and where is it used in a neural network?
The softmax function is a generalization of the logistic function that “squeezes” the outputs into the range (0, 1) so that they sum to one.
It is used to highlight the largest values and suppress values which are significantly below the maximum value.
It is typically applied in the final layer of a neural network, producing a probability distribution over the classes.
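A minimal softmax sketch (the input scores are illustrative); subtracting the maximum before exponentiating is a common numerical-stability trick:

```python
import numpy as np

def softmax(a):
    """Exponentiate and normalize; outputs lie in (0, 1) and sum to 1."""
    e = np.exp(a - np.max(a))  # subtract max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))
# ~ [0.66, 0.24, 0.10]: the largest value is highlighted
```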
How are neural networks related to probabilistic regression?
Through cross entropy and KL-divergence: minimizing the cross entropy loss is equivalent to maximum likelihood estimation of the model distribution q(y|x; w).
What is the relationship between maximizing the log probability and cross entropy?
We want to maximize the log probability of the labels given the data. Since the cross entropy is the negative of this, maximizing the log probability of the labels given the data is equivalent to minimizing the cross entropy.
What is deep learning?
Deep learning basically refers to neural networks with many layers.
What is a filter in a CNN?
It is a vector of weights describing a pattern to be detected in the input.
What is convolution?
Convolution is the operation of multiplying and adding while shifting the filter across the input.
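A minimal 1D convolution sketch in Python (strictly, this multiply-shift-add without flipping the filter is cross-correlation, which is what CNN frameworks usually compute; the signal and filter here are illustrative):

```python
import numpy as np

def convolve1d(signal, filt):
    """Multiply and add while shifting the filter across the input ('valid' mode)."""
    n = len(signal) - len(filt) + 1
    return np.array([np.dot(signal[i:i + len(filt)], filt) for i in range(n)])

signal = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
filt = np.array([1.0, 0.0, -1.0])  # an edge-detecting pattern (illustrative)
print(convolve1d(signal, filt))    # -> [-2.  0.  2.]: responds to rising/falling edges
```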