Final Exam Sample Questions Flashcards
What does the Markov assumption mean?
The probability of a random variable at time t, given its full history from time 0 to time t-1, can be estimated using only a smaller window of that history, from time t-x to time t-1
(where x is a fixed finite number such as 1 or 2)
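The window idea can be written as a formula. This is a sketch that assumes the sequence is written X_0, ..., X_t (notation chosen here for illustration, not taken from the card):

```latex
P(X_t \mid X_0, X_1, \dots, X_{t-1}) \approx P(X_t \mid X_{t-x}, \dots, X_{t-1})
```

With x = 1 the right-hand side depends only on the previous step, which is the first-order case used by bigram language models.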
Why do you need a non-linearity function?
Without passing the output of each layer through a non-linearity function, a multilayer perceptron collapses into a single linear function of its input, no matter how many layers it has
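A small numerical check may make this concrete. The sketch below stacks two linear layers with no activation and compares them to one merged layer; all weights and helper names are made up for illustration.

```python
# Two stacked linear layers with no activation: y = W2 (W1 x + b1) + b2.
# The weights below are arbitrary illustrative values.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(u, v):
    """Add two vectors elementwise."""
    return [a + b for a, b in zip(u, v)]

W1 = [[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]]  # 2 inputs -> 3 hidden units
b1 = [0.5, 0.0, -1.0]
W2 = [[1.0, 0.0, 2.0], [-1.0, 1.0, 0.0]]    # 3 hidden units -> 2 outputs
b2 = [0.0, 1.0]

def two_layer(x):
    """Two linear layers applied one after the other."""
    return add(matvec(W2, add(matvec(W1, x), b1)), b2)

# The same mapping as a single layer: W = W2 W1, b = W2 b1 + b2.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)

def one_layer(x):
    """One linear layer using the merged weights."""
    return add(matvec(W, x), b)
```

For any input x, two_layer(x) and one_layer(x) agree, so the extra layer adds no expressive power until a non-linearity is placed between the layers.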
What is the difference between the Sigmoid function and the Softmax function?
The size of the input and what the function does with that input are different:
- Sigmoid receives 1 number as input, and makes sure the output is between 0 and 1
- Softmax receives a vector of numbers, and outputs a vector of nonnegative numbers that SUM to 1.
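The two functions can be sketched in a few lines of plain Python. The max-subtraction in softmax is a common numerical-stability trick added here, not something the card mentions.

```python
import math

def sigmoid(z):
    """Map one real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    """Map a vector of real numbers to positive values that sum to 1."""
    m = max(zs)  # subtract the max so exp() does not overflow
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

scores = [1.0, 2.0, 3.0]
probs = softmax(scores)                      # sums to 1
squashed = [sigmoid(z) for z in scores]      # each in (0, 1), sum is not 1
```

Applying sigmoid to each score separately keeps every value between 0 and 1, but the values are independent of each other and generally do not sum to 1, unlike the softmax output.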
What are the problems with the conventional approach of converting an image into a list of pixels that you feed into a neural network?
A fully connected first layer needs one weight for every pair of input pixel and hidden unit, so large images can require trillions of weights, which takes lots of hardware and time to train
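A quick back-of-the-envelope count shows where the large number comes from. The image size and hidden-layer width below are assumed values picked for illustration, not figures from the card.

```python
def first_layer_weights(height, width, channels, hidden_units):
    """Weights in a fully connected layer fed with a flattened image (biases ignored)."""
    inputs = height * width * channels
    return inputs * hidden_units

# Assumed example: a 1000 x 1000 RGB image and 1,000,000 hidden units.
n = first_layer_weights(1000, 1000, 3, 1_000_000)
print(n)  # 3000000000000, i.e. 3 trillion weights
```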
In the absence of an input sequence (i.e., it is a single input), is there a difference between an RNN and a feedforward network?
No. An RNN's memory comes from feeding its hidden state back in at the next time step. With a single input there is no feedback, so it behaves the same as a feedforward network.
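The equivalence can be checked on one step of a simple RNN. This sketch assumes a tanh cell and an all-zero initial hidden state (a common default); the weights and function names are hypothetical.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rnn_step(x, h_prev, W_x, W_h, b):
    """One step of a simple tanh RNN: h = tanh(W_x x + W_h h_prev + b)."""
    a = matvec(W_x, x)
    r = matvec(W_h, h_prev)
    return [math.tanh(ai + ri + bi) for ai, ri, bi in zip(a, r, b)]

def feedforward_layer(x, W_x, b):
    """A plain dense layer with tanh: h = tanh(W_x x + b)."""
    a = matvec(W_x, x)
    return [math.tanh(ai + bi) for ai, bi in zip(a, b)]

# Arbitrary illustrative weights: 3 inputs -> 2 hidden units.
W_x = [[0.2, -0.5, 1.0], [0.7, 0.1, -0.3]]
W_h = [[0.4, -0.6], [0.9, 0.3]]
b = [0.1, -0.2]

x = [1.0, 2.0, -1.0]
h0 = [0.0, 0.0]  # assumed zero initial state: the recurrent term vanishes

h_rnn = rnn_step(x, h0, W_x, W_h, b)
h_ff = feedforward_layer(x, W_x, b)
```

With a zero initial state the recurrent weights W_h contribute nothing, so h_rnn and h_ff are the same vector. The two only diverge once a second input arrives and the previous hidden state is fed back.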
When a language model encounters a new word that was not in its training data, it assigns probability 0 to any phrase containing that word.
How can we deal with this problem?
Apply smoothing (e.g., Laplace smoothing)
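As a minimal sketch of add-one (Laplace) smoothing, here is a unigram version on a toy corpus. The corpus, the function name, and the choice to reserve one vocabulary slot for unseen words are all assumptions made for illustration.

```python
from collections import Counter

def laplace_unigram_prob(word, counts, vocab_size, k=1.0):
    """Add-k smoothed unigram probability: (count + k) / (total + k * vocab_size)."""
    total = sum(counts.values())
    return (counts[word] + k) / (total + k * vocab_size)

corpus = "the cat sat on the mat".split()
counts = Counter(corpus)          # a Counter returns 0 for unseen words
vocab_size = len(counts) + 1      # assumed: one extra slot for unseen words

p_seen = laplace_unigram_prob("the", counts, vocab_size)    # (2 + 1) / (6 + 6)
p_unseen = laplace_unigram_prob("dog", counts, vocab_size)  # (0 + 1) / (6 + 6)
```

The unseen word "dog" now gets a small non-zero probability, so a phrase containing it is no longer scored as impossible.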