NLP-embedding Flashcards
What are the advantages of embedding words as indices in a vocabulary?
There are several good reasons for this choice: simplicity, robustness, and the observation that simple models trained on huge amounts of data outperform complex systems trained on less data. An example is the popular N-gram model used for statistical language modeling; today it is possible to train N-grams on virtually all available data (trillions of words [3]).
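To make the representation concrete, here is a minimal sketch (the corpus and variable names are hypothetical, not from the paper) of mapping each word to an atomic index in a vocabulary; the indices themselves carry no notion of similarity between words.

```python
# Hypothetical sketch: words as atomic indices in a vocabulary.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
vocab = {}
for sentence in corpus:
    for word in sentence.split():
        vocab.setdefault(word, len(vocab))  # assign the next free index

print(vocab)  # e.g. {'the': 0, 'cat': 1, 'sat': 2, ...}
# 'cat' and 'dog' are just different IDs; nothing in this representation
# says they are related, which is the limitation the next card discusses.
```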
What are the limits of embedding words as indices in a vocabulary?
The amount of relevant in-domain data for automatic speech recognition is limited; performance is usually dominated by the size of the high-quality transcribed speech data (often just millions of words). In machine translation, the existing corpora for many languages contain only a few billion words or less. Thus, there are situations where simply scaling up the basic techniques will not result in any significant progress, and we have to focus on more advanced techniques.
What is the difference between the continuous bag-of-words (CBOW) model and the Skip-gram model?
Skip-gram is similar to CBOW, but instead of predicting the current word based on the context, it tries to maximize classification of a word based on another word in the same sentence. More precisely, we use each current word as an input to a log-linear classifier with a continuous projection layer, and predict words within a certain range before and after the current word. Increasing the range improves the quality of the resulting word vectors, but it also increases the computational complexity.
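A minimal sketch of the difference, assuming a tokenized sentence and a symmetric context window (the function names and example sentence are hypothetical): CBOW forms (context, target) pairs, while Skip-gram forms (current word, context word) pairs.

```python
# Hypothetical sketch of training-pair generation, not the original word2vec code.
def cbow_pairs(tokens, window=2):
    # CBOW: predict the current word from its surrounding context words.
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        pairs.append((context, target))            # (context words) -> target
    return pairs

def skipgram_pairs(tokens, window=2):
    # Skip-gram: predict each context word from the current word.
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + 1 + window)):
            if j != i:
                pairs.append((center, tokens[j]))  # current word -> context word
    return pairs

tokens = "the quick brown fox jumps".split()
print(cbow_pairs(tokens)[:2])
print(skipgram_pairs(tokens)[:4])
```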
What is an alternative to the computationally expensive full softmax when training word embeddings?
Negative sampling: instead of computing a softmax over the entire vocabulary, the model only has to distinguish the observed (word, context) pair from a small number of randomly sampled noise words.
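A minimal sketch of the negative-sampling loss for a single (center, context) pair, assuming NumPy and randomly initialized toy vectors (the function name, dimensions, and number of negatives are hypothetical); only k + 1 output vectors are involved instead of the whole vocabulary.

```python
# Hypothetical sketch of the negative-sampling objective for one training pair.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def negative_sampling_loss(v_center, u_context, u_negatives):
    # v_center:    input vector of the current word          (d,)
    # u_context:   output vector of the observed context     (d,)
    # u_negatives: output vectors of k sampled noise words   (k, d)
    pos = np.log(sigmoid(u_context @ v_center))        # pull the true pair together
    neg = np.sum(np.log(sigmoid(-u_negatives @ v_center)))  # push noise words away
    return -(pos + neg)  # minimize; touches only k + 1 output vectors

rng = np.random.default_rng(0)
d, k = 100, 5
loss = negative_sampling_loss(rng.normal(size=d),
                              rng.normal(size=d),
                              rng.normal(size=(k, d)))
print(loss)
```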
What is the computational advantage of the Skip-gram model?
Unlike most of the previously used neural network architectures for learning word vectors, training the Skip-gram model does not involve dense matrix multiplications. This makes the training extremely efficient: an optimized single-machine implementation can train on more than 100 billion words in one day.
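A minimal sketch of why this is cheap, assuming NumPy, Skip-gram with negative sampling, and toy parameter matrices (the function name, sizes, and learning rate are hypothetical): each SGD step reads and writes one input row and 1 + k output rows, so there is no |V| x d dense matrix multiplication anywhere.

```python
# Hypothetical sketch of one sparse SGD step for Skip-gram with negative sampling.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(W_in, W_out, center, context, negatives, lr=0.025):
    v = W_in[center]                        # one row of the input embedding matrix
    idx = np.array([context] + list(negatives))
    u = W_out[idx]                          # (1 + k, d): a handful of output rows
    labels = np.zeros(len(idx)); labels[0] = 1.0
    grad = sigmoid(u @ v) - labels          # (1 + k,)
    W_out[idx] -= lr * np.outer(grad, v)    # update only the sampled output rows
    W_in[center] -= lr * (grad @ u)         # update only the current word's row

V, d, k = 10_000, 100, 5
rng = np.random.default_rng(0)
W_in = rng.normal(0, 0.1, (V, d))
W_out = rng.normal(0, 0.1, (V, d))
sgd_step(W_in, W_out, center=42, context=7, negatives=rng.integers(0, V, k))
```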