Class 11 Flashcards
one hot vector
a vector with one position for every word in the vocabulary; the vector for a given word has a 1 in that word's position and 0s everywhere else, so every word gets a distinct representation
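A minimal sketch in Python, assuming a hypothetical five-word vocabulary; the vector is as long as the vocabulary, with a single 1 marking the word:

```python
import numpy as np

# Hypothetical toy vocabulary; in practice this would be tens of thousands of words.
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # Vector of length |vocab| with a single 1 at the word's index.
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("cat"))  # [0. 1. 0. 0. 0.]
```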
word embedding
a slightly more sophisticated approach that assigns each word a low-dimensional vector, learned automatically from the data; does a good job of representing words in isolation
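A sketch of an embedding lookup, where a small random matrix stands in for vectors that a real system would learn from data (all names and sizes here are hypothetical):

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
word_to_index = {w: i for i, w in enumerate(vocab)}

embedding_dim = 8  # low-dimensional compared to the vocabulary size
rng = np.random.default_rng(0)
# Randomly initialized here; a real model learns these rows from data
# (e.g., with word2vec or as part of a larger network).
embedding_matrix = rng.normal(size=(len(vocab), embedding_dim))

def embed(word):
    return embedding_matrix[word_to_index[word]]

print(embed("cat").shape)  # (8,)
```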
character level model
an alternative to word embeddings – the input is a sequence of characters, each encoded as a one hot vector, and the model has to learn how characters come together to form words
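A sketch of character-level input, assuming an alphabet of lowercase letters; each character gets its own one hot vector and the model sees the whole sequence:

```python
import numpy as np
import string

alphabet = string.ascii_lowercase
char_to_index = {c: i for i, c in enumerate(alphabet)}

def encode_word(word):
    # One one-hot vector per character; shape (len(word), len(alphabet)).
    seq = np.zeros((len(word), len(alphabet)))
    for pos, ch in enumerate(word):
        seq[pos, char_to_index[ch]] = 1.0
    return seq

print(encode_word("cat").shape)  # (3, 26)
```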
language model
a probability distribution over sequences of words; a useful one needs to capture sufficient context
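A minimal sketch of one simple kind of language model, a bigram count model over a toy corpus, just to make the "distribution over sequences" idea concrete; real models use far more context:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat sat".split()

# Count bigrams to estimate P(next word | previous word).
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def prob(next_word, prev_word):
    counts = bigram_counts[prev_word]
    total = sum(counts.values())
    return counts[next_word] / total if total else 0.0

print(prob("cat", "the"))  # 2 of the 3 bigrams starting with "the" continue with "cat"
```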
machine translation
the task of translating text from a source language into a target language
sequence to sequence model
neural network architecture that connects two recurrent networks (typically LSTMs) – an encoder that reads the source sequence and a decoder that generates the target sequence; most commonly used for machine translation (MT) but can also be used to generate a text caption for an image or a summary – 3 shortcomings: nearby context bias, fixed context size limit, slower sequential processing
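A minimal numpy sketch of the encoder half of a sequence-to-sequence model, using a plain tanh RNN cell and random weights in place of the learned LSTM cells a real system would use; every name and size here is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, embed_dim, vocab_size = 16, 8, 5

# Hypothetical random parameters; a real model learns these, typically in LSTM cells.
W_enc = rng.normal(size=(hidden_dim, hidden_dim + embed_dim)) * 0.1
W_dec = rng.normal(size=(hidden_dim, hidden_dim + embed_dim)) * 0.1
W_out = rng.normal(size=(vocab_size, hidden_dim)) * 0.1
E = rng.normal(size=(vocab_size, embed_dim)) * 0.1  # word embedding table

def rnn_step(W, h, x):
    # One step of a simple recurrent cell: new hidden state from old state + input.
    return np.tanh(W @ np.concatenate([h, x]))

def encode(source_ids):
    # The final hidden state is the fixed-size context handed to the decoder,
    # which is where the fixed context size limit comes from.
    h = np.zeros(hidden_dim)
    for tok in source_ids:
        h = rnn_step(W_enc, h, E[tok])
    return h

print(encode([1, 2, 3]).shape)  # (16,)
```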
decoding
the process of generating the target-language words one at a time from the source sentence
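Continuing the sketch above (this reuses rnn_step, W_dec, W_out, E, and encode from it), a greedy decoding loop that generates target words one at a time from the encoder's context vector; real systems often use beam search instead of plain argmax:

```python
def greedy_decode(context, max_len=10, start_id=0):
    h, tok, output = context, start_id, []
    for _ in range(max_len):
        h = rnn_step(W_dec, h, E[tok])   # condition on the context so far and the previous word
        tok = int(np.argmax(W_out @ h))  # greedily pick the most probable next target word
        output.append(tok)
    return output

print(greedy_decode(encode([1, 2, 3])))
```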
transformer architecture
architecture that uses a self attention mechanism that can model long distance context without a sequential dependency
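A sketch of scaled dot-product self attention in numpy, with random matrices standing in for the learned query/key/value projections; every word attends to every other word in a single step, which is what removes the sequential dependency:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                      # hypothetical sentence length and model width
X = rng.normal(size=(seq_len, d_model))      # one row per word (embedding + position)

# Random matrices stand in for the learned query/key/value projections.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

def self_attention(X):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(d_model)             # every word scores every other word
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the whole sentence
    return weights @ V                              # context-mixed word representations

print(self_attention(X).shape)  # (4, 8)
```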
multiheaded attention
addresses the problem of a word attending too much to itself – splits the sentence representation into pieces, applies the attention model to each piece separately, and concatenates the results
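A sketch of the splitting idea, with the per-head learned projections omitted for brevity; the representation is cut into pieces (heads), attention runs on each piece independently, and the results are concatenated:

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, d_model, n_heads = 4, 8, 2
d_head = d_model // n_heads
X = rng.normal(size=(seq_len, d_model))

def attention(Q, K, V):
    w = np.exp(Q @ K.T / np.sqrt(Q.shape[-1]))
    return (w / w.sum(axis=-1, keepdims=True)) @ V

# Cut the representation into n_heads pieces, attend within each piece,
# then concatenate the per-head results back to the full width.
heads = []
for h in range(n_heads):
    piece = X[:, h * d_head:(h + 1) * d_head]
    heads.append(attention(piece, piece, piece))

print(np.concatenate(heads, axis=-1).shape)  # (4, 8)
```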
positional embedding
technique used by a transformer to capture the ordering of words
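One common realization is the sinusoidal positional encoding from the original transformer paper; each position gets a distinct vector that is added to the word embedding so the model can recover word order. A sketch:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]             # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]          # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles)                 # even dimensions
    enc[:, 1::2] = np.cos(angles)                 # odd dimensions
    return enc

# Added to (not concatenated with) the word embeddings before the first layer.
print(positional_encoding(seq_len=5, d_model=8).shape)  # (5, 8)
```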
transformer encoders
used for text classification tasks
transformer decoders
used for text generation tasks; uses a version of self attention where each word can only attend to the words before it
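A sketch of the masked (causal) self attention a decoder uses, with the query/key/value projections omitted; positions after the current word are set to -inf before the softmax so each word can only attend to earlier words:

```python
import numpy as np

rng = np.random.default_rng(2)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))

def masked_self_attention(X):
    scores = X @ X.T / np.sqrt(d_model)
    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X

print(masked_self_attention(X).round(2))
```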