Class 11 Flashcards
one hot vector
a vector with one position for every word in the vocabulary; the vector for a given word has a 1 in that word's position and 0s everywhere else, so every word gets a distinct representation
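A minimal sketch in Python, assuming a hypothetical five-word vocabulary; the vector is as long as the vocabulary, with a single 1 marking the word:

```python
import numpy as np

# Hypothetical toy vocabulary; in practice this would be tens of thousands of words.
vocab = ["the", "cat", "sat", "on", "mat"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word):
    # Vector of length |vocab| with a single 1 at the word's index.
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("cat"))  # [0. 1. 0. 0. 0.]
```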
word embedding
a slightly more sophisticated approach that assigns each word a low-dimensional vector, learned automatically from the data; does a good job of representing words in isolation
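A sketch of an embedding lookup, where a small random matrix stands in for vectors that a real system would learn from data (all names and sizes here are hypothetical):

```python
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
word_to_index = {w: i for i, w in enumerate(vocab)}

embedding_dim = 8  # low-dimensional compared to the vocabulary size
rng = np.random.default_rng(0)
# Randomly initialized here; a real model learns these rows from data
# (e.g., with word2vec or as part of a larger network).
embedding_matrix = rng.normal(size=(len(vocab), embedding_dim))

def embed(word):
    return embedding_matrix[word_to_index[word]]

print(embed("cat").shape)  # (8,)
```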
character level model
an alternative to word embeddings – the input is a sequence of characters, each encoded as a one hot vector, and the model has to learn how characters come together to form words
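A sketch of character-level input, assuming an alphabet of lowercase letters; each character gets its own one hot vector and the model sees the whole sequence:

```python
import numpy as np
import string

alphabet = string.ascii_lowercase
char_to_index = {c: i for i, c in enumerate(alphabet)}

def encode_word(word):
    # One one-hot vector per character; shape (len(word), len(alphabet)).
    seq = np.zeros((len(word), len(alphabet)))
    for pos, ch in enumerate(word):
        seq[pos, char_to_index[ch]] = 1.0
    return seq

print(encode_word("cat").shape)  # (3, 26)
```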
language model
a probability distribution over sequences of words; a useful one needs to capture sufficient context
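A minimal sketch of one simple kind of language model, a bigram count model over a toy corpus, just to make the "distribution over sequences" idea concrete; real models use far more context:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat sat".split()

# Count bigrams to estimate P(next word | previous word).
bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def prob(next_word, prev_word):
    counts = bigram_counts[prev_word]
    total = sum(counts.values())
    return counts[next_word] / total if total else 0.0

print(prob("cat", "the"))  # 2 of the 3 bigrams starting with "the" continue with "cat"
```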
machine translation
the task of translating text from a source language into a target language
sequence to sequence model
neural network architecture that connects two recurrent networks (typically LSTMs) – an encoder that reads the source sequence and a decoder that generates the target sequence; most commonly used for machine translation (MT) but can also be used to generate a text caption for an image or a summary – 3 shortcomings: nearby context bias, fixed context size limit, slower sequential processing
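A minimal numpy sketch of the encoder half of a sequence-to-sequence model, using a plain tanh RNN cell and random weights in place of the learned LSTM cells a real system would use; every name and size here is hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_dim, embed_dim, vocab_size = 16, 8, 5

# Hypothetical random parameters; a real model learns these, typically in LSTM cells.
W_enc = rng.normal(size=(hidden_dim, hidden_dim + embed_dim)) * 0.1
W_dec = rng.normal(size=(hidden_dim, hidden_dim + embed_dim)) * 0.1
W_out = rng.normal(size=(vocab_size, hidden_dim)) * 0.1
E = rng.normal(size=(vocab_size, embed_dim)) * 0.1  # word embedding table

def rnn_step(W, h, x):
    # One step of a simple recurrent cell: new hidden state from old state + input.
    return np.tanh(W @ np.concatenate([h, x]))

def encode(source_ids):
    # The final hidden state is the fixed-size context handed to the decoder,
    # which is where the fixed context size limit comes from.
    h = np.zeros(hidden_dim)
    for tok in source_ids:
        h = rnn_step(W_enc, h, E[tok])
    return h

print(encode([1, 2, 3]).shape)  # (16,)
```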
decoding
the process of generating the target-language words one at a time from the source sentence
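Continuing the sketch above (this reuses rnn_step, W_dec, W_out, E, and encode from it), a greedy decoding loop that generates target words one at a time from the encoder's context vector; real systems often use beam search instead of plain argmax:

```python
def greedy_decode(context, max_len=10, start_id=0):
    h, tok, output = context, start_id, []
    for _ in range(max_len):
        h = rnn_step(W_dec, h, E[tok])   # condition on the context so far and the previous word
        tok = int(np.argmax(W_out @ h))  # greedily pick the most probable next target word
        output.append(tok)
    return output

print(greedy_decode(encode([1, 2, 3])))
```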
transformer architecture
architecture that uses a self attention mechanism that can model long distance context without a sequential dependency
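A sketch of scaled dot-product self attention in numpy, with random matrices standing in for the learned query/key/value projections; every word attends to every other word in a single step, which is what removes the sequential dependency:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                      # hypothetical sentence length and model width
X = rng.normal(size=(seq_len, d_model))      # one row per word (embedding + position)

# Random matrices stand in for the learned query/key/value projections.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

def self_attention(X):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    scores = Q @ K.T / np.sqrt(d_model)             # every word scores every other word
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the whole sentence
    return weights @ V                              # context-mixed word representations

print(self_attention(X).shape)  # (4, 8)
```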
multiheaded attention
addresses the problem of a word attending too much to itself – splits the sentence representation into pieces, applies the attention model to each piece separately, and concatenates the results
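A sketch of the splitting idea, with the per-head learned projections omitted for brevity; the representation is cut into pieces (heads), attention runs on each piece independently, and the results are concatenated:

```python
import numpy as np

rng = np.random.default_rng(1)
seq_len, d_model, n_heads = 4, 8, 2
d_head = d_model // n_heads
X = rng.normal(size=(seq_len, d_model))

def attention(Q, K, V):
    w = np.exp(Q @ K.T / np.sqrt(Q.shape[-1]))
    return (w / w.sum(axis=-1, keepdims=True)) @ V

# Cut the representation into n_heads pieces, attend within each piece,
# then concatenate the per-head results back to the full width.
heads = []
for h in range(n_heads):
    piece = X[:, h * d_head:(h + 1) * d_head]
    heads.append(attention(piece, piece, piece))

print(np.concatenate(heads, axis=-1).shape)  # (4, 8)
```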
positional embedding
technique used by a transformer to capture the ordering of words
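One common realization is the sinusoidal positional encoding from the original transformer paper; each position gets a distinct vector that is added to the word embedding so the model can recover word order. A sketch:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]             # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]          # (1, d_model/2)
    angles = pos / (10000 ** (2 * i / d_model))
    enc = np.zeros((seq_len, d_model))
    enc[:, 0::2] = np.sin(angles)                 # even dimensions
    enc[:, 1::2] = np.cos(angles)                 # odd dimensions
    return enc

# Added to (not concatenated with) the word embeddings before the first layer.
print(positional_encoding(seq_len=5, d_model=8).shape)  # (5, 8)
```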
transformer encoders
used for text classification tasks
transformer decoders
used for text generation tasks; uses a version of self attention where each word can only attend to the words before it
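A sketch of the masked (causal) self attention a decoder uses, with the query/key/value projections omitted; positions after the current word are set to -inf before the softmax so each word can only attend to earlier words:

```python
import numpy as np

rng = np.random.default_rng(2)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))

def masked_self_attention(X):
    scores = X @ X.T / np.sqrt(d_model)
    # Causal mask: position i may only attend to positions <= i.
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(mask, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X

print(masked_self_attention(X).round(2))
```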