C3 Flashcards
vector space model
the dimensions are words and the documents are vectors: each document vector shows which words that document contains
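A minimal sketch of the idea on a made-up toy corpus: each word in the vocabulary is one dimension, and each document becomes a (here binary) vector over those dimensions.

```python
# Toy term-document matrix: rows are documents, columns (dimensions) are words.
docs = ["the bike is fast", "the bicycle is red", "the car is fast"]

vocab = sorted({w for d in docs for w in d.split()})                 # the dimensions
matrix = [[1 if w in d.split() else 0 for w in vocab] for d in docs] # one vector per document

for d, row in zip(docs, matrix):
    print(f"{d!r:25} -> {row}")
print("dimensions:", vocab)
```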
2 problems vector space model
synonymy: many ways to refer to the same object (bike and bicycle)
polysemy: many words have more than one distinct meaning
word embeddings model
represent words in a continuous vector space
- relatively low dimensional vector space
- semantically and syntactically similar words are mapped to nearby points (distributional hypothesis)
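A small illustration of the "nearby points" idea with invented low-dimensional vectors (the numbers are made up for the example, not real embeddings): semantically similar words get a high cosine similarity.

```python
import math

# Invented 3-dimensional "embeddings", purely for illustration.
emb = {
    "bike":    [0.90, 0.10, 0.00],
    "bicycle": [0.85, 0.15, 0.05],
    "banana":  [0.00, 0.20, 0.95],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine(emb["bike"], emb["bicycle"]))  # high: nearby points
print(cosine(emb["bike"], emb["banana"]))   # low: far apart
```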
feedforward network
multilayer network in which the units are connected with no cycles
- three kinds of nodes: input, hidden, output
- each layer is fully connected to the next (in the standard architecture)
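A minimal forward pass for this standard architecture (one hidden layer, fully connected, no cycles); the weights are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes: 4 input units, 3 hidden units, 2 output units.
W1, b1 = rng.normal(size=(3, 4)), np.zeros(3)   # input -> hidden
W2, b2 = rng.normal(size=(2, 3)), np.zeros(2)   # hidden -> output

def forward(x):
    h = np.tanh(W1 @ x + b1)   # hidden layer with nonlinearity
    return W2 @ h + b2         # output layer (raw scores)

print(forward(np.array([1.0, 0.0, 2.0, -1.0])))
```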
classification with feedforward network
binary: single output node
multi-class: one output node per category; the output layer gives a probability distribution
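Sketch of the two output variants with made-up scores: a sigmoid on a single node for binary classification, a softmax over one node per category for multi-class.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

print(sigmoid(0.7))                          # binary: P(class = 1) from one output node
print(softmax(np.array([2.0, 0.5, -1.0])))   # multi-class: distribution over 3 categories
```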
word2vec
train a neural classifier on a binary prediction task: is word w likely to show up near the word bicycle? => Take the learned classifier weights on the hidden layer as the word embeddings
computationally efficient predictive model for learning word embeddings from raw text
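The binary classifier amounts to a sigmoid of the dot product between the target and context vectors; a sketch with invented vectors:

```python
import math

def p_positive(w_vec, c_vec):
    """P(+ | w, c): probability that c is a real context word of w."""
    dot = sum(a * b for a, b in zip(w_vec, c_vec))
    return 1.0 / (1.0 + math.exp(-dot))

# Invented vectors, purely for illustration.
print(p_positive([0.9, 0.1, 0.0], [0.8, 0.2, 0.1]))   # similar vectors -> high probability
print(p_positive([0.9, 0.1, 0.0], [-0.7, 0.0, 0.9]))  # dissimilar vectors -> low probability
```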
training word2vec
supervised learning problem on unlabeled data: self-supervision
- treat target word and a neighbouring context word as positive examples
- randomly sample other words in the lexicon to get negative samples
- train a classifier to distinguish those two cases
- learned weights are the embeddings
- maximize similarity of (target word, context word) pairs drawn from positive examples
- minimize similarity of (w,c_neg) pairs from negative examples
- each (target word, context word) pair is one classification example that determines whether and how the current vectors get adjusted
=> weights get updated while the model processes the collection (minimize and maximize dot-products)
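A bare-bones sketch of this training procedure (skip-gram with negative sampling) on a toy corpus; tiny dimensions, a fixed learning rate, and uniform negative sampling are simplifications made for readability.

```python
import math
import random

random.seed(0)

corpus = "the bike is fast the bicycle is fast the car is slow".split()
vocab = sorted(set(corpus))
dim, lr, window, k_neg = 5, 0.05, 2, 2

# Two vectors per word: one as target word (W), one as context word (C).
W = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
C = {w: [random.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def update(target, context, label):
    """One gradient step pushing sigmoid(C[context] . W[target]) towards label (1 or 0)."""
    w_vec, c_vec = W[target], C[context]
    dot = sum(a * b for a, b in zip(w_vec, c_vec))
    g = lr * (label - sigmoid(dot))
    for i in range(dim):
        w_vec[i], c_vec[i] = w_vec[i] + g * c_vec[i], c_vec[i] + g * w_vec[i]

for epoch in range(50):
    for i, target in enumerate(corpus):
        for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
            if i == j:
                continue
            update(target, corpus[j], 1)           # positive example: real neighbour
            for _ in range(k_neg):                 # negative samples (uniform; may hit a real
                update(target, random.choice(vocab), 0)  # context word, ignored here for brevity)

# The learned target vectors W are the word embeddings.
print(W["bike"])
```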
3 advantages of word2vec
- it scales: can be trained on billion-word corpora in limited time, and training can be parallelized
- pre-trained word embeddings trained by one can be used by others
- incremental training: train one piece, save results, continue later
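As an example of reusing embeddings trained by others: the gensim library (assumed to be installed) ships a downloader for several published pre-trained vector sets, e.g. the GloVe model named below.

```python
import gensim.downloader as api

# Downloads (once) and loads a small set of pre-trained GloVe vectors.
wv = api.load("glove-wiki-gigaword-50")

print(wv.most_similar("bicycle", topn=5))   # nearest neighbours in embedding space
print(wv.similarity("bike", "bicycle"))     # cosine similarity between two words
```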
4 text mining tasks that benefit from using word2vec
- synonym detection
- richer word and context representation for named entity recognition
- document similarity
- finding word associations/clusters
tf-idf
w_t,d = tf_t,d * idf_t
tf = term frequency of word t in document d = 1 + log_10(count(t,d)) if count(t,d) > 0, else 0
idf = inverse document frequency = log_10(N/df_t) with N the total number of documents and df_t the number of documents t occurs in
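A worked example of the formula on a hypothetical three-document collection:

```python
import math

docs = [
    "the bike is fast".split(),
    "the bicycle is red".split(),
    "the car is fast".split(),
]
N = len(docs)

def tf(t, doc):
    count = doc.count(t)
    return 1 + math.log10(count) if count > 0 else 0.0

def idf(t):
    df = sum(1 for doc in docs if t in doc)   # number of documents containing t
    return math.log10(N / df) if df > 0 else 0.0

def tf_idf(t, doc):
    return tf(t, doc) * idf(t)

print(tf_idf("bike", docs[0]))   # rare term -> higher weight
print(tf_idf("the", docs[0]))    # occurs in every document -> idf = 0, weight = 0
```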
PMI
Pointwise Mutual Information
PMI(w,c) = log_2( P(w,c) / (P(w) * P(c)) )
estimate of how much more the two words co-occur than we expect by chance
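A sketch of estimating PMI from co-occurrence counts over an invented set of (word, context) pairs:

```python
import math
from collections import Counter

# Invented (word, context) co-occurrence pairs, purely for illustration.
pairs = ([("wrote", "book")] * 8 + [("wrote", "car")] * 1 +
         [("drove", "car")] * 7 + [("drove", "book")] * 1)

pair_counts = Counter(pairs)
w_counts = Counter(w for w, _ in pairs)
c_counts = Counter(c for _, c in pairs)
total = len(pairs)

def pmi(w, c):
    p_wc = pair_counts[(w, c)] / total
    p_w, p_c = w_counts[w] / total, c_counts[c] / total
    return math.log2(p_wc / (p_w * p_c))

print(pmi("wrote", "book"))   # > 0: co-occur more often than chance
print(pmi("wrote", "car"))    # < 0: co-occur less often than chance
```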
first-order co-occurrence
when two words are typically nearby each other (“wrote” and “book”)
second-order co-occurrence
when two words have similar neighbours (“wrote” and “said”)
distributional hypothesis
words that occur in similar contexts tend to be similar (the context of a word defines its meaning)
proposed neural architectures for computing word vectors
- word2vec
- GloVe
- FastText
- ELMo
- BERT