Semantic Analysis Flashcards
Types of Semantic Similarity
- Word similarity
- Sense similarity - some senses are more similar than others
- Text similarity
- Taxonomy similarity
- Frame similarity
- Context similarity (nurse and doctor appear in similar contexts even though they don’t mean the same thing)
Word Similarity
When two words function in a similar way (i.e., they can be interchangeable).
Statistical approach - use statistics to see how closely associated two words are in a corpus
- PPMI (“positive pointwise mutual information”)
- Vector semantics and LSA (“latent semantic analysis”)
- Cosine similarity
Structural Approach
- Ontological distance
- Overlap of parse contexts
Dependent word probabilities
Jack & Jill example: the probability of Jack and Jill co-occurring. The dependent (joint) probability is higher than the independent probabilities would predict. When the dependent probability is higher, there is some kind of relationship between the words.
PPMI, sometimes referred to as PMI
Co-occurrence
Pointwise Mutual Information
PMI(x, y) = log2( P(x, y) / (P(x) P(y)) )
If the PMI is 0 or lower, throw it out: the words are not similar, so negative values are changed to zero (the "positive" in PPMI; see the sketch below).
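A minimal numpy sketch of turning a word co-occurrence count matrix into PPMI scores (the toy counts and variable names are just illustrative assumptions):

import numpy as np

# Toy co-occurrence counts: rows = target words, columns = context words.
counts = np.array([[8.0, 2.0],
                   [1.0, 9.0]])

total = counts.sum()
p_xy = counts / total                    # joint probability P(x, y)
p_x = p_xy.sum(axis=1, keepdims=True)    # marginal P(x)
p_y = p_xy.sum(axis=0, keepdims=True)    # marginal P(y)

# PMI = log2( P(x, y) / (P(x) P(y)) ); suppress log-of-zero warnings.
with np.errstate(divide="ignore"):
    pmi = np.log2(p_xy / (p_x * p_y))

# PPMI: negative (and undefined) values are clipped to zero.
ppmi = np.maximum(pmi, 0.0)
print(ppmi)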
Vector semantics, aka distributional semantics
A word is characterized by the company it keeps
- Feature vectors are rows with a lot of columns
- Distribution of features
- Documents are similar if their vectors are similar, reading down the columns of the matrix (Julius Caesar and Henry V have battle and soldiers more often than fool and clown)
- Fool and clown are close to each other because they both occur often in As You Like It and Twelfth Night.
- Term-document matrix
- Turn term frequencies into TF-IDF weighted vectors (sketch below)
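A quick scikit-learn sketch of building the raw term-frequency matrix and a TF-IDF version of it. The two toy documents are made up, and note that scikit-learn returns a document-term matrix (the transpose of the term-document layout described above):

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "the battle of the soldiers",       # stand-in for Julius Caesar
    "the fool and the clown laughed",   # stand-in for As You Like It
]

# Raw term frequencies (rows = documents, columns = terms).
tf = CountVectorizer()
print(tf.fit_transform(docs).toarray())
print(tf.get_feature_names_out())

# TF-IDF downweights terms that appear in every document.
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(docs).toarray())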
Latent Semantic Analysis (LSA)
Algorithms to reduce dimensions in the vector space
- e.g., start with 50,000 dimensions and reduce to 300 or 400
- Singular value decomposition
- Feed in sparse vectors, get back dense ones (sketch below)
- Garbage in, garbage out applies
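One common way to do this LSA-style reduction is scikit-learn's TruncatedSVD; a sketch on a toy corpus (with real data you would keep 300-400 components, but here the vocabulary is tiny so only 2 are used):

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["battle soldiers war", "fool clown jester", "battle war general"]

# Sparse, high-dimensional TF-IDF vectors in...
X_sparse = TfidfVectorizer().fit_transform(docs)

# ...dense, low-dimensional vectors out (n_components must be < n_features).
svd = TruncatedSVD(n_components=2)
X_dense = svd.fit_transform(X_sparse)
print(X_dense.shape)   # (3, 2)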
Cosine similarity
Looks at the angle between vectors: the smaller the angle, the more directionally similar the words are.
Document similarity
The heart and soul of NLP
Jaccard similarity
How many terms the two documents share:
the overlap is the words they have in common (intersection over union)
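A minimal plain-Python sketch of Jaccard similarity over two token sets (the example sentences are invented):

def jaccard(doc_a: str, doc_b: str) -> float:
    """Share of unique terms the two documents have in common."""
    a, b = set(doc_a.lower().split()), set(doc_b.lower().split())
    return len(a & b) / len(a | b)

print(jaccard("the cat sat on the mat", "the cat lay on the rug"))  # 3/7 ≈ 0.43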
Cosine Similarity
Efficient on sparse vectors
Automatically adjusts for documents of different lengths
Document similarity is the angle between the document vectors
Dot product: multiply every feature in doc 1 against the matching feature in doc 2, then add the products
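A sketch of that computation, which works the same for word vectors and document vectors: dot product divided by the product of the vector lengths (the feature weights below are made up):

import math

def cosine(v1, v2):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))          # multiply and add
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)

doc1 = [0.1, 0.0, 0.5, 0.2]
doc2 = [0.2, 0.1, 0.4, 0.0]
print(cosine(doc1, doc2))   # ~0.88: small angle, fairly similar documents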
Why don’t we want raw term frequency? (Raw counts are dominated by terms that appear everywhere, like “the”.)
Why is TF-IDF so important? (It downweights terms that appear in most documents, so distinctive terms drive the similarity.)
Documents as Probability Distributions
Hellinger distance:
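A minimal sketch of the Hellinger distance, one way to compare two documents once each is treated as a probability distribution over the vocabulary (the two distributions below are invented):

import math

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    total = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(total) / math.sqrt(2)

# Word probabilities over the same vocabulary in two documents (each sums to 1).
doc1 = [0.5, 0.3, 0.2, 0.0]
doc2 = [0.4, 0.3, 0.2, 0.1]
print(hellinger(doc1, doc2))   # ranges from 0.0 (identical) to 1.0 (no overlap)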
BM25
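A rough sketch of the Okapi BM25 scoring formula; k1, b, and the toy corpus are only illustrative, and the smoothed IDF here is the Lucene-style variant, not any particular library's implementation:

import math

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a query with the BM25 formula."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)   # average document length
    n_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)                # document frequency
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)    # smoothed IDF
        tf = doc.count(term)                                    # term frequency in doc
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [["battle", "of", "the", "soldiers"], ["the", "fool", "and", "the", "clown"]]
print(bm25_score(["battle"], corpus[0], corpus))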
WordNet: use synsets to match documents that are similar but don’t share many of the same words (e.g., replace words with their synsets before comparing).
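A quick NLTK sketch of looking up WordNet synsets for a word; it assumes NLTK is installed and downloads the WordNet data on first use:

import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)   # one-time download of the WordNet data

# Synsets group words that share a sense; "clown" and "buffoon" land together.
for synset in wn.synsets("clown"):
    print(synset.name(), synset.lemma_names())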
Probability
e.g., a term with probability 0.2 means there is a 20% chance that randomly selecting a word from the document yields that term
Similarity does not equal synonymy
- collocation of key terms: “boot” + “camp” vs. “bootcamp”, which are similar but not synonymous
Word2Vec
Lightweight neural network model
- one hidden layer
- one output layer
- about 8 years old (introduced in 2013)
- skip-gram and CBOW
- a must-know model
Skip-gram
Opposite of CBOW model
Focus word is the single input vector, and the target context words are the output layer
Works well with a small amount of training data (<100k samples)
- represents rare words and phrases well
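A sketch of training a skip-gram model with gensim (sg=1 selects skip-gram over CBOW; parameter names follow gensim 4.x, where vector_size replaced the older size, and the tiny corpus is obviously just a placeholder):

from gensim.models import Word2Vec

# Each sentence is a list of tokens; a real corpus would have many more.
sentences = [
    ["the", "nurse", "treated", "the", "patient"],
    ["the", "doctor", "treated", "the", "patient"],
]

model = Word2Vec(
    sentences,
    vector_size=100,   # dimensionality of the dense word vectors
    window=5,          # context words considered on each side of the focus word
    min_count=1,       # keep even rare words (skip-gram handles them well)
    sg=1,              # 1 = skip-gram, 0 = CBOW
)

print(model.wv.most_similar("nurse", topn=3))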