Semantic Analysis Flashcards

1
Q

Types of Semantic Similarity

A
  • Word similarity
  • Sense similarity - some senses are more similar than others
  • Text similarity
  • Taxonomy similarity
  • Frame similarity
  • Context similarity (nurse and doctor have similar contexts even though they don’t mean the same thing)
2
Q

Word Similarity

A

When two words function in a similar way (i.e., when they can be used interchangeably).

Statistical approach - use statistics to see how closely two words are associated in a corpus

  • PPMI (“positive pointwise mutual information”)
  • Vector semantics and LSA (“latent semantic analysis”)
  • Cosine similarity

Structural Approach

  • Ontological distance
  • Overlap of parse contexts
3
Q

Dependent word probabilities

A

Jack & Jill example, probability of Jack and Jill co-occurrence. Dependent higher than independent probability. When dependent is higher than there is some kind of relationship between them.

4
Q

PPMI, sometimes referred to as PMI

A

Based on co-occurrence.
Pointwise Mutual Information:

PMI(x, y) = log2 [ P(x, y) / (P(x) P(y)) ]

Values of 0 or lower mean the words are not positively associated, so throw them out: change negative values to zero (that is the “positive” in PPMI). See the sketch below.
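A minimal sketch of computing PPMI from a toy co-occurrence matrix (the counts and vocabulary below are made up for illustration):

```python
import numpy as np

# Toy word-word co-occurrence counts (rows = target words, columns = context words)
counts = np.array([
    [0, 4, 1],
    [4, 0, 2],
    [1, 2, 0],
], dtype=float)

total = counts.sum()
p_xy = counts / total                      # joint probabilities P(x, y)
p_x = p_xy.sum(axis=1, keepdims=True)      # marginals P(x)
p_y = p_xy.sum(axis=0, keepdims=True)      # marginals P(y)

with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log2(p_xy / (p_x * p_y))      # PMI(x, y) = log2 P(x, y) / (P(x) P(y))

# PPMI: keep only positive, finite values; everything else becomes zero
ppmi = np.where(np.isfinite(pmi) & (pmi > 0), pmi, 0.0)
print(ppmi)
```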

5
Q

Vector semantics, aka distributional semantics

A

A word is characterized by the company it keeps

  • Feature vectors are rows with a lot of columns
  • Distribution of features
  • Documents are similar if their vectors (the columns of the matrix, read going down) are similar - Julius Caesar and Henry V contain battle and soldiers more often than fool and clown
  • Fool and clown are close to each other because they both occur often in As You Like It and Twelfth Night.
  • Term-document matrix
  • Turn term frequencies into TF-IDF weighted vectors (see the sketch below)
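A minimal scikit-learn sketch of a term-document matrix and its TF-IDF weighted version (the four toy “plays” stand in for the Shakespeare example and are not real data):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

plays = [
    "battle soldier battle king",   # stand-in for Julius Caesar
    "battle soldier crown king",    # stand-in for Henry V
    "fool clown wit fool",          # stand-in for As You Like It
    "fool clown love song",         # stand-in for Twelfth Night
]

# Raw term-frequency matrix (rows = documents, columns = terms)
tf = CountVectorizer().fit_transform(plays)

# TF-IDF weighting: down-weights terms that appear in many documents
tfidf = TfidfVectorizer().fit_transform(plays)

print(tf.toarray())
print(tfidf.toarray().round(2))
```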
6
Q

Latent Semantic Analysis (LSA)

A

Algorithms to reduce dimensions in the vector space

  • e.g., take 50,000 dimensions and reduce them to 300 or 400
  • Singular value decomposition
  • Feed in sparse, gives back dense
  • Garbage in, garbage out applies
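A minimal LSA sketch, assuming scikit-learn: truncated SVD turns a sparse TF-IDF matrix into dense, lower-dimensional document vectors (the corpus is illustrative, and 2 components stands in for the 300-400 used in practice):

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "battle soldier king army",
    "soldier battle crown war",
    "fool clown jest song",
    "clown fool love comedy",
]

tfidf = TfidfVectorizer().fit_transform(docs)   # sparse, high-dimensional input
svd = TruncatedSVD(n_components=2, random_state=0)
dense = svd.fit_transform(tfidf)                # dense, low-dimensional (latent) vectors

print(dense.round(3))  # each row is a document in the reduced semantic space
```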
7
Q

Cosine similarity

A

Looks at the angle between two vectors: the smaller the angle, the more directionally similar the vectors and therefore the more similar the words.

8
Q

Document similarity

A

The heart and soul of NLP

9
Q

Jaccard similarity

A

How many terms the two documents share: the overlap is the set of words they have in common, divided by the total set of words across both documents (intersection over union). See the sketch below.
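A minimal sketch of Jaccard similarity over the word sets of two documents (the example sentences are made up):

```python
def jaccard_similarity(doc_a: str, doc_b: str) -> float:
    """Shared vocabulary size divided by combined vocabulary size."""
    a, b = set(doc_a.lower().split()), set(doc_b.lower().split())
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

# 3 shared words out of 7 distinct words overall -> about 0.43
print(jaccard_similarity("the cat sat on the mat", "the cat lay on the rug"))
```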

10
Q

Cosine Similarity

A

Efficient on sparse vectors
Automatically adjusts for documents of different lengths
Document similarity is the angle between the document vectors
Dot product: multiply every feature in doc 1 against the matching feature in doc 2 and then add the products (see the sketch below)
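A minimal sketch of the computation described above: multiply the feature vectors element by element, add the products, and divide by the vector magnitudes so that documents of different lengths stay comparable (the vectors are made up):

```python
import math

def cosine_similarity(v1, v2):
    """Dot product of the vectors divided by the product of their magnitudes."""
    dot = sum(a * b for a, b in zip(v1, v2))        # multiply feature by feature, then add
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)

doc1 = [2, 0, 1, 3]   # e.g., TF-IDF weights over a shared vocabulary
doc2 = [1, 1, 0, 2]
print(cosine_similarity(doc1, doc2))  # closer to 1.0 means a smaller angle / more similar
```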

11
Q

Why don’t we want term frequency?

A

Raw term frequency over-weights very common words that appear in nearly every document and carry little meaning; TF-IDF down-weights them by the inverse document frequency, which is why TF-IDF is so important.

12
Q

Documents as Probability Distribution

A

Hellinger distance: treat each document as a probability distribution over words (e.g., a 20% chance that a randomly selected word is a given term) and measure the distance between the two distributions (see the sketch below).
BM25
WordNet: get synsets so that documents which are similar but don’t share many of the same words can still match - replace words with their synsets.
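A minimal sketch of Hellinger distance between two documents treated as word distributions (the distributions are made up and assumed to share the same vocabulary):

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

# Word probabilities for two toy documents over the same 4-word vocabulary
doc_a = [0.5, 0.3, 0.2, 0.0]
doc_b = [0.4, 0.3, 0.2, 0.1]
print(hellinger(doc_a, doc_b))  # small value -> similar distributions
```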

13
Q

Similarity does not equal synonymy

A

- Collocation of key terms: “boot” + “camp” vs. “bootcamp”, which are similar but not synonymous.

14
Q

Word2Vec

A

Lightweight neural network model

  • one hidden layer
  • one output layer
  • introduced in 2013 (about 8 years old at the time of these notes)
  • two architectures: skip-gram and CBOW
  • a must-know technique (see the sketch below)
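A minimal gensim sketch, assuming gensim 4.x; the toy sentences are far too small to learn useful embeddings, but they show the training call and the skip-gram vs. CBOW switch (sg=1 vs. sg=0):

```python
from gensim.models import Word2Vec

# Toy tokenized corpus (a real corpus needs far more text)
sentences = [
    ["the", "nurse", "treated", "the", "patient"],
    ["the", "doctor", "treated", "the", "patient"],
    ["the", "clown", "amused", "the", "crowd"],
]

# sg=1 -> skip-gram (focus word in, context words out); sg=0 -> CBOW (context in, focus word out)
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

print(model.wv["nurse"][:5])                   # first few dimensions of one word vector
print(model.wv.similarity("nurse", "doctor"))  # cosine similarity between two word vectors
```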
15
Q

Skip-gram

A

Opposite of CBOW model
Focus word is the single input vector, and the target context words are the output layer
Works well with a small amount of training data (<100k samples)
-represents rare words and phrases well

16
Q

CBOW

A
  • More scalable and faster, better accuracy for frequent words
  • input is context, output is the focus word
17
Q

Doc2Vec

A
  • Enhancement to Word2Vec
  • PV-DM, Distributed Memory version of the Paragraph Vector
  • PV-DBOW, Distributed Bag of Words version of the Paragraph Vector
  • Given a document, return a document vector (see the sketch below)
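A minimal gensim sketch, assuming gensim 4.x; dm=1 corresponds to PV-DM and dm=0 to PV-DBOW, and infer_vector returns a vector for a new document (the corpus is illustrative):

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

raw_docs = [
    "the soldiers marched into battle",
    "the king led his army to war",
    "the clown sang a foolish song",
]
corpus = [TaggedDocument(words=doc.split(), tags=[i]) for i, doc in enumerate(raw_docs)]

# dm=1 -> PV-DM (Distributed Memory); dm=0 -> PV-DBOW (Distributed Bag of Words)
model = Doc2Vec(corpus, vector_size=50, min_count=1, epochs=50, dm=1)

# Given a (new) document, return its document vector
vector = model.infer_vector("the army fought a great battle".split())
print(vector[:5])
```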
18
Q

Dynamic word embeddings

A
  • Need to reevaluate context over time
  • Embeddings have to be retrained to be up to date
  • Even if semantic meaning of a word stays the same, the words around the word will change.
  • Temporal word embeddings - learn temporal embeddings in all time slices concurrently, and apply regularization terms to smooth embedding changes across time.