Semantic Analysis Flashcards
Types of Semantic Similarity
- Word similarity
- Sense similarity - some senses are more similar than others
- Text similarity
- Taxonomy similarity
- Frame similarity
- Context similarity (nurse and doctor appear in similar contexts even though they don’t mean the same thing)
Word Similarity
When two words function in a similar way (i.e., they can be interchangeable).
Statistical approach - use statistics to see how closely associated two words are in a corpus
- PPMI (“positive pointwise mutual information”)
- Vector semantics and LSA (“latent semantic analysis”)
- Cosine similarity
Structural Approach
- Ontological distance
- Overlap of parse contexts
Dependent word probabilities
Jack & Jill example: the probability of Jack and Jill co-occurring. The dependent (joint) probability is higher than the independent probabilities would predict. When the dependent probability is higher, there is some kind of relationship between the words.
PPMI, sometimes referred to as PMI
Co-occurrence
Pointwise Mutual Information
PMI(x, y) = log2( P(x, y) / (P(x) P(y)) )
If the PMI is 0 or lower, throw it out: the words are not similar, so negative values are changed to zero (the "positive" in PPMI; see the sketch below).
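A minimal numpy sketch of turning a word co-occurrence count matrix into PPMI scores (the toy counts and variable names are just illustrative assumptions):

import numpy as np

# Toy co-occurrence counts: rows = target words, columns = context words.
counts = np.array([[8.0, 2.0],
                   [1.0, 9.0]])

total = counts.sum()
p_xy = counts / total                    # joint probability P(x, y)
p_x = p_xy.sum(axis=1, keepdims=True)    # marginal P(x)
p_y = p_xy.sum(axis=0, keepdims=True)    # marginal P(y)

# PMI = log2( P(x, y) / (P(x) P(y)) ); suppress log-of-zero warnings.
with np.errstate(divide="ignore"):
    pmi = np.log2(p_xy / (p_x * p_y))

# PPMI: negative (and undefined) values are clipped to zero.
ppmi = np.maximum(pmi, 0.0)
print(ppmi)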
Vector semantics, aka distributional semantics
A word is characterized by the company it keeps
- Feature vectors are rows with a lot of columns
- Distribution of features
- Documents are similar if their vectors are similar, reading down the columns of the matrix (Julius Caesar and Henry V have battle and soldiers more often than fool and clown)
- Fool and clown are close to each other because they both occur often in As You Like It and Twelfth Night.
- Term-document matrix
- Turn term frequencies into TF-IDF weighted vectors (sketch below)
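A quick scikit-learn sketch of building the raw term-frequency matrix and a TF-IDF version of it. The two toy documents are made up, and note that scikit-learn returns a document-term matrix (the transpose of the term-document layout described above):

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

docs = [
    "the battle of the soldiers",       # stand-in for Julius Caesar
    "the fool and the clown laughed",   # stand-in for As You Like It
]

# Raw term frequencies (rows = documents, columns = terms).
tf = CountVectorizer()
print(tf.fit_transform(docs).toarray())
print(tf.get_feature_names_out())

# TF-IDF downweights terms that appear in every document.
tfidf = TfidfVectorizer()
print(tfidf.fit_transform(docs).toarray())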
Latent Semantic Analysis (LSA)
Algorithms to reduce dimensions in the vector space
- e.g., start with 50,000 dimensions and reduce to 300 or 400
- Singular value decomposition
- Feed in sparse vectors, get back dense ones (sketch below)
- Garbage in, garbage out applies
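One common way to do this LSA-style reduction is scikit-learn's TruncatedSVD; a sketch on a toy corpus (with real data you would keep 300-400 components, but here the vocabulary is tiny so only 2 are used):

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["battle soldiers war", "fool clown jester", "battle war general"]

# Sparse, high-dimensional TF-IDF vectors in...
X_sparse = TfidfVectorizer().fit_transform(docs)

# ...dense, low-dimensional vectors out (n_components must be < n_features).
svd = TruncatedSVD(n_components=2)
X_dense = svd.fit_transform(X_sparse)
print(X_dense.shape)   # (3, 2)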
Cosine similarity
Looks at the angle between vectors: the smaller the angle, the more directionally similar the words are.
Document similarity
The heart and soul of NLP
Jaccard similarity
How many terms the two documents share:
the overlap is the words they have in common (intersection over union)
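A minimal plain-Python sketch of Jaccard similarity over two token sets (the example sentences are invented):

def jaccard(doc_a: str, doc_b: str) -> float:
    """Share of unique terms the two documents have in common."""
    a, b = set(doc_a.lower().split()), set(doc_b.lower().split())
    return len(a & b) / len(a | b)

print(jaccard("the cat sat on the mat", "the cat lay on the rug"))  # 3/7 ≈ 0.43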
Cosine Similarity
Efficient on sparse vectors
Automatically adjusts for documents of different lengths
Document similarity is the angle between the document vectors
Dot product: multiply every feature in doc 1 against the matching feature in doc 2, then add the products
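A sketch of that computation, which works the same for word vectors and document vectors: dot product divided by the product of the vector lengths (the feature weights below are made up):

import math

def cosine(v1, v2):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(v1, v2))          # multiply and add
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    return dot / (norm1 * norm2)

doc1 = [0.1, 0.0, 0.5, 0.2]
doc2 = [0.2, 0.1, 0.4, 0.0]
print(cosine(doc1, doc2))   # ~0.88: small angle, fairly similar documents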
Why don’t we want raw term frequency? (Raw counts are dominated by terms that appear everywhere, like “the”.)
Why is TF-IDF so important? (It downweights terms that appear in most documents, so distinctive terms drive the similarity.)
Documents as Probability Distributions
Hellinger distance:
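A minimal sketch of the Hellinger distance, one way to compare two documents once each is treated as a probability distribution over the vocabulary (the two distributions below are invented):

import math

def hellinger(p, q):
    """Hellinger distance between two discrete probability distributions."""
    total = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(total) / math.sqrt(2)

# Word probabilities over the same vocabulary in two documents (each sums to 1).
doc1 = [0.5, 0.3, 0.2, 0.0]
doc2 = [0.4, 0.3, 0.2, 0.1]
print(hellinger(doc1, doc2))   # ranges from 0.0 (identical) to 1.0 (no overlap)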
BM25
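A rough sketch of the Okapi BM25 scoring formula; k1, b, and the toy corpus are only illustrative, and the smoothed IDF here is the Lucene-style variant, not any particular library's implementation:

import math

def bm25_score(query_terms, doc, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a query with the BM25 formula."""
    avgdl = sum(len(d) for d in corpus) / len(corpus)   # average document length
    n_docs = len(corpus)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)                # document frequency
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)    # smoothed IDF
        tf = doc.count(term)                                    # term frequency in doc
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(doc) / avgdl))
    return score

corpus = [["battle", "of", "the", "soldiers"], ["the", "fool", "and", "the", "clown"]]
print(bm25_score(["battle"], corpus[0], corpus))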
WordNet: use synsets to match documents that are similar but don’t share many of the same words (e.g., replace words with their synsets before comparing).
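A quick NLTK sketch of looking up WordNet synsets for a word; it assumes NLTK is installed and downloads the WordNet data on first use:

import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)   # one-time download of the WordNet data

# Synsets group words that share a sense; "clown" and "buffoon" land together.
for synset in wn.synsets("clown"):
    print(synset.name(), synset.lemma_names())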
Probability
e.g., a term with probability 0.2 means there is a 20% chance that randomly selecting a word from the document yields that term
Similarity does not equal synonymy
- collocation of key terms: “boot” + “camp” vs. “bootcamp”, which are similar but not synonymous
Word2Vec
Lightweight neural network model
- one hidden layer
- one output layer
- about 8 years old (introduced in 2013)
- skip-gram and CBOW
- a must-know model
Skip-gram
Opposite of CBOW model
Focus word is the single input vector, and the target context words are the output layer
Works well with a small amount of training data (<100k samples)
- represents rare words and phrases well
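A sketch of training a skip-gram model with gensim (sg=1 selects skip-gram over CBOW; parameter names follow gensim 4.x, where vector_size replaced the older size, and the tiny corpus is obviously just a placeholder):

from gensim.models import Word2Vec

# Each sentence is a list of tokens; a real corpus would have many more.
sentences = [
    ["the", "nurse", "treated", "the", "patient"],
    ["the", "doctor", "treated", "the", "patient"],
]

model = Word2Vec(
    sentences,
    vector_size=100,   # dimensionality of the dense word vectors
    window=5,          # context words considered on each side of the focus word
    min_count=1,       # keep even rare words (skip-gram handles them well)
    sg=1,              # 1 = skip-gram, 0 = CBOW
)

print(model.wv.most_similar("nurse", topn=3))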