Lexical And Vector Semantics Flashcards
Semantics
Linguistic or logical study of meaning
Lexical Semantics
Is the linguistic study of word meaning
Word Lemma
Dictionary form of a word
Word Senses
The distinct meanings or concepts a word can express; a single word can have multiple senses
Word Sense Disambiguation
The task of determining which sense of a word is being used in a specific context
Wordform
A specific inflected form of a lemma: sing is the lemma, while sing, sung and sang are all wordforms of it
Synonyms
Different words that share an identical, or near-identical, sense
Word Similarity
Two or more words whose meanings are similar even though they are not synonyms, e.g. cat and dog
Word Relatedness or Association
Words that share a common connection, such as appearing in the same contexts, without being similar in meaning.
E.g. tea and cup
Semantic Field
Set of related words from a domain
Topic models
Models that can automatically learn associations between words
Semantic Frame
A set of words that denote the perspectives or participants of a particular type of event
How can semantic frames change?
They can change with perspective: e.g. a commercial transaction can be framed from the buyer's perspective (buy) or the seller's (sell)
What is sentiment analysis?
Assigning positive or negative labels to words and sentences
What is Representational Learning?
The automated learning of useful representations of text
What are Vector Semantics?
The use of embeddings to represent word meaning
What are embeddings in relation to vector semantics?
Vectors representing words in a multidimensional space
Example of a sparse embedding?
TF-IDF
Example of a dense embedding
word2vec
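As a sketch of how a dense embedding such as word2vec might be trained and queried (assuming the gensim library with its 4.x API is available; the toy corpus and parameter values below are illustrative only):

```python
# Sketch: training a tiny word2vec model with gensim (assumed installed, 4.x API).
from gensim.models import Word2Vec

# Toy corpus: a list of tokenised sentences (illustrative only).
sentences = [
    ["i", "drink", "tea", "from", "a", "cup"],
    ["i", "drink", "coffee", "from", "a", "mug"],
    ["the", "choir", "will", "sing", "a", "song"],
]

# vector_size = dimensionality of the dense embedding; window = context size.
model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, epochs=50)

print(model.wv["tea"][:5])            # first 5 dimensions of the dense vector for "tea"
print(model.wv.most_similar("tea"))   # nearest neighbours by cosine similarity
```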
What is a term-document matrix?
A matrix in which each row represents a word (term) and each column a document, with each cell recording how often that word occurs in that document; it is a common sparse representation of words
In a term-document matrix, what is a vector?
An array of numbers: a row of the matrix is a word vector giving that word's frequency in each document, and a column is a document vector giving that document's word counts
In a term-document matrix, what is a vector space?
This is a collection of the vectors
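A minimal sketch in Python of how such a matrix could be built from raw counts (the documents and names are illustrative only):

```python
# Sketch: building a term-document count matrix from a toy corpus.
from collections import Counter

docs = {
    "d1": "tea is served in a cup".split(),
    "d2": "coffee is served in a mug".split(),
    "d3": "the choir will sing a song".split(),
}

vocab = sorted({w for words in docs.values() for w in words})
counts = {name: Counter(words) for name, words in docs.items()}

# Each row is a word vector (the word's frequency in each document);
# each column is a document vector (that document's word counts).
matrix = {w: [counts[name][w] for name in docs] for w in vocab}

print(list(docs))        # column order: ['d1', 'd2', 'd3']
print(matrix["tea"])     # [1, 0, 0]: "tea" occurs once, in d1 only
print(matrix["a"])       # [1, 1, 1]: "a" occurs in every document
```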
What is information retrieval?
Finding the documents that best match a set of query terms. Documents and queries are represented as vectors, and their similarity is calculated
What does a term-term matrix show?
It shows the number of times a word co-occurs within a specified context window with another word
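A minimal sketch of counting co-occurrences within a ±2 word window (the sentence and window size are illustrative only):

```python
# Sketch: a word-word (term-term) co-occurrence matrix with a +/-2 word context window.
from collections import defaultdict

tokens = "i drink tea from a cup and i drink coffee from a mug".split()
window = 2  # illustrative window size

cooc = defaultdict(lambda: defaultdict(int))
for i, w in enumerate(tokens):
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    for j in range(lo, hi):
        if j != i:
            cooc[w][tokens[j]] += 1  # count each context word inside the window

print(cooc["drink"]["tea"])   # how often "tea" appears within 2 words of "drink"
```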
What is the dot product equation to find similarity between two vectors?
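A standard formulation, writing $\mathbf{v}$ and $\mathbf{w}$ for two vectors of length $N$:

$$\mathbf{v} \cdot \mathbf{w} = \sum_{i=1}^{N} v_i w_i$$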
What is the problem with the dot product?
It favours longer (higher-magnitude) vectors, so vectors for frequent words get inflated similarity scores
What is the equation of the normalised dot product (cosine similarity)?
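A standard formulation, followed by a minimal Python sketch that uses it to rank documents against a query (the count vectors below are illustrative only):

$$\cos(\mathbf{v}, \mathbf{w}) = \frac{\mathbf{v} \cdot \mathbf{w}}{|\mathbf{v}|\,|\mathbf{w}|} = \frac{\sum_{i=1}^{N} v_i w_i}{\sqrt{\sum_{i=1}^{N} v_i^2}\,\sqrt{\sum_{i=1}^{N} w_i^2}}$$

```python
# Sketch: cosine similarity between count vectors, used to rank documents against a query.
import math

def cosine(v, w):
    dot = sum(vi * wi for vi, wi in zip(v, w))
    norm = math.sqrt(sum(vi * vi for vi in v)) * math.sqrt(sum(wi * wi for wi in w))
    return dot / norm if norm else 0.0

# Toy term-document columns over the vocabulary ["tea", "cup", "coffee"] (illustrative).
doc_vectors = {"d1": [3, 2, 0], "d2": [0, 1, 4]}
query = [1, 1, 0]  # the query "tea cup" as a count vector

for name, vec in sorted(doc_vectors.items(), key=lambda kv: -cosine(query, kv[1])):
    print(name, round(cosine(query, vec), 3))   # d1 ranks above d2
```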
What does TF-IDF stand for?
Term Frequency-Inverse Document Frequency
What does Term Frequency (TF) mean?
The number of times a term occurs in a document; on its own it is not a good discriminator between documents
What does Document Frequency (DF) mean?
The number of documents a term appears in. Inverse DF divides the total number of documents by the document frequency, so rarer terms receive higher weight
What is the equation for term frequency?
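One common formulation uses a log-scaled count (a raw count is sometimes used instead); $\text{count}(t, d)$ is the number of times term $t$ occurs in document $d$:

$$tf_{t,d} = \log_{10}\big(\text{count}(t, d) + 1\big)$$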
What is the equation for IDF?
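A standard formulation, where $N$ is the total number of documents and $df_t$ is the number of documents containing term $t$:

$$idf_t = \log_{10}\!\left(\frac{N}{df_t}\right)$$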
Why is TF-IDF good?
It balances the two: TF alone does not discriminate well, while IDF alone favours terms that hardly ever occur
What is the equation for TF-IDF?
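The weight of term $t$ in document $d$ is the product of the two components:

$$w_{t,d} = tf_{t,d} \times idf_t$$

A minimal sketch in Python, using the log-scaled TF above (the counts are illustrative only):

```python
# Sketch: TF-IDF weights from a toy term-document count matrix.
import math

counts = {"tea": [3, 0, 1], "cup": [2, 1, 0], "the": [5, 4, 6]}  # rows: terms, cols: 3 docs
n_docs = 3

def tf_idf(term, d):
    tf = math.log10(counts[term][d] + 1)            # log-scaled term frequency
    df = sum(1 for c in counts[term] if c > 0)      # document frequency
    idf = math.log10(n_docs / df)                   # inverse document frequency
    return tf * idf

print(round(tf_idf("tea", 0), 3))   # discriminative: "tea" occurs in 2 of 3 docs
print(round(tf_idf("the", 0), 3))   # 0.0: "the" occurs in every document, so idf = 0
```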
What does PMI stand for?
PMI stands for Pointwise Mutual Information
What does PMI do?
It compares how often two words co-occur against how often we would expect them to co-occur if they were independent
What is the equation for PMI?
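A standard formulation, where $P(w, c)$ is the probability that word $w$ and context word $c$ co-occur, and $P(w)$ and $P(c)$ are their individual probabilities:

$$\text{PMI}(w, c) = \log_2 \frac{P(w, c)}{P(w)\,P(c)}$$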
What does a positive and negative PMI mean?
A positive PMI means the words co-occur more often than they would if they were independent; a negative PMI means they co-occur less often. Negative values are unreliable unless the corpus is very large
What is Positive PMI (PPMI)?
It replaces negative values with zero
What is the equation of PPMI?
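A standard formulation, followed by a minimal numpy sketch that computes PPMI from a small co-occurrence count matrix (the counts are illustrative only):

$$\text{PPMI}(w, c) = \max\!\left(\log_2 \frac{P(w, c)}{P(w)\,P(c)},\ 0\right)$$

```python
# Sketch: PPMI from a small word-context co-occurrence count matrix.
import numpy as np

counts = np.array([[4.0, 1.0],    # rows: target words, columns: context words
                   [1.0, 6.0]])
total = counts.sum()

p_wc = counts / total                      # joint probabilities P(w, c)
p_w = p_wc.sum(axis=1, keepdims=True)      # marginal P(w)
p_c = p_wc.sum(axis=0, keepdims=True)      # marginal P(c)

with np.errstate(divide="ignore"):
    pmi = np.log2(p_wc / (p_w * p_c))
ppmi = np.maximum(pmi, 0)                  # clamp negative (unreliable) values to zero

print(np.round(ppmi, 3))
```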
Is indirect or direct evaluation better for vector models?
Indirect evaluation, which uses a task-specific performance metric and therefore has a better ground truth
What are some direct evaluation methods for vector models?
Correlation of word similarity to human ratings (global)
Correlation of word similarity to human ratings (per scenario)
Analogy task
Average over multiple embeddings
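As a sketch of the analogy task using the vector-offset method (the toy 3-dimensional embeddings below are invented purely for illustration; real embeddings have hundreds of dimensions):

```python
# Sketch: the vector-offset analogy task (a - b + c ~ d) with invented toy embeddings.
import numpy as np

emb = {
    "king":   np.array([0.80, 0.70, 0.10]),
    "man":    np.array([0.70, 0.10, 0.10]),
    "woman":  np.array([0.70, 0.10, 0.90]),
    "queen":  np.array([0.80, 0.70, 0.90]),
    "prince": np.array([0.75, 0.40, 0.10]),
}

def cosine(v, w):
    return float(v @ w / (np.linalg.norm(v) * np.linalg.norm(w)))

target = emb["king"] - emb["man"] + emb["woman"]   # "man is to king as woman is to ?"
best = max((w for w in emb if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(target, emb[w]))
print(best)   # "queen" with these toy vectors
```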