MEANING VIA DISTRIBUTIONAL SEMANTICS Flashcards

1
Q

What is the distributional semantic model

A

Basic idea:
- Generate a high-dimensional feature vector to characterise the meaning of a linguistic item
- Subsequently, the semantic similarity between linguistic items can be quantified in terms of vector similarity, using measures like cosine similarity or the inner product between vectors
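
A minimal sketch of the similarity computation in Python (the two word vectors are made-up toy values):

import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between two vectors: 1.0 = same direction, 0 = orthogonal
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical 4-dimensional feature vectors for two words
v_cat = np.array([1.0, 0.8, 0.1, 0.0])
v_dog = np.array([0.9, 0.7, 0.2, 0.1])
print(cosine_similarity(v_cat, v_dog))  # close to 1.0, i.e. similar meanings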

2
Q

What are the types of linguistic items

A

words or sub-words, phrases, text pieces (windows of words), sentences, documents, etc

3
Q

What is the distributional hypothesis

A

the hypothesis that words occurring in similar contexts tend to have similar meanings; one can therefore infer the meaning of a word just by looking at the contexts it occurs in

4
Q

How does distributional semantics represent linguistic items

A

It assumes that contextual information alone constitutes a viable representation of linguistic items, in contrast to formal linguistics and formal grammar theory

5
Q

What is the Vector space model (VSM)

A

The simplest distributional semantic model
From a corpus it can build:
- A document-term matrix
- A term-context co-occurrence matrix

6
Q

What is a document-term matrix

A

Columns: documents
Rows: terms
Each cell holds how many times a term appears in that document
The similarity between words can then be calculated using the row vectors
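
A minimal sketch of building one with scikit-learn (assuming scikit-learn 1.x; the three toy documents are made up, and since CountVectorizer puts documents in rows we transpose to get terms as rows):

from sklearn.feature_extraction.text import CountVectorizer

docs = ["the dog played with a child",
        "the child read a book",
        "the dog chased the cat"]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)           # sparse counts, shape (n_docs, n_terms)
term_doc = X.T.toarray()                     # transposed: terms as rows, documents as columns
terms = vectorizer.get_feature_names_out()   # the term labelling each row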

7
Q

What is a term-context co-occurrence matrix

A

First define a term's context (e.g., 3 words before and 3 after the term)
Columns: terms
Rows: the general vocabulary used

Each cell holds how many times we see the vocabulary word in the context of the term
(co-occurrences of a term and a vocabulary word within the context window)
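
A minimal sketch of the counting step (toy corpus; the window of 3 words on each side matches the example above):

from collections import defaultdict

corpus = "the dog played with a child the dog slept".split()
window = 3                    # 3 words on each side of the term
cooc = defaultdict(int)       # (term, context_word) -> count

for i, term in enumerate(corpus):
    lo, hi = max(0, i - window), min(len(corpus), i + window + 1)
    for j in range(lo, hi):
        if j != i:
            cooc[(term, corpus[j])] += 1

print(cooc[("dog", "the")])   # how often "the" occurs in the context of "dog"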

8
Q

How do we decide on a context window size

A

Shorter windows (1-3 words): the focus is on syntax
Larger windows: capture more of the semantics
Handling a large context window requires a more expressive model, more data and more computation

9
Q

What are Sparse word vectors

A

VSM creates high-dimensional sparse word vectors:
- Very high dimensionality, e.g., 20,000-50,000
- Very sparse, with most elements equal to zero
- Expensive to store

10
Q

What are dense word vectors

A

We can also create low-dimensional dense vectors:
- Comparatively low dimensionality, e.g., 50-1,000
- Mostly non-zero elements

This gives the benefit of being:
- easier to use as features in machine learning models
- less noisy

11
Q

What is Latent semantic indexing

A

A classical method for obtaining low-dimensional dense representations from a document-term matrix
Mathematically (in linear algebra), the method is just the singular value decomposition (SVD)

12
Q

How to carry out latent semantic indexing

A

We decompose our document-term matrix into 3 matrices (SVD):
X = U D V^T
- D is a square diagonal matrix with the singular values on its diagonal (non-zero values only there)
- U holds the left singular vectors (one row per document) and V the right singular vectors (one row per term), taking documents as the rows of X

We can calculate document vectors as the rows of UD
We can calculate term vectors as the rows of VD

The dimension of these vectors is k, and we can choose which k to use - usually a low value,
e.g., 50-1,000 given 20,000-50,000 terms
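
A minimal sketch with NumPy (a random matrix stands in for a real document-term matrix; k = 20 is arbitrary):

import numpy as np

X = np.random.rand(100, 500)              # toy document-term matrix: 100 docs x 500 terms
U, d, Vt = np.linalg.svd(X, full_matrices=False)
D = np.diag(d)                            # singular values on the diagonal

k = 20                                    # keep only the top-k dimensions
doc_vectors = U[:, :k] @ D[:k, :k]        # one k-dimensional vector per document (UD)
term_vectors = Vt[:k, :].T @ D[:k, :k]    # one k-dimensional vector per term (VD)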

13
Q

What is Truncated SVD

A

Reduces the dimensionality of the original matrix by selecting a k value lower than the rank of the matrix
Useful when dealing with large, sparse matrices, as it allows a more compact representation while retaining most of the significant information
(can be applied to any sparse matrix, not just document-term)
- may lose some information, since the result is an approximation
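
A minimal sketch with scikit-learn (a random sparse matrix stands in for a real document-term matrix):

from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

X = sparse_random(1000, 5000, density=0.01, format="csr")  # toy sparse matrix, mostly zeros
svd = TruncatedSVD(n_components=100)                       # k = 100, well below the rank
doc_vectors = svd.fit_transform(X)                         # dense output, shape (1000, 100)
print(svd.explained_variance_ratio_.sum())                 # fraction of variance retained by k dims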

14
Q

Choosing k (degrees of freedom)

A

If the document-term matrix contains only 3 repeating vector patterns, it has 3 degrees of freedom (DOF) and we should choose k = 3
If we choose a lower k value we lose information (we are doing an approximation)
We can choose a higher value to be more accurate, but this means more computation and more storage space

15
Q

What are predictive word embedding models

A

Calculate word embeddings by performing prediction tasks based on word co-occurrence information:
1) Define your context.
2) Define the prediction task.

E.g. predict whether a word appears in the context of a target word (word2vec, which comes in two versions: CBOW and skip-gram),
or predict how many times a word appears in the context texts of a target word (GloVe)

16
Q

CBOW model

A

Continuous bag-of-words model
We predict the target word from the context, e.g. ({the, dog, with, a, child}, target class "played")

If the features {the, dog, with, a, child} predict the target class "played", we have trained a good model
We can compare the prediction to the actual word (if known) and update the vectors

1) Average the feature vectors of the context words,
e.g. h = 1/5 (v_the + v_dog + ... + v_child)
2) Pass h to a linear classifier that assigns it to one of the word classes to find the word (logistic regression/softmax over the vocabulary)
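
A minimal sketch of one CBOW forward pass in NumPy (vocabulary, embedding size and weights are toy values; a real model learns the two weight matrices by gradient descent):

import numpy as np

vocab = ["the", "dog", "played", "with", "a", "child"]
V, d = len(vocab), 8                # vocabulary size, embedding dimension
rng = np.random.default_rng(0)
W_in = rng.normal(size=(V, d))      # input embeddings, one row per word
W_out = rng.normal(size=(d, V))     # output (classifier) weights

context = ["the", "dog", "with", "a", "child"]
idx = [vocab.index(w) for w in context]
h = W_in[idx].mean(axis=0)          # step 1: average the context word vectors

scores = h @ W_out                              # step 2: linear classifier over the vocabulary
probs = np.exp(scores) / np.exp(scores).sum()   # softmax: P(target word | context)
print(vocab[int(np.argmax(probs))])             # the model's predicted target word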

17
Q

Skip-gram model

A

Predict the context words from the target word
The reverse of CBOW
It takes the target word's feature vector
and uses logistic regression to classify all the context words
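
In practice both variants are available through a library such as gensim (a sketch, assuming gensim 4.x; the toy corpus is made up):

from gensim.models import Word2Vec

sentences = [["the", "dog", "played", "with", "a", "child"],
             ["the", "child", "read", "a", "book"]]

cbow = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=0)      # sg=0: CBOW
skipgram = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1)  # sg=1: skip-gram
print(skipgram.wv["dog"][:5])  # first 5 entries of the learned vector for "dog"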

18
Q

What is a one-hot binary vector

A

A vector in which only one element is marked as "hot" or "on" (set to 1), and all other elements are "off" (set to 0)
e.g. 00010000...000
Used to represent categories or classes
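
A minimal sketch (the vocabulary and the chosen word are illustrative):

import numpy as np

vocab = ["cat", "dog", "bird", "fish"]
one_hot = np.zeros(len(vocab))
one_hot[vocab.index("bird")] = 1   # only the element for "bird" is "on"
print(one_hot)                     # [0. 0. 1. 0.]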

19
Q

What is the GloVe Model

A

Uses a regression-based method instead of classification (as in CBOW and skip-gram)
Uses the frequency with which a word appears in another word's context in a given text corpus
Applies a log function to these co-occurrence counts
Calculates the predicted (log) frequency with which a word will appear in a context
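
A sketch of the idea as a per-pair loss in Python (the function name and default constants are mine; the full GloVe objective sums this weighted squared error over all co-occurring word pairs):

import numpy as np

def glove_pair_loss(w_i, w_j, b_i, b_j, x_ij, x_max=100.0, alpha=0.75):
    if x_ij == 0:
        return 0.0                                         # non-co-occurring pairs contribute nothing
    f = (x_ij / x_max) ** alpha if x_ij < x_max else 1.0   # down-weights very frequent pairs
    # Regression: the inner product (plus biases) should match the log co-occurrence count
    return f * (np.dot(w_i, w_j) + b_i + b_j - np.log(x_ij)) ** 2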

20
Q

Word vectors: what is clustering

A

Word vectors can be employed to cluster words with similar meanings or semantic relationships,
e.g. using k-means (where k is the number of clusters)
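
A minimal sketch with scikit-learn (random vectors stand in for real word vectors):

import numpy as np
from sklearn.cluster import KMeans

word_vectors = np.random.rand(200, 50)     # stand-in for 200 real 50-dimensional word vectors
kmeans = KMeans(n_clusters=10)             # k = 10 clusters
labels = kmeans.fit_predict(word_vectors)  # labels[i] = cluster index of word i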

21
Q

Word vectors: what is visualisation

A

Word vectors can be visualised in lower-dimensional spaces,
e.g. words with similar meanings or usage patterns appear close to each other in a 2D or 3D semantic space

Similarly, word vectors can be used to solve other NLP tasks in the semantic space
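
A minimal sketch projecting vectors to 2D with PCA (t-SNE is a common alternative; random vectors stand in for real ones):

import numpy as np
from sklearn.decomposition import PCA

word_vectors = np.random.rand(200, 50)                        # stand-in for real word vectors
points_2d = PCA(n_components=2).fit_transform(word_vectors)
# points_2d[:, 0] and points_2d[:, 1] can now be plotted as a scatter plot;
# words with similar vectors land near each other in the 2D space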

22
Q

What is word embedding

A

The same as creating word vectors: mapping each word to a vector representation of its meaning