MEANING VIA DISTRIBUTIONAL SEMANTICS Flashcards
What is the distributional semantic model
Basic idea:
- Generate a multi-dimensional feature vector to characterise the meaning of a linguistic item
- Subsequently, the semantic similarity between linguistic items can be quantified in terms of vector similarity, using measures like cosine similarity or the inner product between vectors
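A minimal sketch of the idea in Python (the word vectors here are made-up toy values, not real embeddings):

```python
import numpy as np

def cosine_similarity(u, v):
    # Cosine of the angle between the vectors: (u . v) / (|u| * |v|)
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Hypothetical 3-dimensional feature vectors for two words
cat = np.array([0.8, 0.1, 0.6])
dog = np.array([0.7, 0.2, 0.5])

print(cosine_similarity(cat, dog))  # close to 1.0 -> similar meanings
print(np.dot(cat, dog))             # inner product: unnormalised alternative
```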
What are types of linguistic items
words or sub-words, phrases, text pieces (windows of words), sentences, documents, etc
What is the distributional hypothesis
suggests that one can infer the meaning of a word just by looking at the context it occurs in
How does distributional semantics assess linguistic items
assumes that contextual information alone constitutes a viable representation of linguistic items, in contrast to formal linguistics and the formal theory of grammar
What is the Vector space model (VSM)
The simplest distributional semantic model
Can build:
-A document-term matrix
-A term-context occurrence matrix
What is a document-term matrix
Columns : documents
Rows : terms
Each cell holds how many times the term appears in that document
Then calculate the similarity between words using the row vectors
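A toy sketch of building such a matrix in Python (the corpus is invented for illustration):

```python
import numpy as np

# Toy corpus: three tiny "documents"
docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab = sorted({w for d in docs for w in d.split()})

# Rows: terms, columns: documents; each cell counts the term in that document
X = np.array([[d.split().count(t) for d in docs] for t in vocab])

print(vocab)                     # ['cat', 'dog', 'ran', 'sat', 'the']
print(X)                         # 5 terms x 3 documents
cat = X[vocab.index("cat")]      # row vector representing "cat"
dog = X[vocab.index("dog")]      # compare rows (e.g., cosine) for similarity
```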
What is a term-context occurrence matrix
First define a term's context (e.g., 3 words before and 3 words after the term)
Columns : terms
Rows : words from the general vocabulary
Each cell holds how many times the vocabulary word appears in the context of the term
(co-occurrences of a term and a word from vocab within the context window)
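A minimal sketch of counting co-occurrences within a window (toy sentence, window size 3):

```python
from collections import defaultdict

tokens = "the cat sat on the mat near the dog".split()
window = 3  # 3 words before and 3 words after the term

# cooc[term][word] = times `word` occurs within the window around `term`
cooc = defaultdict(lambda: defaultdict(int))
for i, term in enumerate(tokens):
    lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
    for j in range(lo, hi):
        if j != i:
            cooc[term][tokens[j]] += 1

print(dict(cooc["cat"]))  # context counts for the term "cat"
```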
How do we decide on a context window size
Shorter windows (1-3 words) - indicate we are focused on syntax
Larger windows - indicate we want to capture semantics
Handling a large context window requires a more expressive model, more data, and more computing
What are Sparse word vectors
VSM creates high-dimensional sparse word vectors
-Very high dimensions, e.g., 20,000-50,000.
-Expensive to store.
-Very sparse with most elements equal to zero.
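As an illustration of why this matters for storage, a sketch using scipy's compressed sparse row format (a common choice, not prescribed by these notes):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Mostly-zero toy count matrix (stand-in for a 20,000 x 50,000 VSM matrix)
dense = np.zeros((1000, 5000))
dense[0, 3] = 2.0
dense[7, 42] = 1.0

sparse = csr_matrix(dense)  # stores only the non-zero entries
print(sparse.nnz)           # 2 non-zeros out of 5,000,000 cells
print(dense.nbytes)         # 40,000,000 bytes as a dense array
```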
What are dense word vectors
We can also create low-dimensional dense vectors
-Comparatively lower dimensions, e.g., 50-1000.
-Mostly non-zero elements.
This gives the benefit of being:
-easier to use as features in machine learning models,
-less noisy
What is Latent semantic indexing
A classical method for low-dimensional dense representation from a document-term matrix
Mathematically (in linear algebra), the method is just the singular value decomposition (SVD)
How to carry out latent semantic indexing
We decompose our document-term matrix into 3 matrices (SVD)
X = U D V^T
D is a square diagonal matrix: its only non-zero values (the singular values) are on the diagonal
U and V have orthonormal columns; the rows of U correspond to terms and the rows of V to documents
We can calculate term vectors : rows of UD
We can calculate document vectors : rows of VD
The dimension of these vectors is k, and we can choose which k to use - usually a low value, e.g., 50-1000 given 20,000-50,000 terms
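A minimal numpy sketch of the decomposition on a toy term-document matrix (the counts are invented):

```python
import numpy as np

# Toy term-document matrix X (rows: terms, columns: documents)
X = np.array([[2., 0., 1.],
              [0., 3., 1.],
              [1., 1., 0.],
              [0., 2., 2.]])

U, s, Vt = np.linalg.svd(X, full_matrices=False)  # X = U @ diag(s) @ Vt
k = 2                                             # chosen latent dimension

term_vecs = U[:, :k] * s[:k]     # one k-dimensional row vector per term
doc_vecs = Vt[:k, :].T * s[:k]   # one k-dimensional row vector per document

print(term_vecs.shape, doc_vecs.shape)  # (4, 2) (3, 2)
```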
What is Truncated SVD
reduces the dimensionality of the original matrix by selecting a k value lower than the rank of the matrix
Useful when dealing with large, sparse matrices, as it allows for a more compact representation while retaining significant information
(can be applied to any sparse matrix not just document-term)
-may lose information (the result is an approximation)
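A sketch using scikit-learn's TruncatedSVD, one common implementation (the matrix here is random filler, just to show the shapes):

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Large, sparse stand-in matrix: only 1% of entries are non-zero
X = sparse_random(2000, 10000, density=0.01, random_state=0)

svd = TruncatedSVD(n_components=100, random_state=0)
X_reduced = svd.fit_transform(X)  # each row compressed to 100 dense dims

print(X_reduced.shape)                      # (2000, 100)
print(svd.explained_variance_ratio_.sum())  # information retained
```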
Choosing k (degrees of freedom)
If we get 3 repeating vector patterns in our document-term matrix, this means we have 3 DOF and should choose k=3
If we choose a lower k value we will lose information (doing an approximation)
We can choose a higher value to be more accurate, but that means more computation and more storage space
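A sketch of reading off the degrees of freedom from the singular values (the matrix is constructed to have exactly 3 patterns):

```python
import numpy as np

# Construct a matrix with exactly 3 underlying patterns (rank 3)
rng = np.random.default_rng(0)
X = rng.random((50, 3)) @ rng.random((3, 80))

s = np.linalg.svd(X, compute_uv=False)  # singular values, largest first
k = int(np.sum(s > 1e-10 * s[0]))       # count the non-negligible ones
print(k)  # 3 -> choose k = 3; lower k approximates, higher k costs more
```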
What are predictive word embedding models
Calculate word embeddings by performing prediction tasks based on word co-occurrence information
1) Define your context.
2) Define prediction task
E.g., predict whether a word appears in the context of a target word (word2vec, which has two versions: CBOW and skip-gram)
Or predict how many times a word appears in the context texts of a target word (GloVe)
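A minimal sketch of training a skip-gram word2vec model with the gensim library (assuming gensim 4.x; the toy corpus is far too small for meaningful embeddings):

```python
from gensim.models import Word2Vec

# Toy tokenised corpus; real training needs millions of sentences
sentences = [["the", "cat", "sat", "on", "the", "mat"],
             ["the", "dog", "sat", "on", "the", "rug"]]

# sg=1 -> skip-gram (predict context words from the target word);
# sg=0 -> CBOW (predict the target word from its context)
model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1)

print(model.wv["cat"].shape)            # (50,) dense word embedding
print(model.wv.similarity("cat", "dog"))
```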