Vector Space Model Flashcards

1
Q

Vector Space

A

Defined by a linearly independent set of basis vectors.

2
Q

Orthogonal basis vectors

A

v·w = 0 means v and w are orthogonal. If a set of non-zero vectors is pairwise orthogonal, then it is linearly independent.
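The orthogonality test above can be sketched in plain Python (helper names are my own, for illustration):

```python
def dot(v, w):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(v, w))

def pairwise_orthogonal(vectors):
    """True if every distinct pair of vectors has dot product 0."""
    return all(
        dot(vectors[i], vectors[j]) == 0
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    )

basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # standard basis: orthogonal
assert pairwise_orthogonal(basis)
assert not pairwise_orthogonal([(1, 1, 0), (1, 0, 0)])  # dot product is 1
```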

3
Q

Terms as Basis Vectors

A

Terms are chosen as orthogonal basis vectors, but in practice they are clearly not orthogonal, because of polysemy (one word, several meanings) and synonymy (several words, one meaning).

4
Q

Zipf’s Law

A

The frequency of a word is reciprocally proportional to its rank in the frequency-ordered word list:

freq(word_i) = (1 / i^theta) * freq(word_1)
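A minimal sketch of this formula (taking theta = 1 purely for illustration):

```python
def zipf_freq(rank, top_freq, theta=1.0):
    """Predicted frequency of the word at the given rank (1-based),
    per Zipf's law: freq(word_i) = top_freq / i**theta."""
    return top_freq / (rank ** theta)

# If the most frequent word occurs 1000 times, predicted counts fall off as:
predicted = [zipf_freq(r, 1000) for r in (1, 2, 4, 10)]
# predicted == [1000.0, 500.0, 250.0, 100.0]
```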

5
Q

Term Importance and Zipf’s law

A

Zone 1 - High-frequency words are function words, so they are not important as index terms.

Zone 2 - Mid-frequency words are the best indicators of document content.

Zone 3 - Low-frequency words are generally typos or overly specific words.
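The three-zone idea can be sketched as a simple frequency filter (the cut-off values here are hypothetical; in practice they are tuned per collection):

```python
from collections import Counter

def mid_frequency_terms(tokens, low=2, high=5):
    """Keep only terms whose count falls strictly between the two
    cut-offs, i.e. Zone 2 of the Zipf-based three-zone view."""
    counts = Counter(tokens)
    return {t for t, c in counts.items() if low < c < high}

tokens = "the the the the the the cat cat cat dog".split()
# "the" (6) is Zone 1, "dog" (1) is Zone 3, "cat" (3) is kept as Zone 2
```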

6
Q

Term Frequency

A

A monotonic function of the number of times a term appears in a document (enhances recall).

7
Q

Inverse document frequency

A

A monotonic function of the number of documents in which a term appears. It is used instead of inverse collection frequency because a term may have a high collection frequency simply by being concentrated in a small set of documents.
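Term frequency and inverse document frequency are usually combined into a tf-idf weight. One common variant (exact formulas vary between systems) looks like this:

```python
import math

def tf(count):
    """Log-damped term frequency: monotonic in the raw count."""
    return 1 + math.log(count) if count > 0 else 0.0

def idf(n_docs, doc_freq):
    """Inverse document frequency: rarer terms get higher weight."""
    return math.log(n_docs / doc_freq)

def tf_idf(count, n_docs, doc_freq):
    return tf(count) * idf(n_docs, doc_freq)

# A term appearing in every one of 100 documents gets weight 0:
# tf_idf(1, 100, 100) == (1 + 0) * log(1) == 0.0
```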

8
Q

Term similarity metrics

A

Terms can be compared using a string similarity metric such as edit distance.
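Levenshtein edit distance, one standard such metric, can be sketched with a row-by-row dynamic programme:

```python
def edit_distance(a, b):
    """Minimum number of insertions, deletions, and substitutions
    needed to turn string a into string b (Levenshtein distance)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        prev = curr
    return prev[-1]

# edit_distance("color", "colour") -> 1 (one insertion)
```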

9
Q

Determining multi-word terms

A

Observe word combinations in a large corpus of text, then extract multi-word terms based on their frequency of occurrence.
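A minimal sketch of this approach for two-word terms (the frequency threshold here is a hypothetical example value):

```python
from collections import Counter

def frequent_bigrams(tokens, min_count=2):
    """Count adjacent word pairs in the token stream and keep those
    occurring at least min_count times as candidate multi-word terms."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return {" ".join(p) for p, c in pairs.items() if c >= min_count}

tokens = "new york is bigger than new york state".split()
# {"new york"} -- the only adjacent pair occurring twice
```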
