Vector Spaces Flashcards

1
Q

Distributional hypothesis

A

If two words have similar contexts, we can assume that they have similar meanings.

2
Q

A distributional approach to lexical semantics

A
  • Record contexts of words across a large collection of texts (corpus)
  • Each word is represented by a set of features
  • Each feature records some property of the observed context
  • Words that are found to have similar contexts are expected to also have similar meaning.
3
Q

Context windows

A

“I bake bread for breakfast”

  • Context = neighborhood of ±n words to the left/right of the focus word.
  • Features for ±1: {left: bake, right: for}
  • Some variants: distance weighting, n-grams (see the sketch below).
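A minimal sketch of window-based feature extraction (the function name window_features is illustrative, not from the card):

def window_features(tokens, i, n=1):
    # Collect the +-n neighbours of tokens[i] as position-labelled features.
    feats = {}
    for d in range(1, n + 1):
        if i - d >= 0:
            feats[f"left_{d}"] = tokens[i - d]
        if i + d < len(tokens):
            feats[f"right_{d}"] = tokens[i + d]
    return feats

tokens = ["I", "bake", "bread", "for", "breakfast"]
print(window_features(tokens, tokens.index("bread")))
# {'left_1': 'bake', 'right_1': 'for'}

For n = 1 and focus word “bread”, this reproduces the features on the card.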
4
Q

Bag of Words (BoW)

A
  • Context: all co-occurring words, ignoring the linear ordering
  • Features: {I, bake, for, breakfast}
  • Some variants: sentence-level, document-level (see the sketch below)
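A minimal sketch of sentence-level BoW extraction (bow_features is an illustrative name):

def bow_features(tokens, focus_index):
    # Sentence-level bag of words: every co-occurring token, order ignored.
    return {tok for i, tok in enumerate(tokens) if i != focus_index}

tokens = ["I", "bake", "bread", "for", "breakfast"]
print(bow_features(tokens, tokens.index("bread")))
# {'I', 'bake', 'for', 'breakfast'} (set order varies)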
5
Q

Grammatical context

A
  • Context: the grammatical relations to other words
  • Intuition: when words combine in a construction, they often impose semantic constraints on each other.
  • Requires deeper linguistic analysis than simple BoW approaches
  • Features: {dir_obj(bake), prep_for(breakfast)} (see the parser sketch below)
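Extracting such relations requires a parser. A sketch using spaCy (assuming the en_core_web_sm model is installed); spaCy’s dependency labels (dobj, prep, pobj, …) differ from the card’s notation but encode the same relations:

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("I bake bread for breakfast")

# Print each token's grammatical relation to its head word.
for token in doc:
    print(f"{token.dep_}({token.head.text}, {token.text})")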
6
Q

Tokenization

A

Splitting a text into sentences and words or other units.
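A toy regex tokenizer as a sketch (real pipelines use trained tokenizers; the pattern below is only illustrative):

import re

def tokenize(text):
    # Split off punctuation but keep word-internal apostrophes.
    return re.findall(r"\w+(?:'\w+)?|[^\w\s]", text)

print(tokenize("I bake bread, don't I?"))
# ['I', 'bake', 'bread', ',', "don't", 'I', '?']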

7
Q

Stop-list

A

Filter out closed-class words (function words). The idea is that only content words provide relevant context.
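A minimal sketch (the stop-list here is a tiny illustrative sample, not a standard list):

STOP_LIST = {"the", "a", "an", "of", "for", "and", "i"}

def content_words(tokens):
    # Keep only tokens that are not on the stop-list.
    return [t for t in tokens if t.lower() not in STOP_LIST]

print(content_words(["I", "bake", "bread", "for", "breakfast"]))
# ['bake', 'bread', 'breakfast']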

8
Q

Lemmatized string from the raw string “The programmer’s program had been programmed”.

A

“The programmer ‘s program have be program”
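With spaCy, for instance (assuming en_core_web_sm; the exact lemmas, e.g. the casing of “the”, can vary by model version):

import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("The programmer's program had been programmed")

# Join each token's lemma back into a string.
print(" ".join(token.lemma_ for token in doc))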

9
Q

“Relatedness” vs. “Sameness”

A

Relatedness = similarity in domain:
{car, gas, road, service, traffic, driver}

Sameness = similarity in content:
{car, train, bicycle, truck, vehicle, airplane}

10
Q

Vector space model

A
  • A general model for representing data based on a spatial metaphor
  • Each object is represented as a vector (or point) positioned in a coordinate system
  • Each coordinate (or dimension) of the space corresponds to some descriptive and measurable property (feature) of the objects.
  • To measure similarity of two objects, we can measure their geometrical distance / closeness in the model.
  • Vector representations are foundational to a wide range of ML methods.
11
Q

Semantic spaces

A

Semantic spaces AKA distributional semantic models or word space models

A semantic space is a vector space model where:

  • Points represent words
  • Dimensions represent contexts of use
  • Distance in the space represents semantic similarity.
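A sketch of how such a space can be built from raw co-occurrence counts (function and variable names are ours):

from collections import Counter, defaultdict

def cooccurrence_counts(sentences, n=1):
    # Word-by-word co-occurrence counts within a +-n token window.
    counts = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            for j in range(max(0, i - n), min(len(tokens), i + n + 1)):
                if j != i:
                    counts[word][tokens[j]] += 1
    return counts

sents = [["I", "bake", "bread", "for", "breakfast"],
         ["we", "bake", "cake", "for", "dinner"]]
print(cooccurrence_counts(sents)["bread"])
# Counter({'bake': 1, 'for': 1})

Each row of the resulting table is a word’s vector; each column (context word) is a dimension.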
12
Q
A
13
Q

One standard metric for spatial proximity

A

Euclidean distance:

d(u, v) = ‖u − v‖ = √( Σᵢ (uᵢ − vᵢ)² )
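Computed with NumPy, for example:

import numpy as np

u = np.array([1.0, 2.0, 0.0])
v = np.array([0.0, 2.0, 2.0])

# Euclidean distance is the norm of the difference vector.
print(np.linalg.norm(u - v))  # sqrt(1 + 0 + 4) ≈ 2.236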

14
Q

Norm of a vector

A

The length of the vector:

‖v‖ = √( Σᵢ vᵢ² )

15
Q

Potential problem with Euclidean distance

A

It is very sensitive to extreme values and to the length of the vectors: in a semantic space, frequent words get longer vectors, so distance partly reflects frequency rather than meaning.
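A small illustration of the length problem (the numbers are made up):

import numpy as np

# Same direction, different lengths, e.g. one word counted in a
# small corpus and the same word counted in a corpus 10x larger.
u = np.array([1.0, 2.0])
v = 10 * u

print(np.linalg.norm(u - v))  # ≈ 20.1: far apart despite identical direction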

16
Q

Overcome length bias

A
  • Normalization to unit vectors
  • Cosine similarity
17
Q

Normalize vector to unit length

A

Divide the vector by its length (norm): v̂ = v / ‖v‖
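A one-line sketch with NumPy (assumes v is not the zero vector):

import numpy as np

def unit(v):
    # Scale v to length 1; its direction is unchanged.
    return v / np.linalg.norm(v)

v = np.array([3.0, 4.0])
print(unit(v))                  # [0.6 0.8]
print(np.linalg.norm(unit(v)))  # 1.0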

18
Q
A
19
Q

Cosine of angle between vectors

A

cos(u, v) = (u · v) / (‖u‖ ‖v‖)
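A direct translation into NumPy (assumes neither vector is zero):

import numpy as np

def cosine(u, v):
    # Dot product divided by the product of the norms.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

u = np.array([1.0, 2.0])
print(cosine(u, 10 * u))                 # ≈ 1.0: length plays no role
print(cosine(u, np.array([-2.0, 1.0])))  # 0.0: orthogonal vectors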
20
Q

Cosine of angle between unit vectors

A

For unit vectors both norms are 1, so the cosine reduces to the dot product: cos(û, v̂) = û · v̂
21
Q

The cosine measures ____ rather than distance

A

The cosine measures proximity rather than distance

22
Q

___ has the same relative rank order as the Euclidean distance for _____

A

The cosine measure has the same relative rank order as the Euclidean distance for unit vectors!
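A quick check of the claim (the word vectors are invented for illustration):

import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

focus = unit(np.array([1.0, 1.0]))
words = {"car": [1.0, 0.9], "train": [1.0, 0.5], "gas": [0.2, 1.0]}
units = {w: unit(np.array(v)) for w, v in words.items()}

# Rank neighbours by descending cosine and by ascending distance.
by_cosine = sorted(units, key=lambda w: np.dot(focus, units[w]), reverse=True)
by_distance = sorted(units, key=lambda w: np.linalg.norm(focus - units[w]))
print(by_cosine == by_distance)  # True: the rankings coincide

This follows from ‖u − v‖² = 2 − 2·cos(u, v) for unit vectors: larger cosine always means smaller distance.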