W5 L1 meaning representation Flashcards

1
Q

what is valence for word connotation

A

valence: the pleasantness of the stimulus

2
Q

what is arousal for word connotation

A

arousal: the intensity of emotion provoked by the stimulus

3
Q

what is dominance for word connotation

A

dominance: the degree of control exerted by the stimulus

4
Q

what can we do with valence, arousal, and dominance

A

we can represent the meaning of a word using these three numbers as a point in three-dimensional space
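
a minimal sketch of this idea; the VAD scores below are invented for illustration, not taken from a real lexicon:

```python
import math

# hypothetical (valence, arousal, dominance) scores, purely illustrative
vad = {
    "love":   (0.9, 0.7, 0.6),  # pleasant, fairly intense, moderate control
    "terror": (0.1, 0.9, 0.2),  # unpleasant, very intense, low control
    "calm":   (0.7, 0.1, 0.5),  # pleasant, low intensity
}

def distance(a, b):
    """Euclidean distance between two points in VAD space."""
    return math.dist(a, b)

print(distance(vad["love"], vad["calm"]))    # smaller...
print(distance(vad["love"], vad["terror"]))  # ...than this
```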

5
Q

what is distributional semantics

A

words that occur in similar contexts tend to have similar meanings

“you shall know a word by the company it keeps” (J.R. Firth)

6
Q

how can we represent words mathematically

A

we can represent words as ‘vectors’ along different ‘connotations’

these connotations can even be other words

7
Q

how would similar words appear in vector representation

A

similar words should end up with similar representations

we can use a vector where the word’s meaning is distributed across multiple dimensions

and each dimension represents some aspect of the meaning

8
Q

how can word meanings be learned

A

they can be learned from the word’s co-occurrences with neighboring words

9
Q

what are the steps in forming distributional vectors

A

suppose the whole vocabulary consists of V words

build a V×V semantic space where target words = rows and context words = columns

each cell records the co-occurrence count of the two words within a context window

based on these co-occurrence counts you can then build the actual vectors
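
a minimal sketch of these steps, assuming a toy three-sentence corpus and a symmetric context window of size 2 (both choices are illustrative):

```python
corpus = ["i like deep learning", "i like nlp", "i enjoy flying"]  # toy corpus
window = 2  # symmetric context window size

tokens = [sent.split() for sent in corpus]
vocab = sorted({w for sent in tokens for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# V x V matrix: rows = target words, columns = context words
counts = [[0] * V for _ in range(V)]
for sent in tokens:
    for i, target in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[idx[target]][idx[sent[j]]] += 1

print(vocab)
print(counts[idx["like"]])  # the distributional vector for "like"
```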

10
Q

what information is given to us by the co-occurrence vectors

A

the distance between these vectors tells us how similar the corresponding words are

11
Q

what is semantic similarity

A

unlike synonymy, semantic similarity is much more common; it measures to what extent the meanings of two words resemble each other

e.g. cat and dog are semantically similar but not synonyms

12
Q

what can we use to measure semantic similarity

A

cosine similarity

or
manhattan distance
euclidean distance
dot product
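
a minimal sketch of cosine similarity over toy count vectors (the numbers are illustrative):

```python
import math

def cosine(u, v):
    """Cosine similarity: 1 = same direction, 0 = orthogonal (unrelated)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

# toy co-occurrence vectors (illustrative numbers)
cat = [10, 0, 7]
dog = [8, 1, 6]
car = [0, 9, 1]
print(cosine(cat, dog))  # high: similar contexts
print(cosine(cat, car))  # low: different contexts
```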

13
Q

how do distributional word vectors handle polysemy

A

they don’t; these vectors are non-contextual, so every sense of a word is collapsed into a single vector

14
Q

how can we use vectors to represent sentences

A

we need a way of composing the word vectors into a single representation

15
Q

can we normalize/weight these vectors

A

yes: notice these vectors contain big raw counts, which motivates normalizing and weighting them (see the later cards on row normalisation and pmi)

16
Q

are word vectors efficient

A

no; they are very long and computationally expensive to work with

17
Q

instead of distributional vectors, what can we do to derive the representation of a phrase

A

we can use compositional semantics to derive the representation of a phrase by applying a function to the constituent word vectors
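
one simple baseline choice of composition function is element-wise averaging of the constituent word vectors; a sketch with hypothetical 3-d vectors:

```python
def compose_average(vectors):
    """Derive a phrase vector by element-wise averaging of word vectors
    (a simple baseline composition function; many others exist)."""
    n = len(vectors)
    return [sum(dims) / n for dims in zip(*vectors)]

# hypothetical 3-d word vectors
red = [0.9, 0.1, 0.3]
car = [0.2, 0.8, 0.5]
print(compose_average([red, car]))  # [0.55, 0.45, 0.4]
```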

18
Q

when weighting terms in the vector, why do we use row normalisation instead of column normalisation

A

row normalisation is the simplest option, but there is a deeper reason:

if we normalised over the columns, it would scale each term across all documents, making every term equally important regardless of its relative frequency within a document. this could distort the significance of terms for specific documents, since the focus would shift to global term behaviour rather than document-specific patterns
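
a minimal sketch of row normalisation over a toy count matrix (the numbers are illustrative):

```python
def row_normalize(matrix):
    """Scale each row to sum to 1, turning a target word's counts into a
    probability distribution over its context words."""
    out = []
    for row in matrix:
        total = sum(row)
        out.append([c / total for c in row] if total else row[:])
    return out

counts = [[2, 1, 1], [0, 3, 1]]  # toy co-occurrence counts
print(row_normalize(counts))     # [[0.5, 0.25, 0.25], [0.0, 0.75, 0.25]]
```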

19
Q

what is pointwise mutual information

A

a measure of how often two events occur together (given some relation)

compared with what we would expect if they were independent

20
Q

what is the pointwise mutual information function

A

pmi(x, y) = log2 [ p(x, y) / (p(x) p(y)) ]

          = log2 [ p(x | y) / p(x) ]

          = log2 [ p(y | x) / p(y) ]

the three forms are equal by the definition of conditional probability: p(x, y) = p(x | y) p(y) = p(y | x) p(x)

where x and y are two different events
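
a hypothetical worked example (the probabilities are made up for illustration): if p(x) = 0.1, p(y) = 0.2, and p(x, y) = 0.04, then pmi(x, y) = log2(0.04 / (0.1 × 0.2)) = log2(2) = 1, i.e. the two words co-occur twice as often as independence would predict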

21
Q

what happens to pmi if x and y are independent

A

pmi(x,y) = 0

this follows because independence means p(x, y) = p(x) p(y), so the ratio inside the log is 1 and log2(1) = 0

22
Q

what is the difference between mutual information and pointwise mutual information

A

note: these two terms are sometimes used interchangeably (strictly, mutual information is the expected value of pmi over all pairs of outcomes)

23
Q

what is the value range for pmi

A

it ranges from negative to positive infinity

24
Q

what does negative pmi imply

A

the two words co-occur less often than we would expect by chance

25
Q

why do we use positive pmi over negative pmi

A

negative pmi values require a lot of data to be estimated reliably, so it is cheaper to clip them to zero and keep only positive pmi (ppmi)
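
a minimal ppmi sketch; the count matrix is a toy example, and probabilities are estimated directly from the counts:

```python
import math

def ppmi(counts):
    """Positive PMI from a co-occurrence count matrix:
    estimate p(x, y), p(x), p(y) from counts, then clip negatives to 0."""
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    out = []
    for i, row in enumerate(counts):
        new_row = []
        for j, c in enumerate(row):
            if c == 0:
                new_row.append(0.0)  # log(0) is -inf, clipped to 0 anyway
            else:
                pxy = c / total
                px = row_sums[i] / total
                py = col_sums[j] / total
                new_row.append(max(0.0, math.log2(pxy / (px * py))))
        out.append(new_row)
    return out

counts = [[2, 1], [1, 4]]  # toy counts (illustrative)
print(ppmi(counts))
```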