W5 L1 meaning representation Flashcards
what is valence for word connotation
valence: the pleasantness of the stimulus
what is arousal for word connotation
arousal: the intensity of emotion provoked by the stimulus
what is dominance for word connotation
dominance: the degree of control exerted by the stimulus
what can we do with valence, arousal, and dominance
we can represent the meaning of a word using these three numbers as a point in three-dimensional space
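a minimal sketch of a word stored as a (valence, arousal, dominance) point; the scores below are invented for illustration, not taken from a real lexicon:

```python
# A word's connotation as a point in 3-D (valence, arousal, dominance) space.
# NOTE: these scores are made up for illustration, not from a real lexicon.
vad = {
    "delighted": (0.9, 0.7, 0.6),   # pleasant, fairly intense, in control
    "terrified": (0.1, 0.9, 0.2),   # unpleasant, very intense, little control
}

valence, arousal, dominance = vad["terrified"]
print(valence, arousal, dominance)   # 0.1 0.9 0.2
```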
what is distributional semantics
words that occur in a similar context tend to have similar meanings
"you shall know a word by the company it keeps" (Firth, 1957)
how can we represent words mathematically
we can represent words as ‘vectors’ along different ‘connotations’
these connotations can be other words woah
how would similar words appear in vector representation
similar words should end up with similar representations
we can use a vector where the word's meaning is distributed across multiple dimensions
and each dimension represents some aspect of the meaning
how can word meanings be learned
they can be learned from the word's co-occurrences with its neighboring words
what are the steps in forming distributional vectors
suppose the whole vocabulary consists of V words
build a VxV semantic space where target words = rows
and context words = columns
each cell records the co-occurrence count of the two words within a context window
based on the co-occurrence counts you can then build the actual vectors
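a minimal sketch of these steps on a toy corpus, assuming a symmetric context window of +/-2 words (both the corpus and the window size are made up):

```python
# Build a V x V co-occurrence matrix: rows = target words, cols = context words.
corpus = ["the cat sat on the mat", "the dog sat on the rug"]
window = 2                                   # symmetric context window

tokens = [sent.split() for sent in corpus]
vocab = sorted({w for sent in tokens for w in sent})
index = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

counts = [[0] * V for _ in range(V)]
for sent in tokens:
    for i, target in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:                       # don't count the target itself
                counts[index[target]][index[sent[j]]] += 1

# Each row is now the distributional vector of its target word.
print(vocab)
print(counts[index["cat"]])
```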
what information do the co-occurrence vectors give us
the distance between these vectors shows us the similarity of corresponding words
what is semantic similarity
unlike synonymy, semantic similarity is much more common; it measures to what extent the meanings of two words resemble each other
e.g. cat and dog are semantically similar but not synonyms
what can we use to measure semantic similarity
cosine similarity
or
Manhattan distance
Euclidean distance
dot product
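a sketch of these measures in plain Python on toy vectors; cosine is the usual choice because it ignores vector length:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):                            # 1 = same direction, 0 = orthogonal
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def euclidean(u, v):                         # distance: smaller = more similar
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def manhattan(u, v):                         # distance: sum of per-dimension gaps
    return sum(abs(a - b) for a, b in zip(u, v))

cat, dog = [2.0, 1.0, 0.0], [1.0, 1.0, 0.0]
print(cosine(cat, dog), euclidean(cat, dog), manhattan(cat, dog))
```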
how do distributional word vectors handle polysemy
they don't; these vectors are non-contextual, so all senses of a word are collapsed into a single vector
how can we use vectors to represent sentences
we need a way of composing them
can we normalize/weight these vectors
yes; notice that the raw counts in these vectors can be very large, so we normalize or re-weight them
are word vectors efficient
no, they are very long (one dimension per context word) and computationally expensive
instead of distributional vectors what can we do to derive the representation of a phrase
we can use compositional semantics to derive the representation of a phrase by applying a function to the constituent word vectors
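a minimal sketch, using averaging as the composition function (one simple choice among many; the vectors are invented):

```python
# Compose a phrase vector from its constituent word vectors by averaging.
def average(vectors):
    n = len(vectors)
    return [sum(dims) / n for dims in zip(*vectors)]

word_vecs = {                                # made-up 3-D word vectors
    "black": [0.1, 0.8, 0.3],
    "cat":   [0.7, 0.2, 0.5],
}
phrase = average([word_vecs["black"], word_vecs["cat"]])
print(phrase)                                # [0.4, 0.5, 0.4]
```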
when weighting terms in the vector why do we use row normalisation instead of column
row normalisation is the simplest approach; in contrast,
if we normalized over the column it would scale terms across all documents, making each term equally important regardless of its relative frequency within a document. This could distort the significance of terms for specific documents, as the focus shifts to global term behavior rather than document-specific patterns
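a sketch of row normalisation on a toy count matrix: each row is divided by its own sum, so a frequent and a rare word with the same context profile end up with the same vector:

```python
def row_normalize(matrix):
    out = []
    for row in matrix:
        total = sum(row)
        out.append([c / total if total else 0.0 for c in row])
    return out

counts = [[2, 1, 1],          # a rare word
          [10, 5, 5]]         # a frequent word, same context profile
print(row_normalize(counts))  # both rows become [0.5, 0.25, 0.25]
```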
what is pointwise mutual information
a measure of how often two events occur together (given some relation)
compared with what we would expect if they were independent
what is the pointwise mutual information function
pmi(x,y) = log2 [ p(x,y) / ( p(x) p(y) ) ]
         = log2 [ p(x|y) / p(x) ]
         = log2 [ p(y|x) / p(y) ]
where x and y are two different events; the three forms are equal via Bayes' theorem
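a worked example of the formula with invented probabilities, checking that two of the forms agree:

```python
import math

p_x, p_y, p_xy = 0.1, 0.2, 0.05              # made-up probabilities

pmi = math.log2(p_xy / (p_x * p_y))          # log2 p(x,y) / p(x)p(y)
alt = math.log2((p_xy / p_y) / p_x)          # log2 p(x|y) / p(x)
print(pmi, alt)                              # both ~1.32 bits
```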
what happens to pmi if x and y are independent
pmi(x,y) = 0
because then p(x,y) = p(x)p(y), so the ratio inside the log is 1 and log2(1) = 0
what is the difference between mutual information and pointwise mutual information
strictly, mutual information is the expected value of pmi over all outcomes, but just note that sometimes these two terms are used interchangeably
what is the value range for pmi
ranges from negative to positive infinity
what does negative pmi imply
the two words occur together less often than we would expect by chance
why do we use positive pmi over negative pmi
negative pmi values require a lot of data to be estimated reliably, so positive pmi, ppmi(x,y) = max(pmi(x,y), 0), is cheaper
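a sketch of ppmi weighting applied to a co-occurrence matrix (toy counts; pmi is computed from the counts, then negative values are clipped to zero):

```python
import math

def ppmi(counts):
    total = sum(sum(row) for row in counts)
    row_sums = [sum(row) for row in counts]
    col_sums = [sum(col) for col in zip(*counts)]
    out = []
    for i, row in enumerate(counts):
        new_row = []
        for j, c in enumerate(row):
            if c == 0:
                new_row.append(0.0)          # unseen pair: leave at 0
                continue
            pmi = math.log2((c / total) /
                            ((row_sums[i] / total) * (col_sums[j] / total)))
            new_row.append(max(pmi, 0.0))    # clip negative pmi to 0
        out.append(new_row)
    return out

counts = [[4, 0],
          [1, 3]]                            # toy co-occurrence counts
print(ppmi(counts))
```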