Lecture 6 Flashcards
Pragmatics and Discourse Analysis, spaCy / Neural Networks for NLP
Summary Of Discourse-Level NLP Tasks
Uncovering discourse structure
(discourse segmentation, discourse
relations, text coherence)
Uncovering document structure
* Recognizing known structure, for
example, abstracts
* Organizing documents according
to known structure
Conducting named entity resolution
across discourse elements
Discourse Segmentation
Documents are automatically separated into passages (sometimes called fragments), each of which is a distinct discourse segment
* Discourse segments can inform the semantic interpretation of a document
Discourse Segmentation - Techniques
Techniques to separate documents into passages include
* Rule-based systems based on clue words and phrases
* Probabilistic techniques to separate fragments and to identify discourse segments
* Lexical cohesion to identify fragments (TextTiling)
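The lexical cohesion idea can be sketched in a few lines of Python. This is a heavily simplified illustration in the spirit of TextTiling, not the full algorithm: it compares word counts on either side of each sentence gap and marks a boundary where the cosine similarity drops below a fixed threshold. The function names, the `window` size, and the `threshold` value are all assumptions chosen for this toy example.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and return its alphabetic tokens."""
    return re.findall(r"[a-z]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def segment(sentences, window=2, threshold=0.1):
    """Return each index i where a segment boundary falls before sentences[i]."""
    bags = [Counter(tokenize(s)) for s in sentences]
    boundaries = []
    for i in range(1, len(sentences)):
        # Pool the word counts of up to `window` sentences on each side of the gap.
        left = sum(bags[max(0, i - window):i], Counter())
        right = sum(bags[i:i + window], Counter())
        if cosine(left, right) < threshold:
            boundaries.append(i)
    return boundaries


sentences = [
    "The cat sat on the mat.",
    "The cat chased the mouse.",
    "Stocks fell sharply today.",
    "Investors sold stocks quickly.",
]
print(segment(sentences))  # [2]
```

The only gap with no shared vocabulary is the one before the third sentence, so that is where the sketch places a boundary. A fuller implementation would typically remove stop words and smooth the similarity scores rather than rely on a hard threshold.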
Lexical Chains: Semantically Related Words
Words that refer back to an earlier word:
The “book” was taken. “It” was valuable.
Relatedness == Cohesion != Coherence
Any document can be viewed as a set
of lexical chains, that is, clusters of
words grouped by semantic similarity.
Chains by themselves, however, do not
guarantee coherence
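A minimal sketch of chain building follows, assuming a tiny hand-written relatedness lexicon (`RELATED` is hypothetical; a real system would consult a thesaurus or a similarity measure). Each word joins the first chain that contains a related word, otherwise it starts a new chain.

```python
# Hypothetical relatedness lexicon, written by hand for this example only.
RELATED = {
    frozenset({"book", "novel"}),
    frozenset({"book", "library"}),
    frozenset({"novel", "author"}),
    frozenset({"rain", "storm"}),
    frozenset({"storm", "wind"}),
}


def related(a, b):
    """Two words are related if identical or listed as a pair in RELATED."""
    return a == b or frozenset({a, b}) in RELATED


def build_chains(words):
    """Greedily assign each word to the first chain holding a related word."""
    chains = []
    for word in words:
        for chain in chains:
            if any(related(word, member) for member in chain):
                chain.append(word)
                break
        else:
            # No existing chain is related to this word, so start a new one.
            chains.append([word])
    return chains


words = ["book", "rain", "novel", "storm", "author", "library", "wind"]
chains = build_chains(words)
print(chains)
# [['book', 'novel', 'author', 'library'], ['rain', 'storm', 'wind']]
```

The two chains show that the word list is cohesive around two topics, but nothing in the chains says whether the sentences those words came from form a coherent text.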
Relatedness == Cohesion != Coherence
A multi-sentence sequence becomes
more than a random set of
independent utterances:
* To the extent that semantically similar
noun phrases are used, or that
coreference connects noun phrases
across sentences (cohesion)
* And to the extent that dissimilar noun
phrases are “pragmatically” connected
through actions (coherence)
Relatedness == Cohesion != Coherence
Example categories: coherent, locally incoherent, topically incoherent
Discourse Structure
Human discourse often exhibits structures that are intended to indicate common experiences and respond to them
Discourse Relations
Discourse relations describe how adjacent text segments are logically connected to each other.
Together they form the rhetorical structure of the text
Rhetorical Structure Theory
A theory of text organization created in the 1980s
* Text units are either nuclei or satellites
* Three categories of relations: subject matter relations, presentational relations, and multinuclear relations
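One way to picture nuclei, satellites, and the relation categories is as a small data structure. The sketch below is illustrative only: the `Relation` class and its fields are assumptions made for this example, and the two sample relations (Evidence as presentational, Contrast as multinuclear) are given as commonly cited instances rather than as a complete inventory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Relation:
    """A rhetorical relation linking nucleus and satellite text units."""
    name: str
    category: str  # "subject matter", "presentational", or "multinuclear"
    nuclei: List[str]
    satellites: List[str] = field(default_factory=list)

    def is_multinuclear(self):
        """Multinuclear relations join several nuclei and have no satellite."""
        return len(self.nuclei) > 1 and not self.satellites


# The satellite supports belief in the nucleus.
evidence = Relation(
    name="Evidence",
    category="presentational",
    nuclei=["The program works."],
    satellites=["It passed every test."],
)

# Both units carry equal weight, so both are nuclei.
contrast = Relation(
    name="Contrast",
    category="multinuclear",
    nuclei=["The first draft was long.", "The second draft was short."],
)

print(evidence.is_multinuclear(), contrast.is_multinuclear())  # False True
```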
Discourse Markers
Many rhetorical relations can be indicated by particular words or
phrases (from Biran and Rambow, 2012)
* But many of these words are ambiguous as they can be used for
other functions in text.
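The ambiguity problem shows up even in a simple lookup. The sketch below uses a small, hypothetical marker lexicon (`MARKERS` was written for this example and is not taken from Biran and Rambow) and returns every relation a matched marker could signal. A marker that maps to more than one relation cannot be resolved by lookup alone.

```python
import re

# Hypothetical lexicon for illustration; each marker lists possible relations.
MARKERS = {
    "because": ["cause"],
    "however": ["contrast"],
    "for example": ["elaboration"],
    "since": ["cause", "temporal"],    # ambiguous
    "while": ["contrast", "temporal"], # ambiguous
}


def find_markers(text):
    """Return (marker, possible relations) pairs for markers found in text."""
    lowered = text.lower()
    hits = []
    for marker, relations in MARKERS.items():
        # Word boundaries keep "since" from matching inside "sincere".
        if re.search(r"\b" + re.escape(marker) + r"\b", lowered):
            hits.append((marker, relations))
    return sorted(hits)


text = "I stayed home because it rained. I have lived here since 2010."
for marker, relations in find_markers(text):
    print(marker, relations)
# because ['cause']
# since ['cause', 'temporal']
```

Here "since" is temporal, but the lookup can only report both candidates. Choosing between them needs context beyond the marker itself.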
Entity Resolution
The ability of a system to recognize and unify variant references to a single entity.
Coreference: A Critical Discourse Level Task
Anaphora - references (he, his, there) to previous text:
* “Doctor Foster went to Gloucester in a shower of rain.
He stepped in a puddle right up to his middle and never went there again.”
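A toy resolver for this example can be sketched as follows. It assumes the entity mentions and their types are already known (`ENTITY_TYPES` is hand-labelled here; in practice a named entity recognizer would supply them) and links each pronoun to the nearest preceding mention of a compatible type. This heuristic is an assumption made for illustration, not a general coreference method.

```python
import re

# Hand-labelled for this example only.
ENTITY_TYPES = {"Doctor Foster": "person", "Gloucester": "place"}
PRONOUN_TYPES = {"he": "person", "his": "person", "there": "place"}


def resolve_pronouns(text):
    """Link each pronoun to the closest earlier mention of a matching type."""
    mentions = []
    for name, etype in ENTITY_TYPES.items():
        for match in re.finditer(re.escape(name), text):
            mentions.append((match.start(), name, etype))
    mentions.sort()

    links = []
    for match in re.finditer(r"[A-Za-z]+", text):
        wanted = PRONOUN_TYPES.get(match.group().lower())
        if wanted is None:
            continue  # not one of the pronouns we handle
        candidates = [
            name for pos, name, etype in mentions
            if pos < match.start() and etype == wanted
        ]
        if candidates:
            links.append((match.group(), candidates[-1]))
    return links


text = (
    "Doctor Foster went to Gloucester in a shower of rain. "
    "He stepped in a puddle right up to his middle and never went there again."
)
print(resolve_pronouns(text))
# [('He', 'Doctor Foster'), ('his', 'Doctor Foster'), ('there', 'Gloucester')]
```

The type check is what keeps "He" from attaching to "Gloucester", the most recent mention. The sketch would still mishandle uses such as the existential "there" in "there are cookies", which does not refer to a place.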
Coreference: A Critical Discourse Level Task
Cataphora - references to future text:
* If you want them, there are cookies in the kitchen.
Coreference: A Critical Discourse Level Task
Substitution - a more general word serves the same function as the item for which it is substituted.
* These biscuits are stale. Get some fresh ones.