CHALLENGES IN NLP Flashcards
Main NLP Tasks
Information retrieval (searching through docs)
Document classification
Question answering
Text summarisation
Conversational agents
Many more emerging
What is Lexical ambiguity
A word with multiple POS tags
e.g. the same word can be a verb or a noun
synonymy / polysemy / antonymy
I made her DUCK
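A minimal sketch of why "I made her duck" is lexically ambiguous: if each word is looked up in a small lexicon of possible POS tags, the sentence has several candidate taggings. The lexicon, the tag names and the function `candidate_taggings` are assumptions made for illustration; they are not part of the original cards.

```python
from itertools import product

# Hypothetical toy lexicon: each word maps to the POS tags it may take.
LEXICON = {
    "i": ["PRON"],
    "made": ["VERB"],
    "her": ["PRON", "DET"],    # "her" as object pronoun or as possessive
    "duck": ["NOUN", "VERB"],  # the bird, or the action of ducking
}

def candidate_taggings(sentence):
    """Return every tag sequence the toy lexicon allows for the sentence."""
    words = sentence.lower().split()
    options = [LEXICON[word] for word in words]
    # product() enumerates one tag choice per word, in all combinations.
    return [list(zip(words, tags)) for tags in product(*options)]

taggings = candidate_taggings("I made her duck")
for tagging in taggings:
    print(tagging)
```

Only two words are ambiguous here, yet the number of candidate taggings multiplies with each one. A real tagger would use context to choose among them.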
What is Lexical-semantic Ambiguity
a word with different senses
bank : river bank or financial institution
What is Syntactic Ambiguity
ambiguity comes from word groupings
POS tags
A tag placed on a word to mark its lexical category (part of speech)
noun/pronoun/verb
What is an Open class word
In POS tagging
nouns, verbs, adjectives, adverbs
What is a Closed class word
In POS tagging
pronouns, prepositions,
auxiliaries, determiners
What are features/attributes in word tagging
person, number, tense, aspect, voice
What is a grammar
The rules for how words are grouped into phrases and sentences
What is a transitive verb
The verb requires a direct object (a noun phrase)
She read a book.
They painted the walls.
I love chocolate.
What is a ditransitive verb
The verb requires both a direct object and an indirect object to complete its meaning
He gave me a gift.
She sent him an invitation.
We bought her a necklace.
What is an action-transitive verb
The verb requires a direct object and another verb
I caused the girl to flinch
What is attachment ambiguity
A type of syntactic ambiguity
Where do we attach?
Unclear which part of a sentence a particular word or phrase is grammatically connected to
I saw the man with the telescope
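The two readings of the telescope sentence can be made concrete by counting parse trees. The sketch below assumes a tiny hand-written grammar in Chomsky normal form and uses a CKY-style chart to count the distinct trees. The grammar rules, the lexicon and the function `count_parses` are illustrative assumptions, not something stated in the cards.

```python
from collections import defaultdict

# Hypothetical toy grammar (binary rules: parent -> left right).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase (seeing with a telescope)
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase (the man has the telescope)
    ("NP", "Det", "N"),
    ("PP", "P", "NP"),
]

# Hypothetical toy lexicon: word -> categories.
LEXICON = {
    "i": ["NP"],
    "saw": ["V"],
    "the": ["Det"],
    "man": ["N"],
    "telescope": ["N"],
    "with": ["P"],
}

def count_parses(sentence, root="S"):
    """Count the parse trees the toy grammar assigns to the sentence."""
    words = sentence.lower().split()
    n = len(words)
    # chart[(start, end)][category] = number of trees covering words[start:end]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        for category in LEXICON[word]:
            chart[(i, i + 1)][category] += 1
    for width in range(2, n + 1):
        for start in range(n - width + 1):
            end = start + width
            for mid in range(start + 1, end):
                for parent, left, right in BINARY_RULES:
                    chart[(start, end)][parent] += (
                        chart[(start, mid)][left] * chart[(mid, end)][right]
                    )
    return chart[(0, n)][root]

print(count_parses("I saw the man with the telescope"))
```

Under this grammar the sentence gets one tree per attachment site for "with the telescope". Dropping the prepositional phrase leaves a single tree.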
What is Coordination ambiguity
A type of syntactic ambiguity
uncertainty about how phrases or clauses are grouped together
Old men and women
Mother and baby in pram hit by car
What is local ambiguity
A type of syntactic ambiguity
A word or phrase is ambiguous within its immediate context, even if the rest of the sentence later resolves it
Police help dog bite victim
The old man the boat
What are the main groupings of challenges in NLP
Structuring
Variability of expressions
Ambiguity
Large volume of Data
“Special” expressions
Context, interpretation
Spoken language - much harder when we can't even see the spelling (I or eye)
What are “special expressions”
idioms, sarcasm, irony
What are the 2 main groups of language resources in NLP
Dictionaries of words/ phrases
Collections of text/speech (corpora)
What are dictionaries
Words listed with their meaning, pronunciation and how to use them in context
What are corpora
Language in use
(large) collections of linguistic data
may consist of written texts, spoken discourse, samples of spoken or written language
documents/ conversations/ essays
What are the types of corpora
Mono vs multi-lingual
General
Specialised
Parallel (two texts that are translations of each other)
What is an unannotated corpus
raw text/speech
What is an annotated corpus
A repository of explicit linguistic information (added manually or automatically)
enhanced with linguistic information
(tagging/types)
Inline:
We <VERB>walk</VERB> down the street.
Let’s take a <NOUN>walk</NOUN> in the park
Can range from very simple POS to heavily specialised
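A minimal sketch of reading the inline annotation shown above: a regular expression pulls out each annotated word with its tag, and a second helper recovers the raw (unannotated) text. The tag format is taken from the example lines; the function names `extract_annotations` and `strip_annotations` are hypothetical, and the pattern assumes tags are upper-case and not nested.

```python
import re

# Matches <TAG>text</TAG>, where the closing tag must repeat the opening one.
TAGGED = re.compile(r"<(?P<tag>[A-Z]+)>(?P<text>.*?)</(?P=tag)>")

def extract_annotations(line):
    """Return (text, tag) pairs for every inline annotation in the line."""
    return [(m.group("text"), m.group("tag")) for m in TAGGED.finditer(line)]

def strip_annotations(line):
    """Return the line with the tags removed, keeping the annotated words."""
    return TAGGED.sub(lambda m: m.group("text"), line)

examples = [
    "We <VERB>walk</VERB> down the street.",
    "Let's take a <NOUN>walk</NOUN> in the park",
]
for line in examples:
    print(extract_annotations(line), "|", strip_annotations(line))
```

The same word "walk" comes back with a different tag in each sentence, which is the information an unannotated corpus lacks.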
What are the 4 annotation types
Grammatical (e.g. POS tags, noun/verb phrases)
Semantic (e.g. person, drug)
Pragmatic (language in use, e.g. conversation)
Combined
What are the two main reasons for annotation
Training
Train linguists
Researchers in language development
NLP development
Evaluation
Compare NLP results with a manually coded gold standard
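Comparing system output with a gold standard can be as simple as counting matching labels. The sketch below computes token-level accuracy. The function name `tagging_accuracy` and the example tags are assumptions for illustration, and accuracy is only one possible evaluation measure.

```python
def tagging_accuracy(gold, predicted):
    """Fraction of positions where the predicted tag equals the gold tag."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty sequence")
    correct = sum(g == p for g, p in zip(gold, predicted))
    return correct / len(gold)

# Hypothetical example: manually coded tags versus a system's output.
gold_tags = ["PRON", "VERB", "DET", "NOUN"]
system_tags = ["PRON", "VERB", "PRON", "NOUN"]
print(tagging_accuracy(gold_tags, system_tags))
```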
Annotation agreement
Assesses how consistently multiple human annotators or raters label the same data
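One common way to quantify agreement between two annotators is Cohen's kappa, which corrects the observed agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The cards do not name a specific measure, so the choice of kappa is an assumption here, as are the function name `cohen_kappa` and the example labels.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: from each annotator's own label distribution.
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)

# Hypothetical example: two annotators tagging six tokens.
annotator_1 = ["NOUN", "VERB", "NOUN", "NOUN", "VERB", "NOUN"]
annotator_2 = ["NOUN", "VERB", "VERB", "NOUN", "VERB", "NOUN"]
print(round(cohen_kappa(annotator_1, annotator_2), 3))
```

A value of 1 means perfect agreement, while 0 means the annotators agree no more often than chance would predict.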