Introduction - Week 1 Flashcards
What makes an application a language processing application?
It requires the use of knowledge about human language
Is Unix wc an example of a language processing application?
Yes, when it counts words
No, when it counts lines or bytes. Lines and bytes are computer artefacts, not linguistic entities
Is Google Search an NLP application?
Yes, it uses knowledge about human languages
Why is NLP hard?
Text is structured only for the human reader; to the machine it is often almost fully unstructured, or at best 'semi-structured' (e.g. HTML)
semi-structured text
Text that is partially structured for the machine, e.g. HTML
Natural Language Processing (NLP)
The steps needed for a machine to "understand" a piece of data represented in a human language
Umbrella terms for NLP tasks
- Text mining
- Text analytics
- Computational Linguistics
- (human) language technology
NLP Tasks and Applications
Information Retrieval
- Searching for relevant documents
Document classification
- Sorting documents into categories
Question answering
- Producing a short answer to a question
Text summarisation
- Summarise a set of documents
Sentiment analysis
- Product reviews, Twitter, Hate crime detection
Machine translation
- One of the first motivations for NLP
Natural language generation
- e.g. data-to-text: generating text from structured data
Authoring and marking tools
- Check spelling, grammar, style
- Automated marking of essays
Conversational Agents
- Dialogue, speech recognition, text-to-speech, speech-to-text
etc. (many, many others)
NLP main problems
Variability
Ambiguity
Variability
Numerous ways to say the same thing
Ambiguity
Words and sentences are often ambiguous, and can have multiple meanings
Word-level ambiguity
Apple (company) or Apple (fruit)
Sentence-level ambiguity
I made her duck (this has at least 5 meanings)
Lexical Ambiguity
A word with multiple possible POS tags, e.g. 'duck' can be a verb or a noun
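A minimal sketch of lexical ambiguity being resolved by context, assuming NLTK is installed with its default English tokenizer and tagger models (exact tags may vary by model version):

```python
import nltk

# One-time model downloads (resource names vary slightly across NLTK versions):
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")

for sentence in ["I saw a duck", "You should duck"]:
    print(nltk.pos_tag(nltk.word_tokenize(sentence)))

# 'duck' is typically tagged NN (noun) after the determiner 'a',
# but VB (verb) after the modal 'should' -- same word, two POS tags.
```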
Lexical-semantic ambiguity
A word with different senses
e.g. bank can be financial institution or part of countryside (river bank)
Syntactic Ambiguity
Ambiguity arising from the different possible ways of grouping words (different parses)
Parts of Speech (POS)
nouns, verbs, adjectives, adverbs, pronouns, prepositions, auxiliaries, determiners
Open class words
nouns, verbs, adjectives, adverbs
Closed class words
pronouns, prepositions, auxiliaries, determiners
Attachment Ambiguity
I saw the girl with the telescope
(did I use the telescope to see the girl, or does the girl have the telescope?)
Coordination ambiguity
Old men and women (are the women also old?)
Mother and baby in pram (is the mother in the pram?)
Local Ambiguity
Police help dog bite victim (did the police help a dog-bite victim, or help a dog to bite a victim?)
Corpus
a (large) collection of linguistic data
- May consist of written texts, spoken discourse, or other samples of written or spoken language
unannotated corpus
raw text/speech
annotated (labelled) corpus
raw text/speech enhanced with linguistic information
A repository of explicit linguistic information (added manually or automatically)
e.g. specifying that “loves” in “Mary loves John” is 3rd person singular present tense form of a verb
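As a sketch of automatic annotation, a POS tagger adds exactly this kind of explicit linguistic information (again assuming NLTK's default English models; output may vary by version):

```python
import nltk  # assumes the punkt tokenizer and default tagger models are installed

tokens = nltk.word_tokenize("Mary loves John")
print(nltk.pos_tag(tokens))
# Typical output: [('Mary', 'NNP'), ('loves', 'VBZ'), ('John', 'NNP')]
# VBZ is the Penn Treebank tag for "verb, 3rd person singular present".
```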
Corpus annotation types
Grammatical (e.g. POS tags, noun / verb phrases)
Semantic (e.g. person, drug)
Pragmatic - language in use (e.g. conversation)
Combined
Why do we need annotated corpora?
For training and evaluation
Training:
- Train linguists, language learners, etc…
- Researchers (e.g. in language development)
- NLP development: use ML/statistics to learn patterns from an annotated corpus
NLP evaluation:
- Compare NLP results (automated “annotations”) with a manually coded “gold standard”
Do people agree on annotations?
Not always: annotation can be very subjective and inconsistent, making a gold-standard corpus difficult to obtain
Some simple tasks get relatively consistent annotations, e.g. "what is the name of the lecturer?"
Others are inconsistent, e.g. sentiment analysis
Annotation agreement
Kappa measures the agreement between two annotators (or classifiers) who classify N items into C mutually exclusive categories
k = (Pr(a) - Pr(e)) / (1-Pr(e))
Pr(a) = relative observed agreement among annotators
Pr(e) = hypothetical probability of chance agreement, typically estimated from the observed data via each annotator's frequency of choosing each category
k = 1 means complete agreement; k = 0 means agreement no better than chance (as defined by Pr(e)); k < 0 means less agreement than chance would predict
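A minimal pure-Python sketch of the formula above, with Pr(e) estimated from each annotator's category frequencies (the labels are made up for illustration):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Kappa for two annotators labelling the same N items."""
    n = len(labels_a)
    # Pr(a): relative observed agreement
    pr_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Pr(e): chance agreement, from each annotator's category frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    pr_e = sum((freq_a[c] / n) * (freq_b[c] / n) for c in freq_a)
    return (pr_a - pr_e) / (1 - pr_e)

a = ["pos", "pos", "neg", "pos", "neg", "neg", "pos", "pos", "neg", "pos"]
b = ["pos", "neg", "neg", "pos", "neg", "pos", "pos", "pos", "neg", "neg"]
print(cohens_kappa(a, b))  # 0.4 here: Pr(a) = 0.7, Pr(e) = 0.5
```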
Precision
Fraction of retrieved documents that are relevant
relevant items retrieved / retrieved items
P(relevant|retrieved)
Recall
Fraction of relevant documents that are retrieved
relevant items retrieved / relevant items
P(retrieved | relevant)
F-measure
The harmonic mean of Precision and Recall, trading the two off against each other (this balanced form is often called F1)
F = 2PR/(P + R)
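To make the three metrics concrete, a small sketch using sets of document IDs (hypothetical data; assumes non-empty sets and P + R > 0):

```python
def precision_recall_f1(retrieved, relevant):
    tp = len(retrieved & relevant)        # relevant items retrieved
    p = tp / len(retrieved)               # P = tp / retrieved items
    r = tp / len(relevant)                # R = tp / relevant items
    return p, r, 2 * p * r / (p + r)      # F = 2PR / (P + R)

retrieved = {"d1", "d2", "d3", "d4"}      # what the system returned
relevant = {"d1", "d3", "d5"}             # the gold standard
print(precision_recall_f1(retrieved, relevant))  # approx (0.5, 0.667, 0.571)
```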
Cross-validation
Break up data into n folds
- Ideally stratified, so each fold has a similar balance of positive and negative examples
For each fold:
- Choose the fold as a temporary test set
- Train on the other n-1 folds, compute performance on the test fold
Report average performance of the n runs
A sensible value for n is often 10
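A sketch of the procedure, where train_and_score is a hypothetical callback that trains on one list of items and returns a performance score on the other:

```python
import random

def cross_validate(items, train_and_score, n=10):
    items = list(items)
    random.shuffle(items)                        # or stratify by class
    folds = [items[i::n] for i in range(n)]      # n roughly equal folds
    scores = []
    for i in range(n):
        test = folds[i]                          # fold i is the temporary test set
        train = [x for j in range(n) if j != i for x in folds[j]]
        scores.append(train_and_score(train, test))
    return sum(scores) / n                       # average over the n runs
```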