Introduction - Week 1 Flashcards by Ollie Ursell

What makes an application a language processing application

It requires the use of knowledge about human language

Is Unix wc an example of a language processing application?

Yes, when it counts words
No, when it counts lines or bytes. Lines and bytes are computer artefacts, not linguistic entities
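
A minimal Python sketch of what `wc` counts. It assumes whitespace-delimited words and UTF-8 text; the real `wc` depends on the locale. Only the word count touches a linguistic notion, and even that is crude, because "isn't" and "New York" are split purely by whitespace.

```python
def wc(text: str) -> tuple[int, int, int]:
    """Return (lines, words, bytes) for a string, roughly as `wc` would."""
    lines = text.count("\n")            # wc counts newline characters
    words = len(text.split())           # whitespace-separated tokens
    nbytes = len(text.encode("utf-8"))  # byte count depends on the encoding
    return lines, words, nbytes

print(wc("I made her duck\n"))  # (1, 4, 16)
```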

Is Google Search an NLP application?

Yes, it uses knowledge about human language

Why is NLP hard?

Text is structured only for the human reader; for the machine it is often almost entirely unstructured, or at best ‘semi-structured’ (e.g. HTML)

semi-structured text

Text that is partially structured for the machine, e.g. HTML, where markup tags wrap otherwise free text
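
To illustrate, here is a small sketch using Python's standard-library `html.parser`. The tags tell the machine which text is a heading and which is a paragraph, but the sentence inside the paragraph is still unstructured text. `TextExtractor` is a hypothetical helper, and it naively assumes well-formed markup (void tags such as `<br>` without a closing tag would confuse its stack).

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text chunks together with the tag that encloses them."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags (naive: assumes well-formed HTML)
        self.chunks = []  # (enclosing tag, text) pairs

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            enclosing = self.stack[-1] if self.stack else None
            self.chunks.append((enclosing, text))

parser = TextExtractor()
parser.feed("<html><h1>Week 1</h1><p>Police help dog bite victim</p></html>")
print(parser.chunks)  # [('h1', 'Week 1'), ('p', 'Police help dog bite victim')]
```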

Natural Language Processing (NLP)

The steps necessary for a machine to “understand” a piece of data expressed in a human language

NLP tasks (umbrella terms)

Text mining
Text analytics
Computational Linguistics
(human) language technology

NLP Tasks and Applications

Information Retrieval
- Searching for relevant documents

Document classification
- Sorting documents into categories

Question answering
- Short answer for a question

Text summarisation
- Summarise a set of documents

Sentiment analysis
- Product reviews, Twitter, Hate crime detection

Machine translation
- One of the first motivations for NLP

Natural language generation
- e.g. data-to-text generation

Authoring and marking tools
- Check spelling, grammar, style
- Automated marking of essays

Conversational Agents
- Dialogue, voice recognition, text-to-speech, speech-to-text

etc. (many, many others)

NLP main problems

Variability
Ambiguity

Variability

Numerous ways to say the same thing

Ambiguity

Words and sentences are often ambiguous, and can have multiple meanings

Word-level ambiguity

Apple (company) or Apple (fruit)

Sentence-level ambiguity

I made her duck (this has at least 5 meanings)

Lexical Ambiguity

A word with multiple possible POS tags, e.g. “duck” can be a verb or a noun
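
A toy sketch of lexical ambiguity: a hypothetical mini-lexicon maps each word to the set of coarse POS tags it can take, and any word with more than one tag is lexically ambiguous. In “I made her duck”, both “her” and “duck” are ambiguous, which is part of why the sentence has several readings. Real taggers resolve this from context, not from a lexicon alone.

```python
# Hypothetical toy lexicon: word -> set of coarse POS tags it can take
LEXICON = {
    "i": {"PRON"},
    "made": {"VERB"},
    "her": {"PRON", "DET"},   # object pronoun ("saw her") or possessive ("her duck")
    "duck": {"NOUN", "VERB"},
}

def ambiguous_words(sentence):
    """Return the words that the lexicon lists with more than one POS tag."""
    return [w for w in sentence.lower().split() if len(LEXICON.get(w, set())) > 1]

print(ambiguous_words("I made her duck"))  # ['her', 'duck']
```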

Lexical-semantic ambiguity

A word with different senses
e.g. “bank” can be a financial institution or part of the countryside (a river bank)

Syntactic Ambiguity

Ambiguity coming from the possible groupings of words

Parts of Speech (POS)

nouns, verbs, adjectives, adverbs, pronouns, prepositions, auxiliaries, determiners

Open class words

nouns, verbs, adjectives, adverbs

Closed class words

pronouns, prepositions, auxiliaries, determiners

Attachment Ambiguity

I saw the girl with the telescope
(Did he use the telescope to see the girl, or did he see a girl who had the telescope?)

Coordination ambiguity

Old men and women (are the women also old?)

Mother and baby in pram (is the mother in the pram?)

Local Ambiguity

Study These Flashcards

Police help dog bite victim (Did the police help the victim of a dog bite, or help a dog to bite a victim?)

Corpus

a (large) collection of linguistic data
- May consist of written texts, spoken discourse, samples of spoken or written language

unannotated corpus

raw text/speech

annotated (labelled) corpus

raw text/speech enhanced with linguistic information
- A repository of explicit linguistic information (added manually or automatically)
- e.g. specifying that "loves" in "Mary loves John" is the 3rd person singular present tense form of a verb
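
A sketch of what POS annotation can look like, assuming the common `word/TAG` convention (used, for example, by the Brown corpus) and Penn Treebank tags, where `VBZ` marks a 3rd person singular present tense verb and `NNP` a singular proper noun. Stripping the tags recovers the unannotated text. `parse_tagged` is a hypothetical helper.

```python
def parse_tagged(line):
    """Split a 'word/TAG word/TAG ...' line into (word, tag) pairs."""
    pairs = []
    for token in line.split():
        word, _, tag = token.rpartition("/")  # split on the last '/': words may contain '/'
        pairs.append((word, tag))
    return pairs

annotated = "Mary/NNP loves/VBZ John/NNP"
raw = " ".join(word for word, _ in parse_tagged(annotated))
print(parse_tagged(annotated))  # [('Mary', 'NNP'), ('loves', 'VBZ'), ('John', 'NNP')]
print(raw)                      # Mary loves John
```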

Corpus annotation types

Grammatical (e.g. POS tags, noun / verb phrases)
Semantic (e.g. person, drug)
Pragmatics - language in use (e.g. conversation)
Combined

Why do we need annotated corpora?

For training and evaluation
Training:
- Train linguists, language learners, etc.
- Researchers (e.g. in language development)
- NLP development: use ML/statistics to learn patterns from an annotated corpus
NLP evaluation:
- Compare NLP results (automated "annotations") with a manually coded "gold standard"

Do people agree on annotations?

Not always. Annotation can be very subjective and inconsistent, which makes a gold-standard corpus very difficult to obtain
- Some simple tasks are relatively consistent, e.g. what's the name of the lecture?
- Others are inconsistent, e.g. sentiment analysis

Annotation agreement

Kappa (κ) - measures the agreement between two annotators (classifiers) who each classify N items into C mutually exclusive categories
κ = (Pr(a) - Pr(e)) / (1 - Pr(e))
- Pr(a) = relative observed agreement among annotators
- Pr(e) = hypothetical probability of chance agreement, typically estimated from the observed data as the probability of each annotator randomly choosing each category
If the annotators agree completely, κ = 1; if there is no agreement other than what would be expected by chance (as defined by Pr(e)), κ = 0, and agreement worse than chance gives κ < 0
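
A minimal Python sketch of Cohen's kappa for two annotators, following the formula above, with Pr(e) estimated from each annotator's own category frequencies. The function name and the example labels are hypothetical.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same N items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Pr(a): relative observed agreement
    pr_a = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Pr(e): chance agreement, from each annotator's own category frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    pr_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if pr_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (pr_a - pr_e) / (1 - pr_e)

ann_1 = ["pos", "pos", "neg", "neg", "pos", "neg"]
ann_2 = ["pos", "neg", "neg", "neg", "pos", "pos"]
print(cohens_kappa(ann_1, ann_2))  # ≈ 0.333 (Pr(a) = 4/6, Pr(e) = 0.5)
```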

Precision

Fraction of retrieved documents that are relevant
Precision = relevant items retrieved / retrieved items
= P(relevant | retrieved)
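
A minimal sketch of precision over sets of document IDs (hypothetical data). Returning 0.0 when nothing is retrieved is an assumed convention, since the fraction is undefined in that case.

```python
def precision(retrieved, relevant):
    """Fraction of retrieved documents that are relevant: |ret ∩ rel| / |ret|."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not retrieved:
        return 0.0  # assumed convention: nothing retrieved -> precision 0
    return len(retrieved & relevant) / len(retrieved)

print(precision({"d1", "d2", "d3", "d4"}, {"d1", "d3", "d7"}))  # 0.5 (2 of 4 retrieved are relevant)
```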

Recall

Fraction of relevant documents that are retrieved
Recall = relevant items retrieved / relevant items
= P(retrieved | relevant)
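
The matching sketch for recall, on the same kind of hypothetical document-ID sets. It also hints at the trade-off with precision: retrieving every document guarantees a recall of 1, however many irrelevant documents come with it.

```python
def recall(retrieved, relevant):
    """Fraction of relevant documents that are retrieved: |ret ∩ rel| / |rel|."""
    retrieved, relevant = set(retrieved), set(relevant)
    if not relevant:
        return 0.0  # assumed convention: no relevant documents -> recall 0
    return len(retrieved & relevant) / len(relevant)

all_docs = {f"d{i}" for i in range(1, 11)}  # hypothetical collection d1..d10
relevant = {"d1", "d3", "d7"}
print(recall({"d1", "d2", "d3", "d4"}, relevant))  # ≈ 0.667 (2 of 3 relevant found)
print(recall(all_docs, relevant))                  # 1.0, although precision is only 3/10
```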

F-measure

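Harmonic mean of Precision and Recall, trading off the two
Balanced F-measure (F1): F = 2PR / (P + R)
Weighted form: F_β = (1 + β²)PR / (β²P + R), where β > 1 weights recall more and β < 1 weights precision more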
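
A sketch of the F-measure taking precision and recall as plain numbers; `beta` is the weighting parameter, and β = 1 gives the balanced F1. The second call shows why a harmonic mean is used: it stays low unless both P and R are reasonably high, whereas the arithmetic mean of 1.0 and 0.01 would be about 0.5.

```python
def f_measure(p, r, beta=1.0):
    """Weighted harmonic mean of precision p and recall r (F1 when beta == 1).

    Assumes beta > 0; beta > 1 weights recall more, beta < 1 weights precision more.
    """
    if p == 0 and r == 0:
        return 0.0  # avoid 0/0 when both are zero
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)

print(f_measure(0.5, 2 / 3))  # ≈ 0.571 (= 4/7)
print(f_measure(1.0, 0.01))   # ≈ 0.0198
```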

Cross-validation

Break up the data into n folds
- Ideally with an equal proportion of positive and negative examples in each fold (stratification)
For each fold:
- Choose the fold as a temporary test set
- Train on the other n-1 folds and compute performance on the test fold
Report the average performance of the n runs
A sensible value for n might be 10
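
A minimal sketch of stratified n-fold cross-validation in plain Python, assuming that "equal positive and negative in each fold" means each fold keeps roughly the same class proportions. `train_and_score` is a hypothetical callback standing in for any model, and `majority_baseline` is a deliberately trivial example model. Libraries such as scikit-learn offer this kind of split as `StratifiedKFold`.

```python
import random
from collections import Counter
from statistics import mean

def stratified_folds(labels, n=10, seed=0):
    """Split item indices into n folds, spreading each class evenly across folds."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(n)]
    k = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for i in indices:  # deal indices round-robin so each fold gets its share
            folds[k % n].append(i)
            k += 1
    return folds

def cross_validate(labels, train_and_score, n=10):
    """Average score over n runs, each holding out one fold as the test set."""
    scores = []
    for test_fold in stratified_folds(labels, n):
        held_out = set(test_fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        scores.append(train_and_score(train, test_fold))
    return mean(scores)

labels = ["pos"] * 30 + ["neg"] * 20  # hypothetical dataset

def majority_baseline(train, test):
    # Hypothetical model: always predict the most common label in the training folds
    majority = Counter(labels[i] for i in train).most_common(1)[0][0]
    return sum(labels[i] == majority for i in test) / len(test)

print(cross_validate(labels, majority_baseline, n=10))  # 0.6
```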

(33 cards)