POS-Parsing 5 Flashcards

1
Q

What are the 8 parts of speech taught in grammar school?

A

noun, verb, adjective, adverb, preposition, conjunction, pronoun, interjection

2
Q

What is the collection of POS tags used to annotate a corpus called?

A

tagset

3
Q

What is a tagset?

A

A tagset contains all the part-of-speech tags used for a specific corpus, together with what each tag means (e.g., VBD = verb, past tense)
• Tags are usually uppercase (DT, ADJ, VBD)
• Similar tags often share a prefix (e.g., V… = related to verbs)
• Tagsets are language-specific and corpus-specific
(e.g., social media corpora have a tag for emoticons)
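To illustrate the shared-prefix convention, here is a minimal Python sketch using a hand-picked subset of Penn Treebank tags. The subset and the helper `tags_with_prefix` are illustrative, not a library API:

```python
# A small, hand-picked subset of the Penn Treebank tagset: tag -> meaning.
PENN_SUBSET = {
    "DT": "determiner",
    "JJ": "adjective",
    "NN": "noun, singular or mass",
    "NNS": "noun, plural",
    "NNP": "proper noun, singular",
    "RB": "adverb",
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBG": "verb, gerund or present participle",
    "VBN": "verb, past participle",
    "VBZ": "verb, 3rd person singular present",
}

def tags_with_prefix(tagset: dict[str, str], prefix: str) -> list[str]:
    """Return all tags starting with `prefix`, e.g. 'V' for the verb family."""
    return sorted(tag for tag in tagset if tag.startswith(prefix))

print(tags_with_prefix(PENN_SUBSET, "V"))   # ['VB', 'VBD', 'VBG', 'VBN', 'VBZ']
print(tags_with_prefix(PENN_SUBSET, "NN"))  # ['NN', 'NNP', 'NNS']
print(PENN_SUBSET["VBD"])                   # verb, past tense
```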

4
Q

Name two tagsets.

A

Penn Treebank Tagset and Universal Tagset

5
Q

Name three difficulties of POS tagging.

A

A word can have multiple POS tags
Most ambiguous words are common words
Tagging can be difficult even for experienced human labellers
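The first difficulty can be made concrete by counting how many distinct tags each word receives in a tagged corpus. A minimal Python sketch, assuming a tiny hand-tagged corpus (the sentences and tags are illustrative):

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged corpus of (word, Penn Treebank tag) pairs.
tagged_corpus = [
    [("the", "DT"), ("back", "JJ"), ("door", "NN")],
    [("on", "IN"), ("my", "PRP$"), ("back", "NN")],
    [("win", "VB"), ("the", "DT"), ("voters", "NNS"), ("back", "RB")],
    [("promised", "VBD"), ("to", "TO"), ("back", "VB"), ("the", "DT"), ("bill", "NN")],
]

def tag_counts(corpus):
    """Map each lowercased word to a Counter of the tags it receives."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return counts

counts = tag_counts(tagged_corpus)
ambiguous = sorted(w for w, c in counts.items() if len(c) > 1)
print(ambiguous)               # ['back']
print(sorted(counts["back"]))  # ['JJ', 'NN', 'RB', 'VB']
```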

6
Q

What are homonyms?

A

Two distinct words that have the same spelling are called homonyms.

7
Q

Sentences that can be derived by a grammar are in the formal language defined by that grammar, and are called?

A

grammatical sentences

8
Q

Sentences that cannot be derived by a given formal grammar are not in the language defined by that grammar and are referred to as?

A

ungrammatical

9
Q

In linguistics, the use of formal languages to model natural languages
is called?

A

generative grammar

since the language is defined by the set of possible sentences “generated” by the grammar
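The idea that a grammar "generates" its language, and that grammatical sentences are exactly the ones it can derive, can be sketched in Python with a tiny, hypothetical, non-recursive CFG whose language is finite and can therefore be enumerated outright:

```python
import itertools

# Hypothetical toy CFG: nonterminal -> list of right-hand sides.
# Any symbol that is not a key of GRAMMAR is a terminal (a word).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"], ["a"]],
    "N": [["dog"], ["cat"]],
    "V": [["chased"]],
}

def generate(symbol):
    """Yield every word sequence derivable from `symbol` (the grammar must be non-recursive)."""
    if symbol not in GRAMMAR:  # a terminal derives only itself
        yield (symbol,)
        return
    for rhs in GRAMMAR[symbol]:
        # Combine every expansion of each right-hand-side symbol, left to right.
        for parts in itertools.product(*(list(generate(s)) for s in rhs)):
            yield tuple(word for part in parts for word in part)

LANGUAGE = {" ".join(words) for words in generate("S")}

def is_grammatical(sentence):
    """A sentence is grammatical iff the grammar can derive it."""
    return sentence in LANGUAGE

print(len(LANGUAGE))                           # 16
print(is_grammatical("the dog chased a cat"))  # True
print(is_grammatical("dog the chased a cat"))  # False
```

Realistic grammars are recursive (e.g., NP → NP PP) and generate infinitely many sentences, so grammaticality is decided by parsing rather than by enumeration.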

10
Q

What is syntactic parsing?

A

The task of recognizing a sentence and assigning a syntactic structure to it.

11
Q

Name three types of parsing.

A

Constituency Parsing
Dependency Parsing
Syntactic Parsing

12
Q

What is constituency parsing?

A

Constituency parsing aims to extract, from a sentence, a constituency-based parse tree
that represents its syntactic structure according to a phrase structure grammar.
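One standard way to build such a tree is the CKY algorithm, which requires the grammar to be in Chomsky Normal Form. The sketch below is a minimal Python version, assuming a hypothetical toy grammar and lexicon; it is not necessarily the parser used in the course, and it keeps only the first tree found for each span and nonterminal rather than all parses:

```python
# Hypothetical toy grammar in Chomsky Normal Form.
BINARY_RULES = [  # (parent, left child, right child)
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("VP", "V", "NP"),
]
LEXICON = {  # word -> possible preterminal categories
    "the": ["Det"], "a": ["Det"],
    "dog": ["N"], "cat": ["N"],
    "chased": ["V"],
}

def cky_parse(words, start="S"):
    """Return one bracketed constituency tree for `words`, or None if unparsable."""
    n = len(words)
    # chart[i][j] maps a nonterminal to a bracketed tree covering words[i:j].
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for cat in LEXICON.get(word, []):
            chart[i][i + 1][cat] = f"({cat} {word})"
    for length in range(2, n + 1):          # span length
        for i in range(n - length + 1):     # span start
            j = i + length
            for k in range(i + 1, j):       # split point
                for parent, left, right in BINARY_RULES:
                    if left in chart[i][k] and right in chart[k][j] and parent not in chart[i][j]:
                        chart[i][j][parent] = f"({parent} {chart[i][k][left]} {chart[k][j][right]})"
    return chart[0][n].get(start)

print(cky_parse("the dog chased a cat".split()))
# (S (NP (Det the) (N dog)) (VP (V chased) (NP (Det a) (N cat))))
print(cky_parse("dog the chased a cat".split()))
# None
```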

13
Q

What is dependency parsing?

A

Dependency grammars focus on how words relate to other words.
Dependency is a binary relation between a head (or: governor) and its dependents.
• The head of a sentence is usually the finite verb.
• Every other word in the sentence depends on it either directly or through a
path of dependencies.
Caveat: there are multiple theories of dependency grammar, and they may yield different analyses!
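A dependency analysis can be stored as one head index per word. The following minimal Python sketch checks the two properties listed above: a single root, and a path of dependencies from every word to it. The sentence, head indices and relation labels (Universal Dependencies-style names such as nsubj and obj) are hand-written assumptions, not parser output:

```python
# Hypothetical dependency analysis of "She reads a book".
words = ["She", "reads", "a", "book"]
heads = [1, -1, 3, 1]  # heads[i] = index of word i's head; -1 marks the root
labels = ["nsubj", "root", "det", "obj"]

def path_to_root(i, heads):
    """Follow head links from word i up to the root and return the indices visited."""
    path = [i]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
        if len(path) > len(heads):  # longer than the sentence means we are looping
            raise ValueError("cycle: not a valid dependency tree")
    return path

def is_valid_tree(heads):
    """Exactly one root, and every word reaches it through a path of dependencies."""
    roots = [i for i, h in enumerate(heads) if h == -1]
    if len(roots) != 1:
        return False
    try:
        return all(path_to_root(i, heads)[-1] == roots[0] for i in range(len(heads)))
    except ValueError:
        return False

for word, head, label in zip(words, heads, labels):
    print(f"{label}({'ROOT' if head == -1 else words[head]}, {word})")
print(is_valid_tree(heads))                        # True
print([words[j] for j in path_to_root(2, heads)])  # ['a', 'book', 'reads']
```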

14
Q

Why is syntactic parsing important?

Give 2 reasons.

A

• Grammar checking
• Understand the subject/main verb/object of a sentence; useful in
downstream tasks, e.g. question answering, information extraction
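As an illustration of the second reason, a hand-written (hypothetical) dependency parse can be queried for the subject, main verb and object. This minimal Python sketch assumes UD-style labels (nsubj, obj, compound) and that compound modifiers precede their head, as in English names:

```python
# Hypothetical parser output for "Marie Curie discovered polonium":
# (word, head index, relation); head -1 marks the root.
parse = [
    ("Marie", 1, "compound"),
    ("Curie", 2, "nsubj"),
    ("discovered", -1, "root"),
    ("polonium", 2, "obj"),
]

def with_compounds(i, parse):
    """Return word i preceded by its compound modifiers, e.g. 'Marie Curie'."""
    modifiers = [w for w, h, rel in parse if h == i and rel == "compound"]
    return " ".join(modifiers + [parse[i][0]])

def subject_verb_object(parse):
    """Read (subject, main verb, object) off the root verb's direct dependents."""
    root = next(i for i, (_, head, _) in enumerate(parse) if head == -1)

    def dependent(rel):
        return next((i for i, (_, h, r) in enumerate(parse) if h == root and r == rel), None)

    subj, obj = dependent("nsubj"), dependent("obj")
    return (
        with_compounds(subj, parse) if subj is not None else None,
        parse[root][0],
        with_compounds(obj, parse) if obj is not None else None,
    )

print(subject_verb_object(parse))  # ('Marie Curie', 'discovered', 'polonium')
```

A question such as "Who discovered polonium?" could then be answered with the extracted subject rather than the whole sentence.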

15
Q

What is chunking?

A

Chunking is the process of extracting phrases from unstructured text.
• E.g., instead of extracting only single tokens, which may not capture the
actual meaning of the text, it is advisable to treat a phrase such as
“South Africa” as a single unit rather than as the separate words ‘South’ and ‘Africa’.
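A minimal Python sketch of the “South Africa” example, assuming the input is already POS-tagged (the tags here are hand-assigned) and that a chunk is simply a run of consecutive proper-noun (NNP) tokens; real chunkers use richer tag patterns or trained models:

```python
def chunk_proper_nouns(tagged):
    """Merge each run of consecutive NNP tokens into one chunk; keep other tokens as-is."""
    chunks, current = [], []
    for word, tag in tagged:
        if tag == "NNP":
            current.append(word)
            continue
        if current:  # a proper-noun run just ended
            chunks.append((" ".join(current), "NNP"))
            current = []
        chunks.append((word, tag))
    if current:  # flush a run that ends the sentence
        chunks.append((" ".join(current), "NNP"))
    return chunks

sentence = [("She", "PRP"), ("visited", "VBD"), ("South", "NNP"),
            ("Africa", "NNP"), ("last", "JJ"), ("year", "NN")]
print(chunk_proper_nouns(sentence))
# [('She', 'PRP'), ('visited', 'VBD'), ('South Africa', 'NNP'), ('last', 'JJ'), ('year', 'NN')]
```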

16
Q

Give some reasons for chunking and areas where it is used.

A

• For entity detection
  • Proper names (e.g., Monty Python)
  • Definite noun phrases (e.g., the knights who say “ni”)
  • Sometimes also indefinite noun phrases or noun chunks
    (e.g., every student or cats)
• Helps multiple NLP tasks
  • Information retrieval (search engines)
  • Text classification
  • Sentence simplification/paraphrase
  • Summarisation

17
Q

What are named entities?

A

Named entities are definite noun phrases that refer to specific types of individuals, such as organizations, persons, dates, etc.

18
Q

What is the goal of a named entity recognition (NER) system, and what is NER useful for?

A

The goal is to identify all textual mentions of named entities
• Two steps
• Identify the boundaries of the NE (e.g., by NP-chunking)
• Identify the type of the NE (e.g., by Naïve Bayes classification)
• NER is useful for
• Information extraction
• Answering specific questions, e.g. “Who is the president of the US?”
• Instead of retrieving a whole sentence, just present the NE “Joe Biden”
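The two steps can be sketched in Python under strong simplifying assumptions: boundaries are runs of consecutive NNP tokens, and the type classifier is a multinomial Naive Bayes model over lowercased entity tokens with add-one smoothing, trained on a tiny hypothetical list of labelled names (the training data and function names are illustrative only):

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: (entity string, type).
TRAIN = [
    ("Joe Biden", "PERSON"), ("Barack Obama", "PERSON"), ("Ada Lovelace", "PERSON"),
    ("Stanford University", "ORG"), ("Harvard University", "ORG"), ("Oxford University", "ORG"),
    ("South Africa", "LOC"), ("New Zealand", "LOC"), ("North America", "LOC"),
]

def train_nb(examples):
    """Multinomial Naive Bayes over lowercased tokens: class counts and per-class word counts."""
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    for text, label in examples:
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def classify(text, model):
    """Step 2 (type): pick the class with the highest log-probability."""
    class_counts, word_counts, vocab = model
    n_docs = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(n / n_docs)
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are skipped
                score += math.log((word_counts[label][w] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

def proper_noun_chunks(tagged):
    """Step 1 (boundaries): runs of consecutive NNP tokens."""
    chunks, current = [], []
    for word, tag in tagged + [("", "END")]:  # sentinel flushes the final run
        if tag == "NNP":
            current.append(word)
        elif current:
            chunks.append(" ".join(current))
            current = []
    return chunks

model = train_nb(TRAIN)
sentence = [("Joe", "NNP"), ("Smith", "NNP"), ("studied", "VBD"), ("at", "IN"),
            ("Cambridge", "NNP"), ("University", "NNP")]
entities = [(chunk, classify(chunk, model)) for chunk in proper_noun_chunks(sentence)]
print(entities)  # [('Joe Smith', 'PERSON'), ('Cambridge University', 'ORG')]
```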

19
Q

Give 4 problems associated with named entity recognition.

A

• Simple word lookup incorrectly identifies some words as NEs, e.g., in location discovery
• Lists of person or organization names have poor coverage,
e.g., it is hard to keep up with new people or organizations
• Named entity terms are ambiguous,
e.g., May and North are a DATE and a LOCATION, but can also be a PERSON
• Further challenge: multi-word terms,
e.g., Stanford University, …

20
Q

Why are POS tags helpful?

A

• Text to speech (how do we pronounce “abstract”, “lead”, “read”)
• Find phrases (Article Adj* N → noun phrases)
• Input for downstream NLP tasks (e.g. parsing, chunking, named entity
recognition)
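The “Article Adj* N” pattern can be matched directly on a tagged sentence. A minimal Python sketch, assuming Penn Treebank tags (DT for the article, JJ for adjectives, NN/NNS for the noun) that are hand-assigned here rather than produced by a tagger:

```python
def find_noun_phrases(tagged):
    """Return spans matching Article Adj* Noun, i.e. DT JJ* (NN|NNS)."""
    phrases = []
    for i, (_, tag) in enumerate(tagged):
        if tag != "DT":
            continue
        j = i + 1
        while j < len(tagged) and tagged[j][1] == "JJ":  # any number of adjectives
            j += 1
        if j < len(tagged) and tagged[j][1] in ("NN", "NNS"):
            phrases.append(" ".join(word for word, _ in tagged[i:j + 1]))
    return phrases

sentence = [("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
            ("jumped", "VBD"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"), ("dog", "NN")]
print(find_noun_phrases(sentence))  # ['The quick brown fox', 'the lazy dog']
```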

21
Q

Name three ways to design a POS tagger.

A

Idea 1: Use the most frequent tag
• Collect a large dataset with sentences and their POS tags
• For each word, find its most likely POS tag
• For a new sentence, label each word with its most probable POS tag

Idea 2: Train a classifier with features such as
• The most probable POS tag of the word
• Prefixes: irreplaceable, unfortunate, inactive → strong clues for JJ
• Suffixes: fortunately, largely → -ly is a strong clue for RB (with exceptions, e.g., elderly)
• Capitalization: Meridian, USA, RHUL → a strong clue for NNP
• Other features, e.g., 35-year: the shape digit-NN is a clue for JJ

Idea 3: Utilize contextual information
• Use POS tags of surrounding words as additional features
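Ideas 1 and 2 can be combined into a small baseline tagger. This is a minimal Python sketch, not a trained classifier: the training corpus is hypothetical, and the prefix, suffix and capitalization rules are hand-written stand-ins for the features a classifier would learn. Context (Idea 3) is not used:

```python
from collections import Counter, defaultdict

# Hypothetical hand-tagged training corpus (Penn Treebank tags).
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("barked", "VBD")],
    [("the", "DT"), ("old", "JJ"), ("dog", "NN"), ("slept", "VBD")],
    [("they", "PRP"), ("dog", "VBP"), ("me", "PRP")],  # 'dog' used as a verb
]

def train_most_frequent(corpus):
    """Idea 1: map each known word to the tag it carries most often in training."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def guess_unknown(word):
    """Hand-written stand-ins for Idea 2's features; crude and easily wrong."""
    if word[0].isupper():
        return "NNP"  # capitalization: Meridian, USA
    if word.endswith("ly"):
        return "RB"   # suffix -ly (misfires on 'elderly')
    if word.startswith(("un", "in", "ir")):
        return "JJ"   # prefixes: unfortunate, inactive, irreplaceable
    return "NN"       # default for anything else

def tag_sentence(words, lexicon):
    """Known words get their most frequent tag; unknown words fall back to the rules."""
    return [(w, lexicon.get(w.lower()) or guess_unknown(w)) for w in words]

lexicon = train_most_frequent(TRAIN)
print(tag_sentence(["The", "unhappy", "dog", "barked", "loudly"], lexicon))
# [('The', 'DT'), ('unhappy', 'JJ'), ('dog', 'NN'), ('barked', 'VBD'), ('loudly', 'RB')]
print(tag_sentence(["Meridian", "slept"], lexicon))
# [('Meridian', 'NNP'), ('slept', 'VBD')]
```

Because every occurrence of “dog” gets NN, the baseline cannot tag “they dog me” correctly; that is the gap the contextual features of Idea 3 are meant to close.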

22
Q

What is syntactic constituency?

A

Syntactic constituency is the idea that groups of words can behave as
single units, or constituents.

23
Q

What is a context-free grammar?

A

A context-free grammar (CFG) consists of a set of rules or
productions, each of which expresses the ways that symbols of the
language can be grouped and ordered together, and a lexicon of
words and symbols.

24
Q

How would you identify named entities?

A
  • Simple solution: look up each word in an appropriate list of names
  • Doing this blindly has problems, e.g. with location discovery

E.g., “Reading” is a place (a town in England), but “reading” can also be a verb, as in reading a book.
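The problem with blind lookup can be reproduced with a tiny hypothetical gazetteer. The second function shows one possible mitigation, which is an assumption and not from the source: only accept matches that a POS tagger has labelled as proper nouns (NNP). The tags below are hand-assigned:

```python
# Hypothetical gazetteer of place names (lowercased).
LOCATIONS = {"reading", "london", "bath"}

def lookup_locations(tokens):
    """Naive lookup: flag any token whose lowercased form is in the gazetteer."""
    return [t for t in tokens if t.lower() in LOCATIONS]

def lookup_locations_nnp(tagged):
    """Only accept gazetteer hits that are also tagged as proper nouns (NNP)."""
    return [w for w, tag in tagged if tag == "NNP" and w.lower() in LOCATIONS]

tokens = ["I", "enjoy", "reading", "in", "Bath"]
print(lookup_locations(tokens))  # ['reading', 'Bath']  ('reading' is a false positive)

tagged = [("I", "PRP"), ("enjoy", "VBP"), ("reading", "VBG"), ("in", "IN"), ("Bath", "NNP")]
print(lookup_locations_nnp(tagged))  # ['Bath']
```

This filter is not a full fix: it depends on the tagger being right, and at the start of a sentence (“Reading is fun”) capitalization no longer separates the town from the verb.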

25
Q

Which Python library can be used for named entity recognition and chunking?

A

spaCy