Syntactic Analysis: Parsing Flashcards
Shallow parsing, aka light parsing, aka chunking
- Parsing depth runs from shallowest to deepest
- Shallow parsing is the middle level
- The shallowest level is POS tagging only: each word in the sentence gets a tag
- The deepest level is a full grammar tree, which is complicated
- A full parse shows how chunks relate to other chunks
- A full parse tree recognizes every phrase first (phrases can be made of other phrases), then the POS tags
- A shallow parse breaks out (chunks) the main noun phrases, verb phrases, and prepositional phrases
- A full parse can be run in Python
- POS tagging is a prerequisite for shallow parsing
Why chunk?
- Full parse takes longer
- Full parse might not be needed
- Full parse might not be accurate on User Generated Content
- POS tagging doesn’t group words together, so sometimes we need the information about noun phrases, prepositional phrases, etc.
- Going from POS tagging to chunking takes only a little more time, so why not do it? You’re almost there anyway.
- Most often this means NP chunking
User Generated Content (UGC)
- Written by non-professionals: Twitter, Facebook, reviews, Reddit, etc.
How to chunk?
- Create rules with regular expressions for obvious POS tag patterns
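A minimal sketch of a rule-based chunker, assuming the sentence has already been POS-tagged with Penn Treebank tags. The sample sentence, the `NP_RULE` pattern, and the `np_chunks` helper are illustrative assumptions, not part of any library:

```python
import re

# Hypothetical pre-tagged sentence; in practice a POS tagger supplies the tags.
tagged = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD"),
          ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# Assumed NP rule: optional determiner, any adjectives, one or more nouns.
NP_RULE = re.compile(r"(?:<DT>)?(?:<JJ[RS]?>)*(?:<NN[A-Z]*>)+")

def np_chunks(tagged_tokens):
    """Return noun-phrase chunks as lists of words, matched on the tag sequence."""
    tag_string = "".join(f"<{tag}>" for _, tag in tagged_tokens)
    chunks = []
    for match in NP_RULE.finditer(tag_string):
        # Map character offsets back to token indices by counting '<'.
        start = tag_string.count("<", 0, match.start())
        end = start + match.group().count("<")
        chunks.append([word for word, _ in tagged_tokens[start:end]])
    return chunks

print(np_chunks(tagged))  # [['the', 'little', 'dog'], ['the', 'cat']]
```

The rule is matched against the tags, not the words, which is why POS tagging has to come first.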
Chinking
- Parts we don’t want included are called chinks
- An NP chunker pulls out only NPs; everything else is a chink
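Chinking can be sketched as the reverse of chunking: start with the whole sentence as one chunk, then cut out the parts we don’t want. The `chink` helper and the choice of chink tags below are assumptions for illustration:

```python
def chink(tagged_tokens, chink_tags):
    """Treat the sentence as one chunk, then split it wherever a chink tag occurs."""
    chunks, current = [], []
    for word, tag in tagged_tokens:
        if tag in chink_tags:
            # A chink ends the chunk in progress and is itself discarded.
            if current:
                chunks.append(current)
                current = []
        else:
            current.append(word)
    if current:
        chunks.append(current)
    return chunks

# Hypothetical pre-tagged sentence (Penn Treebank tags).
sentence = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD"),
            ("at", "IN"), ("the", "DT"), ("cat", "NN")]

# Here verbs and prepositions are assumed to be the chinks.
print(chink(sentence, {"VBD", "IN"}))  # [['the', 'little', 'dog'], ['the', 'cat']]
```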
Classifier-Based Chunkers
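- TiMBL (Tilburg Memory-Based Learner), a memory-based (k-nearest-neighbor) learning classifier; its IGTree option is a decision-tree approximation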
IOB Standard Annotation
Every word is a token
I: Inside a chunk
O: Outside a chunk (a chink)
B: Begins a new chunk
When annotating, B and I are followed by the type of phrase, e.g., B-NP, I-NP, B-PP, etc.
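A short sketch of how chunk spans translate into IOB tags. The `to_iob` helper, the span format `(start, end, label)`, and the example spans are assumptions made for illustration:

```python
def to_iob(tagged_tokens, chunk_spans):
    """Label each token B-X, I-X, or O given (start, end, label) spans (end exclusive)."""
    labels = ["O"] * len(tagged_tokens)
    for start, end, label in chunk_spans:
        labels[start] = f"B-{label}"          # first token begins the chunk
        for i in range(start + 1, end):
            labels[i] = f"I-{label}"          # remaining tokens are inside it
    return [(word, tag, iob) for (word, tag), iob in zip(tagged_tokens, labels)]

# Hypothetical pre-tagged sentence and hand-marked chunk spans.
tokens = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("barked", "VBD"),
          ("at", "IN"), ("the", "DT"), ("cat", "NN")]
spans = [(0, 3, "NP"), (4, 5, "PP"), (5, 7, "NP")]

for row in to_iob(tokens, spans):
    print(row)
```

In this example "barked" falls outside every span, so it is tagged O.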
Uses for chunking
- Named Entity Recognition
- Pull out NP chunks
Full Grammar Parsing
Two types:
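1) Constituency parse: a parse tree, drawn like a dendrogram. The leaves are words, the penultimate level is the POS tags, and the interior nodes (the middle layers) are phrases.
2) Dependency parse: a DAG (Directed Acyclic Graph) linking words to the words they depend on.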
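To make the constituency layers concrete, here is a small sketch that stores a parse tree as nested tuples. The tree itself, the tuple layout `(label, child, ...)`, and the helper names are assumptions for illustration, not the output of a real parser:

```python
# Hypothetical constituency tree for "the dog chased the cat".
# Interior nodes are phrases; a (POS, word) pair is the level just above a leaf.
tree = ("S",
        ("NP", ("DT", "the"), ("NN", "dog")),
        ("VP", ("VBD", "chased"),
               ("NP", ("DT", "the"), ("NN", "cat"))))

def is_preterminal(node):
    """True for a (POS, word) pair, i.e. the penultimate level of the tree."""
    return len(node) == 2 and isinstance(node[1], str)

def pos_tags(node):
    """Read the (word, POS) pairs off the bottom of the tree, left to right."""
    if is_preterminal(node):
        return [(node[1], node[0])]
    pairs = []
    for child in node[1:]:
        pairs.extend(pos_tags(child))
    return pairs

def phrases(node, label):
    """Collect the words under every interior node carrying the given label."""
    if is_preterminal(node):
        return []
    found = []
    if node[0] == label:
        found.append([word for word, _ in pos_tags(node)])
    for child in node[1:]:
        found.extend(phrases(child, label))
    return found

print(pos_tags(tree))
print(phrases(tree, "NP"))  # [['the', 'dog'], ['the', 'cat']]
```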
Creating parse trees
Constituency parsers (90% of applications use these)
- CYK algorithm
- Stanford parser
- Link grammar parser
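As a rough illustration of the CYK algorithm, the sketch below is a recognizer (it answers yes or no, without building the tree) over a toy grammar in Chomsky normal form. The grammar, the lexicon, and the function name are assumptions chosen to keep the example small:

```python
from itertools import product

# Toy grammar in Chomsky normal form (assumed for illustration).
BINARY = {("NP", "VP"): {"S"}, ("DT", "NN"): {"NP"}, ("VBD", "NP"): {"VP"}}
LEXICAL = {"the": {"DT"}, "dog": {"NN"}, "cat": {"NN"}, "chased": {"VBD"}}

def cyk_recognize(words, start="S"):
    """Return True if the toy grammar can derive the word sequence."""
    n = len(words)
    if n == 0:
        return False
    # table[i][j] holds the symbols that derive words[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, word in enumerate(words):
        table[i][i] = set(LEXICAL.get(word, ()))
    for span in range(2, n + 1):               # span length
        for i in range(n - span + 1):          # span start
            j = i + span - 1                   # span end
            for k in range(i, j):              # split point
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY.get((left, right), set())
    return start in table[0][n - 1]

print(cyk_recognize("the dog chased the cat".split()))  # True
print(cyk_recognize("dog the chased".split()))          # False
```

The three nested loops over span length, start, and split point are what give CYK its cubic running time in the sentence length.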
Dependency parser
- MSTParser
- MaltParser
Uses for parse trees
How to choose dependency parser vs. constituency parser?
- The choice often depends on the language. Constituency parsers suit languages with strict word-order rules, like English and German.
- Dependency parsers are easier to engineer for languages with freer word order, like Czech.
Type of application
- Thematic extraction or text mining (titles, summaries): constituency parser.
- Question answering: dependency parser. It tells us the object of a verb.
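A small sketch of why a dependency parse helps with question answering: once each word points to its head, the object of a verb is a simple lookup. The parse below is hand-written for illustration (relation names follow the Universal Dependencies style), and `objects_of` is a hypothetical helper:

```python
# Hypothetical dependency parse of "the dog chased the cat".
# Each row: (index, word, head_index, relation); head index 0 is the root.
parse = [
    (1, "the", 2, "det"),
    (2, "dog", 3, "nsubj"),
    (3, "chased", 0, "root"),
    (4, "the", 5, "det"),
    (5, "cat", 3, "obj"),
]

def objects_of(verb, parse, relation="obj"):
    """Return the words attached to the given verb by the given relation."""
    words = {index: word for index, word, _, _ in parse}
    return [word for _, word, head, rel in parse
            if rel == relation and words.get(head) == verb]

# "What did the dog chase?"
print(objects_of("chased", parse))                    # ['cat']
# "Who chased the cat?"
print(objects_of("chased", parse, relation="nsubj"))  # ['dog']
```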
Combine lexical and syntactic analysis
Get themes
- WordNet: get noun word senses, synonyms, hyponyms
- WordNet: get verb senses, synonyms, troponyms
- Use a parser to pull out the noun phrases and verb phrases that contain the words pulled from WordNet
Improving Sentiment Analysis
- Vocabulary based
Valence
A number, usually -1 to 1, that indicates how emotional a word is. 0 is neutral. -1 is most negative and 1 is most positive.
Add up the valences of the words in a text to find its emotional charge.
How does a parser help with this?
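The adding-up step for valence can be sketched as below. The lexicon values are made up for illustration, and `emotional_charge` is a hypothetical helper; a real system would load a published valence lexicon:

```python
# Hypothetical valence lexicon on a -1 to 1 scale; values are invented.
VALENCE = {"great": 0.8, "love": 0.9, "slow": -0.4, "terrible": -0.9}

def emotional_charge(text):
    """Sum the valence of each word; words not in the lexicon count as neutral (0)."""
    return sum(VALENCE.get(word, 0.0) for word in text.lower().split())

print(round(emotional_charge("great food but terrible service"), 2))  # -0.1
print(round(emotional_charge("love the great view"), 2))              # 1.7
```

This version looks at words in isolation, which is the limitation the question above is pointing at.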