Levels of Language Analysis (Week 1 – 6) Flashcards
Levels (In Order)
Phonetic
Morphological
Lexical
Syntactic
Semantic
Discourse
Pragmatic
Phonetic Analysis
Phonetic analysis refers to the ability to recognize sound/symbol relationships in order to identify a word. This involves knowledge of the phonological patterns of the language.
The focus is on the analysis and processing of the smallest units of sound in a language, which are known as phonemes.
Morphological Analysis 1/3
Analysis at this step: Stemming and lemmatization
- Deals with the componential nature of lexical entities:
Morpheme: A morpheme is a minimal subunit of meaning in a word
We can usefully divide morphemes into two classes:
Stems: The core meaning-bearing units (e.g., happy)
Affixes: Bits and pieces that adhere to stems to change their meanings and grammatical functions: prefixes, infixes, suffixes, circumfixes (e.g., unhappy)
Morphological Analysis 2/3
pre    - registra  - tion
prefix   stem/root   suffix
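The prefix / stem / suffix split above can be sketched as a naive affix stripper. This is only an illustration: the `PREFIXES` and `SUFFIXES` inventories and the `segment` function are hypothetical names invented here, and real morphological analyzers rely on a lexicon rather than blind string matching.

```python
# Toy affix inventories (hypothetical and far from complete).
PREFIXES = ("pre", "un", "re")
SUFFIXES = ("tion", "ness", "ing", "s")


def segment(word):
    """Split a word into (prefix, stem, suffix); a missing affix is ''."""
    # Take the first listed prefix that the word starts with, if any.
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    # Take the first listed suffix that leaves a non-empty stem, if any.
    suffix = next(
        (s for s in SUFFIXES if rest.endswith(s) and len(rest) > len(s)), ""
    )
    stem = rest[:len(rest) - len(suffix)]
    return prefix, stem, suffix


print(segment("preregistration"))  # ('pre', 'registra', 'tion')
print(segment("unhappy"))          # ('un', 'happy', '')
```

Because the matching is blind, a word such as `registration` would be wrongly split as re + gistra + tion, which is one reason practical systems check candidate stems against a dictionary.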
Inflection: the combination of a stem and an affix where the resulting word has the same word type (e.g., noun, verb) as the original. It serves a grammatical purpose that differs from the original but is nevertheless transparently related to it.
Examples: apple – noun; apples – still a noun
Derivation: creates a new word by changing the category and/or meaning of the base to which it applies. It can change the grammatical category (part of speech).
Examples: happy – adjective; happiness – noun
Morphological Analysis 3/3
What features do inflections reveal in English?
Verbs - tense & number
Nouns - singular/plural
Adjectives - comparison features (comparative, superlative)
Stemming
* Strip prefixes and/or suffixes to find the base root, which may or may not be an actual word
Lemmatization
* Map a word to its dictionary form (lemma), which will always be an actual word; this can require more than stripping affixes (e.g., was → be)
Word      Lemmatization  Stemming
was       be             wa
studies   study          studi
studying  study          study
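The contrast in the table above can be reproduced with a minimal sketch. Both pieces are assumptions made for illustration: `STEM_RULES` is a toy ordered rule list (not the Porter algorithm or any library's stemmer), and `LEMMAS` is a tiny hand-written dictionary standing in for the full lexicon a real lemmatizer would consult.

```python
# Ordered suffix rules for a toy stemmer (hypothetical, not the Porter algorithm).
STEM_RULES = (("ies", "i"), ("ing", ""), ("s", ""))

# Tiny hand-written lemma dictionary (hypothetical stand-in for a real lexicon).
LEMMAS = {"was": "be", "studies": "study", "studying": "study"}


def stem(word):
    """Apply the first matching suffix rule; the result may not be a real word."""
    word = word.lower()
    for suffix, replacement in STEM_RULES:
        if word.endswith(suffix):
            return word[:len(word) - len(suffix)] + replacement
    return word


def lemmatize(word):
    """Look the word up in the dictionary; unknown words are returned unchanged."""
    word = word.lower()
    return LEMMAS.get(word, word)


for w in ("was", "studies", "studying"):
    print(w, lemmatize(w), stem(w))
```

The stemmer never consults a vocabulary, so it can produce non-words such as `wa` and `studi`, while the dictionary lookup only ever returns entries that are actual words.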
Lexical Analysis 1/3
Analysis at this step: Word Type and Part-of-speech Tagging
Adding lexical class information to words:
Part-of-speech (POS) tagging labels words with specific noun, verb, adjective, and adverb types
Word Sense: the Meaning of a Word
Lexical Analysis 2/3
POS Tags
N - noun
V - verb
ADJ - adjective
ADV - adverb
P - preposition
PRO - pronoun
DET - determiner (word that introduces a noun)
Lexical Analysis 3/3
NN - noun
NNS - noun, plural
NNP - proper noun, singular
NNPS - proper noun, plural
PDT - predeterminer
POS - possessive ending
PRP - personal pronoun
PRP$ - possessive pronoun
RB - adverb
RBR - adverb, comparative
RBS - adverb, superlative
RP - particle
JJ - adjective
VB - verb
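To make the tag set above concrete, here is a minimal sketch of a lookup-plus-heuristics tagger. Everything in it is an assumption for illustration: the `LEXICON` entries, the fallback rules, and the `tag` function are invented here, and `MD` (modal) is a Penn Treebank tag that does not appear in the list above. Real taggers are statistical and use context, which this sketch ignores.

```python
# Small hand-written lexicon (hypothetical); MD is the Penn Treebank modal tag.
LEXICON = {"can": "MD", "read": "VB", "old": "JJ", "she": "PRP", "her": "PRP$"}


def tag(tokens):
    """Tag each token by lexicon lookup, then by crude surface heuristics."""
    tagged = []
    for tok in tokens:
        low = tok.lower()
        if low in LEXICON:
            label = LEXICON[low]
        elif tok[0].isupper():
            label = "NNP"   # capitalized and unknown: guess proper noun
        elif low.endswith("ly"):
            label = "RB"    # -ly: guess adverb
        elif low.endswith("s"):
            label = "NNS"   # -s: guess plural noun
        else:
            label = "NN"    # default: singular noun
        tagged.append((tok, label))
    return tagged


print(tag("Alice can read old books quickly".split()))
```

The heuristics fail easily (for example, any sentence-initial unknown word is guessed to be a proper noun), which illustrates why POS tagging is normally treated as a disambiguation problem over context.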
Syntactic Analysis (tree structure)
Analysis at this step: Parsing
Parsing is the process of finding a derivation (i.e., a sequence of productions) leading from the START symbol to the TERMINAL symbols (top-down), or from the TERMINALS back to the START symbol (bottom-up)
- Analyzing the words in a sentence so as to uncover the grammatical structure of the sentence
- Requires both a grammar and a parser
Tree node labels: S (sentence), NP (noun phrase), VP (verb phrase), PP (prepositional phrase)
Syntactic Analysis (tree structure)
Modern parsing algorithms solve three problems:
1. Performance: chart parsers use a special data structure (the chart) to store partial results and so get rid of backtracking
2. Predefining a CFG or other grammar: Treebanks and statistical parsing replace hand-written grammars. The main use of the Treebank is to provide the probabilities that inform the statistical parser
3. Choosing the best parse tree (partially solved): lexicalization uses information about words from the Treebank
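The chart idea in point 1 can be sketched with the CKY algorithm, one well-known chart parser (the notes above do not name a specific algorithm, so this choice is an assumption). The toy grammar in Chomsky normal form, the `LEXICAL` entries, and the function name `cky_recognize` are all hypothetical. The sketch only recognizes whether a sentence is grammatical; it does not build the tree or use Treebank probabilities.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (hypothetical): (left, right) -> parents.
BINARY = {
    ("NP", "VP"): {"S"},
    ("DET", "N"): {"NP"},
    ("V", "NP"): {"VP"},
    ("P", "NP"): {"PP"},
    ("NP", "PP"): {"NP"},
    ("VP", "PP"): {"VP"},
}
# Toy lexicon (hypothetical): word -> possible preterminals.
LEXICAL = {
    "the": {"DET"}, "dog": {"N"}, "cat": {"N"}, "park": {"N"},
    "saw": {"V"}, "in": {"P"},
}


def cky_recognize(tokens, start="S"):
    """Return True if the grammar derives the token sequence from `start`."""
    n = len(tokens)
    # chart[(i, j)] holds every nonterminal that spans tokens[i:j].
    chart = defaultdict(set)
    for i, tok in enumerate(tokens):
        chart[(i, i + 1)] |= LEXICAL.get(tok, set())
    # Fill wider spans from narrower ones; each cell is computed once.
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):
                for left in chart[(i, k)]:
                    for right in chart[(k, j)]:
                        chart[(i, j)] |= BINARY.get((left, right), set())
    return start in chart[(0, n)]


print(cky_recognize("the dog saw the cat in the park".split()))
print(cky_recognize("dog the saw".split()))
```

Each span is analyzed once and reused by every larger span that contains it, which is what removes the need to backtrack.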
Semantics
Analysis at this step: Sentiment Polarity Classification
Step 1 – Cleaning and Tokenization
Step 2 – Extracting Features
- Determining possible meanings of a sentence
- Semantic Relation Extraction
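Steps 1 and 2 of sentiment polarity classification can be sketched as follows. This is a simplified stand-in: the `POSITIVE` and `NEGATIVE` word lists are hypothetical, and the final decision counts lexicon hits instead of using a trained classifier, which is what a practical system would normally do with the extracted features.

```python
import re
from collections import Counter

# Tiny polarity word lists (hypothetical, for illustration only).
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "awful", "hate", "boring"}


def tokenize(text):
    """Step 1: lowercase the text and keep only alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def extract_features(tokens):
    """Step 2: bag-of-words features, i.e. a count per distinct token."""
    return Counter(tokens)


def polarity(text):
    """Classify by comparing counts of positive and negative words."""
    features = extract_features(tokenize(text))
    pos = sum(count for word, count in features.items() if word in POSITIVE)
    neg = sum(count for word, count in features.items() if word in NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


print(polarity("I LOVE this movie!!! Great cast."))
print(polarity("Boring plot, awful acting."))
```

Word counting ignores negation and word order ("not good" still scores as positive), which is the kind of limitation that motivates learned classifiers.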
Semantic Role Labeling (SRL)
- In a sentence, a verb and its semantic roles form a proposition; the verb can be called the predicate and the roles are known as arguments.
- Given a target verb, the Semantic Role Labeling task is to identify and label each semantic role present in the sentence.
Discourse Level 1/2
Analysis at this step: Topic Modeling
Examination of patterns in a given text (or corpus) at the semantic level by extracting topics from it.
Topic: A list of words that occur in statistically meaningful ways (for the computer)
Text: Unstructured text, i.e., text with no computer-readable annotations that indicate the semantic meaning of its words
Widely used algorithms are based on LDA (Latent Dirichlet Allocation)
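As a rough illustration of how LDA can assign words to topics, here is a minimal collapsed Gibbs sampler, one common way to fit LDA. It is a sketch under stated assumptions: the function names, the toy corpus, and the hyperparameter values (`alpha`, `beta`, `iterations`) are all chosen here for illustration, and production libraries use far more efficient inference.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; returns the two count tables."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]            # tokens per (doc, topic)
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics                          # tokens per topic
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        topics = []
        for w in doc:
            k = rng.randrange(num_topics)
            topics.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(topics)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Resample: P(topic) is proportional to
                # (doc-topic count + alpha) * (topic-word count + beta) / topic size.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """List the n most frequent words assigned to each topic."""
    return [sorted(counts, key=counts.get, reverse=True)[:n] for counts in topic_word]


# Toy corpus (hypothetical), already tokenized.
docs = [
    ["cat", "dog", "pet", "dog"],
    ["dog", "pet", "cat"],
    ["stock", "market", "trade"],
    ["market", "stock", "price", "trade"],
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
print(top_words(topic_word))
```

Each topic comes out as a ranked word list, matching the definition of a topic given above; the sampler is randomized, so the exact lists depend on the seed and the corpus.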
Discourse Level 2/2
- Determining meaning in texts longer than a sentence
- Making connections between component sentences
- Anaphora resolution
- Multi-sentence texts are not just concatenated sentences to be interpreted singly
- Documents may have distinct patterns in different sections: introduction, conclusions, methodology, etc.
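Anaphora resolution, listed above, can be illustrated with a deliberately naive heuristic: link each pronoun to the nearest preceding mention that agrees with it. The `PRONOUNS` and `ENTITIES` feature tables and the `resolve` function are hypothetical; a real resolver would take such features from a parser and lexicon and would weigh syntax and salience as well.

```python
# Hypothetical agreement features for a handful of words.
PRONOUNS = {"he": "masc", "him": "masc", "she": "fem", "her": "fem", "it": "neut"}
ENTITIES = {"john": "masc", "mary": "fem", "book": "neut", "car": "neut"}


def resolve(tokens):
    """Map each pronoun's position to the nearest earlier agreeing entity."""
    links = {}
    seen = []  # (entity, feature) for every entity mention so far, in order
    for i, tok in enumerate(tokens):
        low = tok.lower().strip(".,")
        if low in ENTITIES:
            seen.append((low, ENTITIES[low]))
        elif low in PRONOUNS:
            # Search backwards so the most recent agreeing mention wins.
            for entity, feature in reversed(seen):
                if feature == PRONOUNS[low]:
                    links[i] = entity
                    break
    return links


text = "Mary bought a book. She gave it to John."
print(resolve(text.split()))  # {4: 'mary', 6: 'book'}
```

The links cross the sentence boundary, which is the point of discourse-level analysis: "She" and "it" in the second sentence can only be interpreted using the first.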
Pragmatics
- The purposeful use of language in situations
- A functional perspective
- Those aspects of language which require context for understanding
- Goal: explain how extra meaning is read into texts without actually being encoded in them
- Requires much world knowledge
- Understanding of intentions / plans / goals