Morphology and finite-state techniques Flashcards
Morpheme
Minimal information-carrying units of a language.
Affixes
Morphemes which can only occur in conjunction with other morphemes.
Suffixes
Affixes which go after the stem.
Prefixes
Affixes which go before the stem. In English, prefixes are only derivational.
Infixes
Affixes which occur inside the stem.
Circumfixes
Affixes which go around the stem.
Productive
A morphological process (e.g. adding an affix) is productive if it can be applied to new words, such as adding -s to form plurals in English.
Inflectional Morphology
Concerns properties such as tense, aspect, number, person, gender, and case.
Derivational Morphology
Using an affix to form a word on the basis of an existing word.
Spelling Rules
English morphology is largely concatenative, with some exceptions. There are, however, regular phonological and spelling changes associated with affixation, e.g. the e-insertion rule for plurals: an e is inserted before the s if the word ends with s, x or z.
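The e-insertion rule can be sketched in a few lines. This is a minimal illustration covering only the s, x, z case named on the card; the function name `pluralise` is hypothetical, and real English spelling has further cases that are not handled here.

```python
def pluralise(stem: str) -> str:
    """Attach the plural suffix, inserting 'e' when the stem ends in s, x or z."""
    if stem.endswith(("s", "x", "z")):
        return stem + "es"   # e-insertion: box + s -> boxes
    return stem + "s"        # plain concatenation: cat + s -> cats


print(pluralise("box"))
print(pluralise("cat"))
```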
Full-form lexicon
Lists all the inflected forms and treats derivational morphology as non-productive.
Stemming
Stripping endings to retrieve the stem.
Porter Stemmer
Uses a series of simple rules to strip endings.
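As a rough illustration of rule-based ending stripping, the sketch below applies a short ordered list of suffix rewrites modelled on the plural-handling rules from the first step of the Porter stemmer. It is not the full algorithm, which has several further steps and conditions on the remaining stem; the names `RULES` and `strip_plural` are hypothetical.

```python
# Ordered (suffix, replacement) rules; the first matching suffix wins.
RULES = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]


def strip_plural(word: str) -> str:
    """Rewrite the first matching suffix; return the word unchanged if none match."""
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", strip_plural(w))
```

Note that the output need not be a real word ("ponies" becomes "poni"), which is acceptable for stemming because only a consistent stem is needed.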
Lemmatisation
Analysing the form into a stem and affixes so that the necessary syntactic (and potentially semantic) information can be associated with it.
Bidirectional
Can be used for analysis and generation.
Lexical information needed for high precision morphological processing
Affixes - plus associated information conveyed by the affix.
Irregular forms - associated information similar to that for affixes.
Stems with syntactic categories.
Why is a lexicon of regular stems necessary for high precision morphological analysis?
We need to avoid incorrect analyses, e.g. analysing corpus as the plural of corpu. This is also important in cases where a word was historically derived but its base form is no longer found in the language.
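A minimal sketch of how a stem lexicon blocks such analyses: candidate stems are proposed by stripping a plural ending, and only those listed as nouns are kept. The lexicon contents and the names `STEMS` and `analyse_plural` are assumptions made for illustration.

```python
# Hypothetical stem lexicon: stem -> syntactic category.
STEMS = {"cat": "noun", "box": "noun", "corpus": "noun"}


def analyse_plural(word: str) -> list:
    """Return (stem, 'PLURAL') analyses whose stem is a noun in the lexicon."""
    candidates = []
    if word.endswith("es"):
        candidates.append(word[:-2])   # undo e-insertion: boxes -> box
    if word.endswith("s"):
        candidates.append(word[:-1])   # plain suffix: cats -> cat
    return [(stem, "PLURAL") for stem in candidates if STEMS.get(stem) == "noun"]


print(analyse_plural("cats"))
print(analyse_plural("boxes"))
print(analyse_plural("corpus"))   # no analysis: "corpu" is not a listed stem
```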
Local Ambiguity
Ambiguity that will be resolved by the subsequent context.
Overgeneration
Generates all valid outputs but also some invalid ones. This is often tolerable, especially if only analysis is needed.
Finite State Transducers
Map between two representations, so each transition corresponds to a pair of characters. An FST accepts an input only if processing ends in an accept state.
FSTs for Morphological Parsing
In analysis mode a morphology system detects affix boundaries; in generation mode it constructs the correct surface string.
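The sketch below shows a toy transducer used in both directions. It assumes a simplified setting: transitions are (input, output, next state) triples, the empty string stands for epsilon, "^" marks an affix boundary, and the machine covers only a plain plural -s with no spelling rules and no stem lexicon. All names (`build_plural_fst`, `transduce`, `invert`) are hypothetical.

```python
import string

EPS = ""  # epsilon: consumes or emits nothing


def build_plural_fst():
    """Surface -> lexical: copy stem letters, then map final 's' to '^s'."""
    transitions = {
        0: [(ch, ch, 0) for ch in string.ascii_lowercase] + [(EPS, "^", 1)],
        1: [("s", "s", 2)],
        2: [],
    }
    return transitions, 0, {2}   # (transitions, start state, accept states)


def invert(fst):
    """Swap input and output on every transition to run the FST the other way."""
    transitions, start, accepting = fst
    swapped = {q: [(o, i, n) for i, o, n in arcs] for q, arcs in transitions.items()}
    return swapped, start, accepting


def transduce(fst, text):
    """Return all outputs for text; empty if no path ends in an accept state."""
    transitions, start, accepting = fst
    results, stack = set(), [(start, 0, "")]
    while stack:
        state, pos, out = stack.pop()
        if pos == len(text) and state in accepting:
            results.add(out)
        for in_sym, out_sym, nxt in transitions[state]:
            if in_sym == EPS:
                stack.append((nxt, pos, out + out_sym))
            elif pos < len(text) and text[pos] == in_sym:
                stack.append((nxt, pos + 1, out + out_sym))
    return results


fst = build_plural_fst()
print(transduce(fst, "cats"))            # analysis: boundary detected
print(transduce(invert(fst), "cat^s"))   # generation: surface string built
print(transduce(fst, "cat"))             # rejected: no path ends in an accept state
```

Because this toy machine has no stem lexicon, it would also analyse "corpus" as "corpu^s". The search assumes the transducer has no epsilon cycles.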
Probabilistic FSAs
The FSA can be augmented with information about transition probabilities.
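A minimal sketch of a deterministic probabilistic FSA: each transition carries a probability, each state has a stopping probability, and the probability of a string is the product along its path. The states, symbols and numbers below are invented purely for illustration.

```python
# Hypothetical automaton: state -> {symbol: (next_state, probability)}.
PFSA = {
    "q0": {"a": ("q1", 0.6), "b": ("q0", 0.4)},
    "q1": {"a": ("q1", 0.3), "b": ("q0", 0.2)},
}
# Probability of stopping in each state; with the outgoing arcs it sums to 1.
STOP = {"q0": 0.0, "q1": 0.5}


def string_probability(text: str, start: str = "q0") -> float:
    """Multiply transition probabilities along the path, then the stop probability."""
    state, prob = start, 1.0
    for symbol in text:
        if symbol not in PFSA[state]:
            return 0.0               # no transition: string is not accepted
        state, p = PFSA[state][symbol]
        prob *= p
    return prob * STOP[state]


print(string_probability("a"))
print(string_probability("ba"))
print(string_probability("b"))
```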