Text pre-processing - Week 2 Flashcards
1
Q
Text pre-processing
A
Document level preparation
- Document conversion
- Language/domain identification
Tokenisation
- Case folding
Basic lexical pre-processing
- Lemmatisation
- Stemming
- Spelling correction
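Tokenisation splits raw text into the units (tokens) that the later steps work on. The sketch below is a minimal illustration, assuming a simple regular expression is good enough. `tokenise` and `TOKEN_PATTERN` are hypothetical names, and real tokenisers also handle contractions, hyphens, URLs and so on.

```python
import re

# Hypothetical pattern: runs of word characters, or any single punctuation mark
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")

def tokenise(text: str) -> list[str]:
    """Split text into word and punctuation tokens (simplified sketch)."""
    return TOKEN_PATTERN.findall(text)

print(tokenise("Parsnips, roasted and automated."))
# ['Parsnips', ',', 'roasted', 'and', 'automated', '.']
```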
2
Q
Case folding
A
Convert all text to the same case, usually lowercase, so that differently capitalised forms of a word match:
Parsnips -> parsnips
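In Python, case folding of tokens could look like the following minimal sketch. It uses the built-in `str.casefold()`, which is a stronger form of `lower()` meant for caseless matching. For example, German "ß" folds to "ss". The helper name `case_fold` is hypothetical.

```python
def case_fold(tokens: list[str]) -> list[str]:
    # casefold() lowercases and also normalises special cases such as "ß" -> "ss"
    return [t.casefold() for t in tokens]

print(case_fold(["Parsnips", "PARSNIPS", "parsnips"]))
# ['parsnips', 'parsnips', 'parsnips']
```

A side effect is that case distinctions that carry meaning (e.g. a proper noun versus a common noun) are lost.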
3
Q
Lemmatisation of parsnips
A
parsnip
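Lemmatisation maps a word form to its dictionary form (lemma), so the result is always a real word. The sketch below assumes a tiny hypothetical lexicon (`LEMMAS`), an exception list for irregular plurals (`IRREGULAR`), and a few plural-suffix rules. A candidate is accepted only if it appears in the lexicon. A real lemmatiser would use a full dictionary such as WordNet and part-of-speech information.

```python
# Hypothetical toy lexicon and exception list
LEMMAS = {"parsnip", "mouse", "berry", "box"}
IRREGULAR = {"mice": "mouse"}
# (suffix, replacement) rules for regular plural nouns, longest suffix first
PLURAL_RULES = [("ies", "y"), ("es", ""), ("s", "")]

def lemmatise_noun(word: str) -> str:
    word = word.casefold()
    if word in IRREGULAR:          # irregular forms are looked up directly
        return IRREGULAR[word]
    if word in LEMMAS:             # already a lemma
        return word
    for suffix, repl in PLURAL_RULES:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + repl
            if candidate in LEMMAS:  # only accept dictionary words
                return candidate
    return word                    # unknown words are left unchanged

print(lemmatise_noun("parsnips"))  # parsnip
```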
4
Q
Stemming of automated
A
automat
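Stemming, by contrast, chops suffixes off by rule without consulting a dictionary, so the stem need not be a real word. The following is a deliberately crude suffix-stripping sketch with a hypothetical suffix list (`SUFFIXES`) and minimum stem length. It is not the Porter stemmer. Established stemmers apply many more rules and can produce a different stem for the same word.

```python
# Hypothetical suffix list, ordered longest first so the longest match wins
SUFFIXES = ("ion", "ing", "ed", "ic", "es", "s", "e")

def stem(word: str, min_stem: int = 3) -> str:
    word = word.casefold()
    for suffix in SUFFIXES:
        # strip the suffix only if enough of the word remains afterwards
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word

for w in ["automated", "automates", "automate", "automatic", "automation"]:
    print(w, "->", stem(w))  # each line ends in "-> automat"
```

Mapping several related forms to one stem like this is what lets a search for "automation" also match "automated".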