Sentence Segmentation Flashcards
Describe the task of sentence segmentation
Determine the boundaries between sentences, i.e. split running text into individual sentences.
Describe the ambiguity problem in sentence segmentation
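There are many different end-of-sentence (EOS) characters and punctuation styles, and the same characters are also used elsewhere (e.g. the period in abbreviations such as Dr.), so a candidate EOS mark may not actually end a sentence.
e.g. the period typically goes outside the closing quotation mark in UK style (“xyz”.) but inside it in US style (“xyz.”)
In medical writing, a sentence may not begin with a capital letter, e.g. when it starts with a lowercase protein or gene name such as p53.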
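A quick Python illustration of the ambiguity, using a hypothetical naive rule (split wherever ., ! or ? is directly followed by whitespace) rather than any real tool. It wrongly splits after an abbreviation and misses a boundary when a closing quote follows the period:

```python
import re


def naive_split(text):
    # Hypothetical baseline: split wherever ., ! or ? is directly followed by whitespace.
    # No abbreviation handling and no awareness of quotation marks.
    return re.split(r"(?<=[.!?])\s+", text)


print(naive_split("Dr. Smith arrived. He sat down."))
# ['Dr.', 'Smith arrived.', 'He sat down.']  (wrong split after the abbreviation)
print(naive_split("He said “stop.” Then he left."))
# ['He said “stop.” Then he left.']  (missed boundary: the quote follows the period)
```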
Describe the approaches to sentence segmentation (5)
- regular expressions
- dictionaries of abbreviations
- hand-crafted rules
- statistical and machine learning (ML) methods
- hybrid approaches (combinations of the above)
Give 2 example rules for sentence segmentation
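- Check the first character after a potential EOS: a capital letter suggests a new sentence has started
- Check the token before a period against a dictionary of abbreviations: if it is a known abbreviation (e.g. Dr.), the period is probably not an EOS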
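A minimal Python sketch combining the two rules, assuming a tiny hypothetical abbreviation set and treating ., ! or ? (optionally followed by closing quotes or a bracket) plus whitespace as a potential EOS. It is an illustration of the idea, not how any particular tool works:

```python
import re

# Hypothetical, deliberately tiny abbreviation dictionary (lowercased).
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc."}

# Potential EOS: ., ! or ?, optionally followed by closing quotes/brackets,
# then whitespace. Group 1 is the part that stays with the sentence.
CANDIDATE_EOS = re.compile(r'([.!?]["”’)]*)\s+')


def rule_split(text, abbreviations=ABBREVIATIONS):
    text = text.strip()
    if not text:
        return []
    sentences = []
    start = 0
    for m in CANDIDATE_EOS.finditer(text):
        end = m.end(1)        # just after the punctuation (and any closing quotes)
        next_index = m.end()  # first non-space character after the candidate
        # Rule 2: a period ending a known abbreviation is not an EOS.
        prev_token = text[start:end].split()[-1].lower()
        if prev_token in abbreviations:
            continue
        # Rule 1: a new sentence is expected to start with a capital letter.
        if not text[next_index].isupper():
            continue
        sentences.append(text[start:end])
        start = next_index
    sentences.append(text[start:])
    return sentences


print(rule_split("Dr. Smith arrived. He sat down."))
# ['Dr. Smith arrived.', 'He sat down.']
print(rule_split("He said “stop.” Then he left."))
# ['He said “stop.”', 'Then he left.']
print(rule_split("Cells were treated. p53 levels rose."))
# ['Cells were treated. p53 levels rose.']
```

The last call shows the weakness of the capital-letter rule for medical text: a sentence starting with a lowercase name like p53 is not detected.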
What tools are available for sentence segmentation
Apache OpenNLP sentence detector (ML-based)
spaCy (ML-based via its trained pipelines, or rule-based via the sentencizer component)
What is domain dependence
The idea that language use differs by domain (e.g. medical writing vs. everyday text), so rules, abbreviation dictionaries, or models built for one domain may not work well in another.
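A small Python sketch of domain dependence, assuming a hypothetical boundary rule (the period does not end a known abbreviation and is followed by a capital letter) and two hypothetical abbreviation sets. Pt. (patient) serves as an example of an abbreviation common in clinical notes but missing from a general-purpose list:

```python
# Hypothetical general-purpose abbreviation list (lowercased).
GENERAL_ABBREVIATIONS = {"dr.", "mr.", "mrs.", "etc."}
# Hypothetical clinical list: the general one plus "pt." (patient).
CLINICAL_ABBREVIATIONS = GENERAL_ABBREVIATIONS | {"pt."}


def is_boundary(prev_token, next_char, abbreviations):
    """Return True if a period ending prev_token should end the sentence."""
    return prev_token.lower() not in abbreviations and next_char.isupper()


# Deciding the period in "Pt. Jones was admitted." with each list:
print(is_boundary("Pt.", "J", GENERAL_ABBREVIATIONS))   # True (wrong split)
print(is_boundary("Pt.", "J", CLINICAL_ABBREVIATIONS))  # False
```

The same rule gives a different, and in one case wrong, decision depending on which domain's dictionary it uses.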