Natural Language Processing Flashcards
tokenization
splitting a character sequence into individual tokens (usually words); usually the first step of NLP analysis
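ex. a minimal regex sketch, not how any particular NLP library tokenizes; the function name tokenize is an assumption made for illustration

```python
import re

def tokenize(text):
    # Each run of word characters (letters, digits, underscore) is one token;
    # every other non-space character (punctuation) is its own token.
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Rainy weather, again!"))
# ['Rainy', 'weather', ',', 'again', '!']
```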
part-of-speech (POS) tagging
assigning a word type (noun/verb/etc.) to a token
dependency parsing
describing the grammatical relationships between tokens (subject/object/etc.), i.e. the grammatical structure of a sentence
ex. ‘rainy weather’: weather is the head, rainy is the child (dependent)
lemmatization
extracting the base form (lemma) of a word or token
ex. base of was is be
ex. base of cats is cat
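ex. a toy lookup sketch of the two cases above; the LEMMAS table and lemmatize function are hypothetical, and real lemmatizers rely on much larger dictionaries and rules

```python
# Hypothetical lookup table holding only the two flashcard examples.
LEMMAS = {"was": "be", "cats": "cat"}

def lemmatize(token):
    # Lowercase the token, then return its base form if the table has one;
    # otherwise fall back to the lowercased token itself.
    word = token.lower()
    return LEMMAS.get(word, word)

print(lemmatize("was"))   # be
print(lemmatize("cats"))  # cat
```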
sentence boundary detection (SBD)
finding and segmenting individual sentences
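ex. a naive sketch that assumes sentences end in '.', '!' or '?' followed by whitespace; split_sentences is an illustrative name, and abbreviations such as "Dr." would be split wrongly

```python
import re

def split_sentences(text):
    # Split on whitespace that directly follows sentence-ending punctuation.
    # The lookbehind keeps the punctuation attached to its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    # Drop the empty string produced when the input is empty.
    return [p for p in parts if p]

print(split_sentences("It rained. Did it stop? Yes!"))
# ['It rained.', 'Did it stop?', 'Yes!']
```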
named-entity recognition (NER)
also known as named entity identification, entity chunking, or entity extraction
locating and classifying named entities (e.g. Richmond) into categories (e.g. city)
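ex. a toy dictionary (gazetteer) lookup sketch; GAZETTEER and tag_entities are hypothetical names, and practical NER systems usually rely on trained statistical models rather than a fixed list

```python
# Hypothetical gazetteer: lowercased entity name -> category.
GAZETTEER = {"richmond": "city"}

def tag_entities(tokens):
    # Return (token, category) for every token found in the gazetteer.
    return [(tok, GAZETTEER[tok.lower()])
            for tok in tokens if tok.lower() in GAZETTEER]

print(tag_entities(["She", "moved", "to", "Richmond"]))
# [('Richmond', 'city')]
```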
information extraction
automatically extracting structured information from unstructured or semi-structured data
similarity
comparing documents to see how similar they are to each other
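ex. one common way to score similarity is cosine similarity over word counts; this is a sketch under that assumption (whitespace tokenization, hypothetical function name), not the only possible measure

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    # Bag-of-words vectors: word -> count, using lowercase whitespace tokens.
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    # Dot product over the words the two documents share.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    # An empty document has zero length; treat its similarity as 0.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

print(cosine_similarity("rainy weather", "sunny beach"))  # 0.0
```

scores range from 0 (no shared words) to about 1 (same word distribution)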
text classification
assigning categories or labels to whole documents
rule-based matching
finding words and phrases in a document using rules over the tokens and their relationships
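ex. a sketch where a rule is a list of per-token tests and a match is any run of tokens passing all of them in order; find_matches and the sample pattern are assumptions for illustration, not a specific library's matcher

```python
def find_matches(tokens, pattern):
    # pattern: list of functions, one per token position, each returning
    # True when the token at that position satisfies the rule.
    n = len(pattern)
    if n == 0:
        return []
    matches = []
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(rule(tok) for rule, tok in zip(pattern, window)):
            matches.append((start, start + n))  # end index is exclusive
    return matches

# Rule: the word "rainy" followed by any alphabetic token.
pattern = [lambda t: t.lower() == "rainy", lambda t: t.isalpha()]
tokens = ["Cold", "rainy", "weather", "and", "rainy", "days"]
print(find_matches(tokens, pattern))
# [(1, 3), (4, 6)]
```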