Module Natural Language Processing Basics Flashcards
Parsing
NLP processes and analyzes text with methods such as large language models (LLMs), statistical models, reasoning engines, machine learning, deep learning, and rule-based systems. A core technique is parsing: breaking text or speech into smaller parts and classifying them for NLP. Parsing includes syntactic parsing, which analyzes elements of natural language to identify the underlying grammatical structure, and semantic parsing, which derives meaning.
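Syntactic parsing
Elements of natural language are analyzed to identify the underlying grammatical structure.
Semantic parsing
Derives meaning from the text.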
Segmentation - Syntactic parsing
Larger texts are divided into smaller, meaningful chunks. Segmentation usually splits text at sentence-ending punctuation marks, which helps organize the text for further analysis.
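A minimal sketch of sentence segmentation, assuming that a sentence ends at `.`, `!`, or `?` followed by whitespace. The function name `segment_sentences` is hypothetical, and the rule is deliberately naive: it would wrongly split after abbreviations such as "Dr." and is meant only to illustrate the idea.

```python
import re

def segment_sentences(text):
    """Split text into sentences at ., !, or ? followed by whitespace."""
    # The lookbehind keeps the punctuation mark attached to its sentence.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]

sentences = segment_sentences("NLP is fun. Is parsing hard? Not always!")
for sentence in sentences:
    print(sentence)
```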
Tokenization - Syntactic parsing
Sentences are split into individual words, called tokens. In English, tokenization is fairly straightforward because words are usually separated by spaces. In languages like Thai or Chinese, which do not put spaces between words, tokenization is much more complicated and relies heavily on an understanding of vocabulary and morphology.
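A minimal sketch of English tokenization, assuming that tokens are runs of word characters (with contractions kept together) and that each punctuation mark is its own token. The function name `tokenize` is hypothetical, and this approach would not work for languages written without spaces between words.

```python
import re

def tokenize(sentence):
    """Return word and punctuation tokens from an English sentence."""
    # \w+(?:['’]\w+)* keeps contractions such as "I'm" in one token;
    # [^\w\s] matches any single punctuation character.
    return re.findall(r"\w+(?:['’]\w+)*|[^\w\s]", sentence)

tokens = tokenize("He always leaves the key in the lock.")
print(tokens)
```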
Stemming - Syntactic parsing
Words are reduced to their root form, or stem. For example, breaking, breaks, and unbreakable are all reduced to break. Stemming reduces the number of word-form variations, but because it ignores context, it may not produce the most accurate stem. Look at these two examples that use stemming:
“I’m going outside to rake leaves.”
Stem = leave
“He always leaves the key in the lock.”
Stem = leave
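A toy sketch of suffix-stripping stemming, assuming a hypothetical two-rule suffix list (`SUFFIXES`) and function name (`stem`). Real stemmers, such as the Porter stemmer, apply many more rules, so their output can differ. The point here is only that the rule looks at the word alone, so both uses of "leaves" get the same stem.

```python
# Hypothetical rule list for illustration; not a real stemmer's rule set.
SUFFIXES = ("ing", "s")

def stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

for word in ("breaking", "breaks", "leaves"):
    print(word, "->", stem(word))
```

Because `stem` never sees the surrounding sentence, it cannot tell the noun "leaves" (plural of leaf) from the verb "leaves".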
Lemmatization - Syntactic parsing
Similar to stemming, lemmatization reduces words to their root, but it takes the part of speech into account to arrive at a more accurate root word, or lemma. Here are the same two examples using lemmatization:
“I’m going outside to rake leaves.”
Lemma = leaf
“He always leaves the key in the lock.”
Lemma = leave
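A toy sketch of part-of-speech-aware lemmatization, assuming a hypothetical lookup table (`LEMMAS`) keyed by word and part of speech. Real lemmatizers rely on full dictionaries and morphological analysis; this only shows why the part of speech changes the result.

```python
# Hypothetical lookup table for illustration: (word, part of speech) -> lemma.
LEMMAS = {
    ("leaves", "NOUN"): "leaf",
    ("leaves", "VERB"): "leave",
}

def lemmatize(word, pos):
    """Return the lemma for a word given its part of speech."""
    key = (word.lower(), pos)
    # Fall back to the lowercased word when the table has no entry.
    return LEMMAS.get(key, word.lower())

print(lemmatize("leaves", "NOUN"))  # "rake leaves"
print(lemmatize("leaves", "VERB"))  # "always leaves the key"
```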
Part of speech tagging - Syntactic parsing
Assigns a grammatical label, or tag, to each word based on its part of speech, such as noun, adjective, or verb. Part of speech tagging is an important function in NLP because it helps computers understand the syntax of a sentence.
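A toy sketch of part of speech tagging, assuming a hypothetical hand-written lexicon (`LEXICON`) and a single context rule for ambiguous words. Production taggers are typically statistical or neural models trained on annotated text; the tag names and the rule below are illustrative assumptions only.

```python
# Hypothetical lexicon: each word maps to its possible tags, most likely first.
LEXICON = {
    "he": ["PRON"], "always": ["ADV"], "the": ["DET"], "key": ["NOUN"],
    "in": ["ADP"], "lock": ["NOUN"], "rake": ["VERB"],
    "leaves": ["NOUN", "VERB"],
}

def tag(tokens):
    """Tag each token, using the previous tag to resolve noun/verb ambiguity."""
    tagged = []
    for token in tokens:
        options = LEXICON.get(token.lower(), ["X"])  # "X" marks unknown words
        previous = tagged[-1][1] if tagged else None
        # Illustrative rule: an ambiguous word after a pronoun or adverb is a verb.
        if len(options) > 1 and "VERB" in options and previous in ("PRON", "ADV"):
            choice = "VERB"
        else:
            choice = options[0]
        tagged.append((token, choice))
    return tagged

print(tag(["He", "always", "leaves", "the", "key"]))
print(tag(["rake", "leaves"]))
```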
Named entity recognition (NER) - Syntactic parsing
Uses algorithms to identify and classify named entities in text, such as people, dates, places, and organizations, to help with tasks like question answering and information extraction.
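A toy sketch of named entity recognition, assuming a hypothetical name list (`GAZETTEER`) for people and places and a regular expression for four-digit years. Real NER systems usually use trained models rather than fixed lists; the names, labels, and example sentence here are illustrative only.

```python
import re

# Hypothetical gazetteer for illustration: known names and their labels.
GAZETTEER = {"Ada Lovelace": "PERSON", "London": "PLACE"}
# Matches four-digit years from 1000 to 2099 as a stand-in for dates.
YEAR = re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b")

def find_entities(text):
    """Return (entity text, label) pairs in order of appearance."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(re.escape(name), text):
            found.append((match.start(), match.group(), label))
    for match in YEAR.finditer(text):
        found.append((match.start(), match.group(), "DATE"))
    # Sort by starting position so results follow the order in the text.
    return [(entity, label) for _, entity, label in sorted(found)]

print(find_entities("Ada Lovelace was born in London in 1815."))
```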