NLP Flashcards
What is NLP?
NLP (Natural Language Processing) is a subfield of AI that enables computers to analyze and comprehend human language, so that they can carry out repetitive language tasks without human intervention. e.g. text classification, automatic summarization, auto text completion, sentiment analysis, virtual assistants, grammar and spell check
Differentiate between Script Bot and Smart Bot.
SCRIPT BOT
- easy to make
- works around a script programmed into it
- limited functionality
- mostly free and easy to integrate into messaging platforms
- no or little language processing skills
- e.g. chatbots deployed for customer care
SMART BOT
- flexible and powerful
- works on bigger databases and other resources directly
- wide functionality
- learns with more data
- coding is required to build and deploy it
- e.g. Alexa, Cortana, Siri
What is sentiment analysis?
Process in NLP where AI systems analyze text to determine emotional tone or attitude expressed. Uses: understanding public opinion, mental health insights.
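A minimal sketch of the idea, assuming a tiny hand-made word list: the `POSITIVE` and `NEGATIVE` sets and the `sentiment` function are hypothetical names for illustration, and real systems typically rely on much larger lexicons or trained models.

```python
import re

# Hypothetical toy lexicon, for illustration only.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "sad"}

def sentiment(text):
    """Label text as positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    # Each positive word adds 1 to the score, each negative word subtracts 1.
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For instance, sentiment("I love this phone, the camera is great") returns "positive". A word-counting approach like this misses sarcasm and negation ("not good"), which is one reason emotion that is not explicitly expressed is hard to detect.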
What are the complexities in NLP?
- Arrangement of words and meaning
- Different syntax, same semantics
- Multiple meanings of a word
- Perfect syntax, no meaning
Differentiate between Human language and Computer language.
HUMAN
- Communication is complex
- The human brain keeps processing the sounds it hears around it
- The brain understands the signal if it is clear, otherwise it asks for clarification
- The brain prioritizes the sounds that are of interest to the listener
COMPUTER
- Communication is basic and simple
- Understands the language of numbers
- Typing mistakes lead to errors, and the input is not processed
What is text normalization?
Process of cleaning up textual data so that it is simpler than the original text. After normalization, the vocabulary is smaller and more consistent.
Steps involved:
1. sentence segmentation
2. tokenization
3. removing stop words, special characters and numbers
4. converting text to common case
5. stemming
6. lemmatization
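Steps 1 to 4 above can be sketched with the standard library alone. This is a simplified illustration, not a production pipeline: the `STOP_WORDS` set and the function names are hypothetical, the sentence splitter is naive, and stemming and lemmatization (steps 5 and 6) are left out.

```python
import re

# Hypothetical stop word list; the right list depends on the corpus.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "to", "of", "in"}

def segment_sentences(text):
    """Step 1: split text after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def normalize(text):
    """Steps 1-4: return one list of cleaned, lowercased tokens per sentence."""
    result = []
    for sentence in segment_sentences(text):
        # Step 2: tokenization into words, numbers and special characters.
        tokens = re.findall(r"\w+|[^\w\s]", sentence)
        # Step 3 (part): drop special characters and numbers.
        tokens = [t for t in tokens if t.isalpha()]
        # Step 4: common case, done before stop word removal so "The" matches "the".
        tokens = [t.lower() for t in tokens]
        # Step 3 (part): drop stop words.
        tokens = [t for t in tokens if t not in STOP_WORDS]
        result.append(tokens)
    return result
```

Calling normalize("The cat sat in 2 boxes! Dogs are loud.") returns [['cat', 'sat', 'boxes'], ['dogs', 'loud']].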
Why is converting text to common case done?
Computers are case-sensitive, so converting all text to one case ensures that the same word written in different cases (e.g. Hello, HELLO, hello) is not treated as different words.
What is corpus?
A collection of written text
What is tokenization?
A token is any word, special character or number. In tokenization, each sentence is split so that every word, special character and number becomes a separate token.
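As a small illustration, a regular expression can split a sentence into such tokens. The `tokenize` name and the pattern are assumptions made for this sketch; real tokenizers handle cases such as contractions and decimals more carefully.

```python
import re

def tokenize(sentence):
    """Split a sentence into word, number and special-character tokens."""
    # \w+ matches a run of letters or digits; [^\w\s] matches one special character.
    return re.findall(r"\w+|[^\w\s]", sentence)
```

For example, tokenize("Raj scored 95 marks!") returns ['Raj', 'scored', '95', 'marks', '!'].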
What are stopwords?
Words which occur very frequently in the corpus but do not add any value to it, e.g. a, an, the, is. Which words count as stop words depends on the corpus.
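Since the list depends on the corpus, one rough way to find candidates is to count word frequencies. The sketch below assumes whitespace-separated text, and the function names are hypothetical. High frequency only suggests a stop word, so a person would still need to check that the word adds no value.

```python
from collections import Counter

def frequent_words(documents, top_n=3):
    """Return the top_n most frequent lowercase words across all documents."""
    counts = Counter(word for doc in documents for word in doc.lower().split())
    return [word for word, _ in counts.most_common(top_n)]

def remove_stop_words(tokens, stop_words):
    """Keep only the tokens that are not in the stop word set."""
    return [t for t in tokens if t not in stop_words]
```

With the documents "the cat and the dog", "the bird and the fish" and "the end", frequent_words(documents, 2) returns ['the', 'and'].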
What is stemming?
Process in which the affixes of words are removed to reduce them to a base form. It does not check whether the stemmed word is meaningful; it only strips the affixes, which makes it faster. e.g. studies becomes studi
What is lemmatization?
Process in which affixes are removed while making sure that the resulting word (the lemma) is meaningful; hence it takes longer to execute than stemming. e.g. studies becomes study
What is the difference between stemming and lemmatization?
Stemming and lemmatization are alternatives to each other, since both have the same role: removal of affixes. The difference is that in lemmatization the word left after affix removal (known as the lemma) is always a meaningful one, whereas a stem may not be.
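The contrast can be seen in a toy sketch. The suffix list and the lemma dictionary below are hypothetical and deliberately tiny; real stemmers and lemmatizers use far more elaborate rules and vocabularies.

```python
# Hypothetical suffix list and lemma dictionary, for illustration only.
SUFFIXES = ("ing", "es", "ed", "s")
LEMMAS = {"studies": "study", "studying": "study", "caring": "care"}

def toy_stem(word):
    """Strip the first matching suffix; the result may not be a real word."""
    for suffix in SUFFIXES:
        # Leave very short words alone so that e.g. "is" stays unchanged.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def toy_lemmatize(word):
    """Look the word up in a dictionary, so the result is a meaningful word."""
    return LEMMAS.get(word, word)
```

Here toy_stem("caring") returns "car", while toy_lemmatize("caring") returns "care". The stemmer only strips characters, whereas the lemmatizer needs a lookup, which reflects why lemmatization is slower.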
What is Bag of Words?
After text normalization, the text has been reduced to tokens, which are the simplest form of the words present in the corpus. The next step is to convert the tokens into numbers. For this, we use the Bag of Words (BoW) model.
It gives two things: a vocabulary of the words in the corpus, and the frequency of each word (the number of times it occurs in the corpus).
Steps involved:
1. Text normalization: Collection of data and pre-processing
2. Create dictionary
3. Create document vector for document 1
4. Create document vectors for all documents and create document vector table
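Steps 2 to 4 above can be sketched as follows, assuming step 1 has already produced a list of tokens for each document. The function names are hypothetical, and any documents passed in are sample data supplied by the caller.

```python
def build_vocabulary(documents):
    """Step 2: list every unique word in the corpus, in order of first appearance."""
    vocab = []
    for doc in documents:
        for word in doc:
            if word not in vocab:
                vocab.append(word)
    return vocab

def document_vector(doc, vocab):
    """Step 3: count how often each vocabulary word occurs in one document."""
    return [doc.count(word) for word in vocab]

def document_vector_table(documents):
    """Step 4: build the vocabulary and one vector per document."""
    vocab = build_vocabulary(documents)
    return vocab, [document_vector(doc, vocab) for doc in documents]
```

For the sample documents [["cat", "sat", "mat"], ["dog", "sat", "log"], ["cat", "chased", "dog", "dog"]], the vocabulary is ['cat', 'sat', 'mat', 'dog', 'log', 'chased'] and the table is [[1, 1, 1, 0, 0, 0], [0, 1, 0, 1, 1, 0], [1, 0, 0, 2, 0, 1]]. Each row is one document and each column is one vocabulary word.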
What is text summarization?
NLP technique that automatically generates a concise and coherent summary of a longer text while retaining its main ideas and key information.
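One simple way to approximate this is extractive summarization: score each sentence by how frequent its words are in the whole text and keep the top-scoring sentences. The sketch below is only one possible approach, with a hypothetical `summarize` function and a naive sentence splitter. It copies existing sentences and does not generate new wording.

```python
import re
from collections import Counter

def summarize(text, max_sentences=1):
    """Keep the sentences whose words are, on average, most frequent in the text."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(sentence):
        words = re.findall(r"[a-z]+", sentence.lower())
        # Average word frequency, so long sentences are not favoured automatically.
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    top = sorted(sentences, key=score, reverse=True)[:max_sentences]
    # Return the chosen sentences in their original order.
    return " ".join(s for s in sentences if s in top)
```

For the text "Cats sleep a lot. Cats sleep and cats play. Rain fell.", summarize(text) returns "Cats sleep and cats play." Removing stop words before scoring would usually help, since very common words otherwise dominate the frequencies.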
How can AI play a role in the sentiment analysis of human beings?
Sentiment analysis is a process in NLP where AI systems analyze text to determine the emotional tone or attitude expressed. It is used to understand public opinion and to gain mental health insights. The goal is to identify sentiment across several posts, or even within a single post, where emotion is not always explicitly expressed. Companies use NLP applications like sentiment analysis to identify emotions and opinions expressed online, which helps them understand what customers think about their products and services.