NLP Flashcards

1
Q

What is NLP?

A

NLP (Natural Language Processing) is a subfield of AI that enables computers to analyze and comprehend human language, so that they can carry out repetitive language tasks without human intervention. Examples: text classification, automatic summarization, auto text completion, sentiment analysis, virtual assistants, grammar and spell check.

2
Q

Differentiate between Script Bot and Smart Bot.

A

SCRIPT BOT
- easy to make
- work from a script programmed into them
- limited functionality
- mostly free and easy to integrate into messaging platforms
- little or no language processing skill
- e.g. customer care chatbots
SMART BOT
- flexible and powerful
- work directly on bigger databases and other resources
- wide functionality
- learn with more data
- coding is required to take them on board
- e.g. Alexa, Cortana, Siri

3
Q

What is sentiment analysis?

A

A process in NLP in which AI systems analyze text to determine the emotional tone or attitude it expresses. Uses: understanding public opinion, gaining mental health insights.
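
One very simple way to picture this is a word-list (lexicon) approach: count positive and negative words and compare. The sketch below is only an illustration under that assumption; the word lists POSITIVE and NEGATIVE and the function name sentiment are made up for the example, and real systems usually rely on trained models rather than fixed lists.

```python
import re

# Hypothetical mini-lexicons, invented for this illustration only.
POSITIVE = {"good", "great", "love", "happy", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "sad", "terrible"}

def sentiment(text):
    """Label text as positive, negative or neutral by counting lexicon words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this phone, the camera is great"))
print(sentiment("The battery is terrible and the screen is bad"))
print(sentiment("The phone arrived today"))
```

A fixed word list cannot handle negation ("not good") or sarcasm, which is one reason sentiment that is not explicitly expressed is hard to detect.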

4
Q

What are the complexities in NLP?

A
  • Arrangement of words and meaning
  • Different syntax, same semantics
  • Multiple meanings of a word
  • Perfect syntax, no meaning
5
Q

Differentiate between Human language and Computer language.

A

HUMAN
- Communication is complex
- The human brain keeps processing the sounds it hears around it
- The brain understands the signal if it is clear; otherwise it asks for clarification
- The brain prioritizes sounds according to one's interests
COMPUTER
- Communication is basic and simple
- Understands only the language of numbers
- Typing mistakes lead to errors, and the input is not processed

6
Q

What is text normalization?

A

Text normalization cleans up textual data so that its complexity is lower than that of the original data. The vocabulary becomes smaller and more consistent.
Steps involved:
1. sentence segmentation
2. tokenization
3. removing stop words, special characters and numbers
4. converting text to common case
5. stemming
6. lemmatization
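
The first four steps above can be sketched with the Python standard library. This is a minimal illustration, not a full pipeline: STOP_WORDS is a tiny made-up list, the function name normalize is hypothetical, and stemming and lemmatization are left out. Case conversion is done before stop word removal here so that "The" and "the" are matched alike.

```python
import re

# Hypothetical, tiny stop word list used only for this illustration.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "to", "of", "in", "on"}

def normalize(text):
    """Return a list of sentences, each a list of normalized tokens."""
    # Step 1, sentence segmentation: split after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    result = []
    for sentence in sentences:
        # Step 2, tokenization: words, numbers and single special characters.
        tokens = re.findall(r"[A-Za-z]+|\d+|[^\w\s]", sentence)
        # Step 4, common case.
        tokens = [t.lower() for t in tokens]
        # Step 3, remove stop words, special characters and numbers.
        tokens = [t for t in tokens if t.isalpha() and t not in STOP_WORDS]
        result.append(tokens)
    return result

normalized = normalize("The cats are playing in the garden. Raj has 2 dogs!")
print(normalized)  # [['cats', 'playing', 'garden'], ['raj', 'has', 'dogs']]
```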

7
Q

Why is converting text to common case done?

A

It ensures that the machine, which is case-sensitive, does not treat the same word as different words just because it appears in different cases.
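
A small illustration of the effect, using an example word list made up for this card:

```python
words = ["Hello", "HELLO", "hello", "heLLo"]

# Without case conversion the machine sees four different words.
distinct_before = set(words)

# After converting to lower case they collapse into a single word.
distinct_after = {w.lower() for w in words}

print(len(distinct_before), len(distinct_after))  # prints "4 1"
```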

8
Q

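What is a corpus?

A

A collection of written text. In NLP, the corpus is the whole body of text, from all the documents taken together, that is being processed.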

9
Q

What is tokenization?

A

A token is any word, special character or number. In tokenization, the text is split so that every word, special character and number is treated separately, each as its own token.
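
A minimal sketch of this idea with a regular expression. The function name tokenize and the sample sentence are made up for the example, and real tokenizers handle many more cases (contractions, decimals, URLs).

```python
import re

def tokenize(sentence):
    # A run of letters or digits is one token; any other non-space
    # character (punctuation, symbols) is a token on its own.
    return re.findall(r"\w+|[^\w\s]", sentence)

tokens = tokenize("Raj scored 95 marks!")
print(tokens)  # ['Raj', 'scored', '95', 'marks', '!']
```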

10
Q

What are stopwords?

A

Words that occur very frequently in the corpus but add little or no meaning to it, e.g. a, an, the, is. Which words count as stopwords depends on the corpus.
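
A short sketch of stopword removal. The STOP_WORDS set and the token list are invented for illustration; a real list would be chosen to suit the corpus.

```python
from collections import Counter

# Illustrative stop word list only; not taken from any library.
STOP_WORDS = {"a", "an", "and", "are", "is", "the", "to", "of"}

tokens = ["the", "sky", "is", "blue", "and", "the", "sea", "is", "calm"]

# The most frequent tokens here are stopwords that carry little meaning.
most_common = Counter(tokens).most_common(2)

# Keep only the tokens that are not stopwords.
content_tokens = [t for t in tokens if t not in STOP_WORDS]
print(content_tokens)  # ['sky', 'blue', 'sea', 'calm']
```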

11
Q

What is stemming?

A

A process in which the affixes of words are removed and the words are reduced to their base form (the stem). It does not take into account whether the stemmed word is meaningful or not; it simply removes the affixes, which makes it faster. e.g. studies becomes studi
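
The idea can be sketched with a few suffix rules. This is a deliberately naive illustration, not the Porter algorithm or any library's stemmer; SUFFIX_RULES and naive_stem are names made up for the example.

```python
# Each rule is (suffix to strip, text to put in its place).
# Rules are tried in order and only the first match is applied.
SUFFIX_RULES = [("ies", "i"), ("ing", ""), ("ed", ""), ("s", "")]

def naive_stem(word):
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        # Require at least three remaining letters so short words are left alone.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement
    return word

stems = [naive_stem(w) for w in ["studies", "caring", "played", "cats"]]
print(stems)  # ['studi', 'car', 'play', 'cat']
```

Note that "studi" is not a real word and "car" has a different meaning from "caring": the rules strip affixes without checking the result.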

12
Q

What is lemmatization?

A

A process that, like stemming, removes affixes, but makes sure that the resulting word (the lemma) is a word with meaning. It therefore takes longer to execute than stemming. e.g. studies becomes study
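
One way to picture why the lemma is always meaningful is a dictionary lookup. The sketch below assumes a tiny hand-written dictionary (LEMMA_DICT, hypothetical); real lemmatizers use large vocabularies and the word's part of speech, which is why they are slower.

```python
# Hypothetical mini-dictionary mapping inflected forms to their lemma.
LEMMA_DICT = {
    "studies": "study",
    "studying": "study",
    "caring": "care",
    "mice": "mouse",
}

def lemmatize(word):
    word = word.lower()
    # Look the word up; fall back to the word itself if it is unknown.
    return LEMMA_DICT.get(word, word)

lemmas = [lemmatize(w) for w in ["studies", "caring", "mice", "garden"]]
print(lemmas)  # ['study', 'care', 'mouse', 'garden']
```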

13
Q

What is the difference between stemming and lemmatization?

A

Stemming and lemmatization are alternatives to each other, since both play the same role: removing affixes. The difference is that in lemmatization the word obtained after affix removal (known as the lemma) is always a meaningful one, whereas a stem may not be.

14
Q

What is Bag of Words?

A

After text normalization, the text has been reduced to tokens, the simplest form of the words present in the corpus. The next step is to convert the tokens into numbers, for which we use the Bag of Words (BoW) model.
BoW gives two things: a vocabulary of the words in the corpus, and the frequency of each word (the number of times it occurs in the corpus).
Steps involved:
1. Text normalization: collect the data and pre-process it
2. Create the dictionary
3. Create the document vector for document 1
4. Create document vectors for all documents and build the document vector table
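
Steps 2 to 4 can be sketched as follows, assuming the documents have already been normalized into token lists. The two sample documents and the helper names build_vocabulary and document_vector are made up for illustration.

```python
def build_vocabulary(documents):
    """Step 2: list every distinct token once, in order of first appearance."""
    vocabulary = []
    for tokens in documents:
        for token in tokens:
            if token not in vocabulary:
                vocabulary.append(token)
    return vocabulary

def document_vector(tokens, vocabulary):
    """Step 3: count how often each vocabulary word occurs in one document."""
    return [tokens.count(word) for word in vocabulary]

# Already-normalized example documents (hypothetical data).
documents = [
    ["cat", "sat", "mat"],
    ["dog", "sat", "log", "dog"],
]

vocabulary = build_vocabulary(documents)
# Step 4: one vector per document gives the document vector table.
table = [document_vector(tokens, vocabulary) for tokens in documents]

print(vocabulary)  # ['cat', 'sat', 'mat', 'dog', 'log']
print(table)       # [[1, 1, 1, 0, 0], [0, 1, 0, 2, 1]]
```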

15
Q

What is text summarization?

A

An NLP technique that automatically generates a concise and coherent summary of a longer text while retaining its main ideas and key information.
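
One simple way to approximate this is extractive summarization: score each sentence by how frequent its words are in the whole text and keep the top-scoring sentences. The sketch below assumes that approach only as an illustration; it selects existing sentences rather than generating new ones, and STOP_WORDS, summarize and the sample text are made up for the example.

```python
import re
from collections import Counter

# Illustrative stop word list only.
STOP_WORDS = {"a", "an", "the", "is", "was", "to", "in", "of", "and"}

def summarize(text, max_sentences=1):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    freq = Counter(words)

    def score(sentence):
        # Stop words are absent from freq, so they contribute 0.
        return sum(freq[w] for w in re.findall(r"[a-z]+", sentence.lower()))

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)

text = (
    "NLP helps computers understand language. "
    "The weather was nice. "
    "Computers use NLP to process language in text."
)
summary = summarize(text)
print(summary)  # Computers use NLP to process language in text.
```

Summing word frequencies favors longer sentences, so practical versions usually normalize the score by sentence length.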

16
Q

How can AI play a role in the sentiment analysis of human beings?

A

Sentiment analysis is a process in NLP in which AI systems analyze text to determine the emotional tone or attitude it expresses. It is used to understand public opinion and to gain mental health insights. Its goal is to identify sentiment across several posts, or even within a single post, where emotion is not always explicitly expressed. Companies use NLP applications such as sentiment analysis to identify emotions and sentiments online, which helps them understand what customers think about their products and services.