Natural Language Processing Flashcards
Why does Speech Recognition fall under NLP?
Application 2:
- A speech recognizer takes an audio signal as input, converts it to text, and recognizes what was spoken.
- A speech recognizer includes a language model that decides which of two similar-sounding words was probably spoken: given the word sequence heard so far, one candidate word is more probable than the other.
Application 3: Image Captioning - You give the computer an image and want it to generate a caption for it.
How does this fall under NLP?
An image is two-dimensional data, so a CNN is used to encode it. Once the image is encoded, an NLP model is needed to generate the caption text. NLP models exist to do exactly that.
Any product that converts text from one language to another falls under NLP, and such products go very deep into neural networks.
Machine Translation: we want to convert from English to French.
How is this done?
Neural Machine Translation - the most fundamental product and most researched area in NLP.
______ a language to specify the rules for the set of possible strings that you want to search in a corpus (large body) of text.
a. regular expression
b. natural expression
c. language
d. none of the above
a. regular expression
regular expression - a language to specify the rules for the set of possible strings that you want to search in a corpus (large body) of text.
Ex: “Your password must have at least 8 characters, at least 1 upper case letter, at least 1 lower case letter, at least 1 digit (number), and at least 1 symbol from the special symbols.”
This specifies the rules for a set of possible strings in a large body of text.
Why not write your own Python search program instead of using regular expressions?
a. it is slower
b. Regular expressions wrap a fast C implementation, so a hand-written search is not needed
c. Not wise; the regular expression package is already included.
d. only b and c
d. only b and c
Do not write your own Python search program because:
b. Regular expressions wrap a fast C implementation, so a hand-written search is not needed
c. Not wise; the regular expression package is already included.
Why use regular expressions instead of string functions?
a. string functions are not customized
b. regular expressions are much more abstract and faster
c. regular expressions are not a language-specific thing and can do almost anything
d. only b and c
d. only b and c
Use regular expressions instead of string functions because:
b. regular expressions are much more abstract and faster
c. regular expressions are not a language-specific thing and can do almost anything
______ is a simple program that uses pattern matching to recognize phrases like “I need X” and translate them into suitable outputs like “What would it mean to you if you got X?”
a. Regular Expression
b. ELIZA chatbot
c. ChatGpt
d. None of the Above
b. ELIZA chatbot
ELIZA chatbot - a simple program that uses pattern matching to recognize phrases like “I need X” and translate them into suitable outputs like “What would it mean to you if you got X?”
Example:
User: I am unhappy
ELIZA: Do you think coming here will help you not be unhappy?
User: I need some help, that much seems certain. (matches the pattern “I need X”)
ELIZA: What would it mean to you if you got some help? (the output string built from the matched pattern)
User: Perhaps I could learn to get along with my mother?
ELIZA: Tell me more about your family.
what does import re mean?
a. import regular expression
b. import regular eliza
c. import return expression
d. import return eliza
import re means:
a. import the regular expression module
What does the following code mean?
re.findall('abc', 'askdfj;askabcdfahgfa;ghabc;hgabchkg;a')
a. regular expression, find all
b. find all, look for the string abc
c. regular expression, find all, look for the string abc in the longer string
The code
re.findall('abc', 'askdfj;askabcdfahgfa;ghabc;hgabchkg;a')
means:
c. regular expression, find all, look for the string abc in the longer string
re = the regular expression module
findall = find all matches
'abc' = the pattern we are looking for
'askdfj;askabcdfahgfa;ghabc;hgabchkg;a' = the string we are searching in
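The call on the card above can be run directly; a minimal sketch:

```python
import re

# Look for every occurrence of the literal pattern 'abc' in the longer string.
matches = re.findall('abc', 'askdfj;askabcdfahgfa;ghabc;hgabchkg;a')
print(matches)  # 'abc' occurs three times in the string
```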
Which are meta characters with Special Meaning?
a. . ^ $ * + ?
b. . ^ $ * + ? { } [ ]
c. . ^ $ * + ? { } [ ] \ |
d. . ^ $ * + ? { } [ ] \ | ( )
Are meta characters with Special Meaning
d. . ^ $ * + ? { } [ ] \ | ( )
_____ used for specifying a class, which is a set of characters that you wish to match. Characters can be listed individually like [abcdef] or in a range like [a-f]?
a. [ ]
b. { }
c. \
d. ( )
a. [ ]
[ ] This metacharacter is used for specifying a class, which is a set of characters that you wish to match. Characters can be listed individually like [abcdef] or in a range like [a-f]
EXAMPLE:
re.findall('[abcd]', 'kasdf')
output: ['a', 'd']
'k' in 'kasdf' is not returned because it is not in the class [abcd].
'a' is returned because it is in the class [abcd].
's' is not returned because it is not in the class [abcd].
Using [ ], count the number of digits below:
'2319ab4621acdz+*!'
i. # define your string
ii. # define a character class, [0-9]
iii. # find all occurrences of this class using re.findall
iv. # use len() to tell how many digits there are
a. 4,3,2,1
b. 1,2,3,4
c. 2,3,4,1
d. 3,2,4,1
s = '2319ab4621acdz+*!'
L = re.findall('[0-9]', s)
print(len(L))
b. 1,2,3,4
1) # define your string
2) # define a character class, [0-9]
3) # find all occurrences of this class using re.findall
4) # use len() to tell how many digits there are
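A runnable version of the snippet above (note that the variable holding the string must match the one passed to findall):

```python
import re

s = '2319ab4621acdz+*!'        # define your string
L = re.findall('[0-9]', s)     # find all characters in the class [0-9]
print(len(L))                  # number of digits in the string
```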
Verify that 4 digits appear consecutively.
if len(re.findall('[0-9][0-9][0-9][0-9]',
'asdfja;_+);1kj2306kjl891')) > 0:
print("Found")
else:
print("Not Found")
1) if len(re.findall('[0-9][0-9][0-9][0-9]',
* Finds 4 consecutive digits, which is why we have 4 separate [0-9] classes
2) 'asdfja;_+);1kj2306kjl891')) > 0:
* This is our string
3) print("Found")
else:
print("Not Found")
* This says “Based on the result, print Found or Not Found”
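The card above, assembled into a complete runnable snippet:

```python
import re

# Four [0-9] classes in a row match exactly four consecutive digits.
hits = re.findall('[0-9][0-9][0-9][0-9]', 'asdfja;_+);1kj2306kjl891')
if len(hits) > 0:
    print("Found")       # '2306' is a run of four consecutive digits
else:
    print("Not Found")
```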
_____ is a metacharacter symbol used in regular expressions for set complements.
a. ^
b. [ ]
c. ( )
d. $
a. ^
^ is a symbol used in regular expressions for set complements (when it appears first inside a character class).
[^023abf] what is the metacharacter ^ doing here?
a. Set complement: everything except 023abf
b. includes all 023abf
c. exponent
d. Ordinary character
a. Set complement: everything except 023abf
[023abf^] what is the metacharacter ^ doing here?
a. Set complement: everything except 023abf
b. includes all 023abf
c. exponent
d. Ordinary character
d. Ordinary character (when ^ is not the first character in the class, it has no special meaning)
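A short sketch contrasting the two positions of ^ inside a character class (the sample strings here are my own illustrations):

```python
import re

# ^ at the START of a class complements it: match everything EXCEPT 0,2,3,a,b,f
print(re.findall('[^023abf]', '0a2bz9'))   # only 'z' and '9' survive
# ^ anywhere ELSE in the class is an ordinary character
print(re.findall('[023abf^]', '0a^z'))     # matches '0', 'a', and '^'
```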
_______ is known as the “zero or more” quantifier. It indicates that the preceding character or expression can occur zero or more times in the input text. Here’s what it signifies:
Zero Occurrences: The character or expression preceding “*” may not occur at all in the text, and the pattern will still match.
Multiple Occurrences: If the character or expression occurs, it can occur any number of times (including zero).
a. $
b. *
c. ^
d. @
b. *
* is known as the “zero or more” quantifier. It indicates that the preceding character or expression can occur zero or more times in the input text. Here’s what it signifies:
Zero Occurrences: The character or expression preceding “*” may not occur at all in the text, and the pattern will still match.
Multiple Occurrences: If the character or expression occurs, it can occur any number of times (including zero).
What metacharacter is used in the following example?
The regular expression “cats*” matches both “cat” and “cats” in the input text.
a. $
b. *
c. ^
d. @
b. *
In this example, the regular expression “cats*” matches both “cat” and “cats” in the input text. The “*” allows for zero or more occurrences of the character “s”.
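The same pattern run on a small sample sentence (the sentence is my own illustration):

```python
import re

# 's*' allows zero or more 's' characters after 'cat'
print(re.findall('cats*', 'my cat and your cats'))  # ['cat', 'cats']
```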
_____ is defined as one that does not start with a digit, does not contain any special characters other than underscore, and can have an arbitrary number of characters.
a. proper variable name
b. regular expression
c. *
d. special sequence
a. proper variable name
A Proper Variable Name - defined as one that does not start with a digit, does not contain any special characters other than underscore, and can have an arbitrary number of characters.
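One way to encode the card's definition as a regex; the pattern and the helper function are my own sketch, not from the card:

```python
import re

# Starts with a letter or underscore, then any number of letters,
# digits, or underscores; anchored to cover the whole string.
pattern = re.compile(r'^[A-Za-z_][A-Za-z0-9_]*$')

def is_proper_variable_name(name):
    return pattern.match(name) is not None

print(is_proper_variable_name('my_var2'))   # True
print(is_proper_variable_name('2my_var'))   # False: starts with a digit
print(is_proper_variable_name('my-var'))    # False: '-' is a special character
```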
____ applies a repetitive pattern for as long as it can go; the default behavior in most regex engines.
a. special sequence
b. proper variable name
c. greedy matching
d. *
c. greedy matching
Greedy Matching
1. applies repetitive pattern as long as it can go.
2. the default behavior in most regex engines.
These are the steps to do Greedy Matching
Example:
I have a cat named Saturn, and another cat named Saturnalia.
- Define Your Pattern: Start by defining the pattern you want to match in your regular expression. This pattern may include characters, groups, and quantifiers that specify how many times a character or group should be matched.
Our pattern will be “cat.*\.”, which means we’re looking for the word “cat” followed by any characters (.*) and ending with a literal period (\.).
- Apply Quantifiers: Use quantifiers like “*”, “+”, “{n,}”, etc., to specify how many times a character or group should be matched. These quantifiers determine the greediness of the matching.
The .* part of the pattern is a greedy quantifier. It means that it will try to match as many characters as possible before satisfying the next part of the pattern (the period).
- Apply the Regular Expression: Apply your regular expression pattern to the input text you want to search through.
- Find Matches: Use a function like findall() (in Python’s re module) to find all matches of the pattern in the input text.
Use a function like findall() to find all matches of the pattern in the input text.
- Greedily Match: The regex engine will attempt to match as much of the input text as possible while still satisfying the overall pattern. This means it will try to match as many repetitions of the quantified elements as it can.
- Keep Matching Until Satisfied: The regex engine will keep trying to match more characters until it cannot match anymore without violating the pattern.
The regex engine will start by finding the first occurrence of “cat” and then try to match as many characters as possible until it finds a period.
- Backtrack if Necessary: If greediness causes the pattern to fail, the regex engine will backtrack and try different possibilities until it finds a match. This may involve matching fewer repetitions of a quantified element or taking a different path through the input text.
It continues to match characters until it hits the final period, as it is trying to satisfy the pattern “cat.*\.”.
- Retrieve Matches: Once all matches are found, retrieve and process them as needed for your application.
Since our pattern is greedy, the regex engine won’t backtrack until it finds a period. So, if there are multiple occurrences of “cat” in the text, it will keep matching characters until it finds a period for each occurrence.
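The steps above can be run end-to-end on the card's example sentence, using the corrected pattern cat.*\. :

```python
import re

doc = 'I have a cat named Saturn, and another cat named Saturnalia.'
# 'cat' followed by as many characters as possible (.*), ending with a
# literal period (\.) -- the greedy .* runs to the LAST usable period,
# so the two occurrences of 'cat' collapse into one long match.
matches = re.findall(r'cat.*\.', doc)
print(matches)  # one match from the first 'cat' to the final period
```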
____ A metacharacter that ensures the preceding character occurs at least one time (one or more).
example:
doc = 'YahooYahoooYahooooYahooooYaho'
regExp = 'Yahoo+'
rs = re.findall(regExp, doc)
print(rs)
output: ['Yahoo', 'Yahooo', 'Yahoooo', 'Yahoooo']
a. *
b. +
c. ?
b. +
The difference between + and * is that + requires at least 1 occurrence of the pattern (and allows up to infinitely many), while * also allows zero occurrences.
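Running the card's example confirms that each match keeps all of its trailing o's, and that the final 'Yaho' fails because o+ needs at least one more 'o':

```python
import re

doc = 'YahooYahoooYahooooYahooooYaho'
# 'o+' requires at least one 'o' after 'Yaho'
rs = re.findall('Yahoo+', doc)
print(rs)
```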
What does this mean?
doc = 'YahooYaooYahoooo'
regExp = 'Yah?oo'
rs = re.findall(regExp, doc)
print(rs)
- doc = the string being searched
- regExp is a regular expression in which h? marks the h as optional
- rs = re.findall(regExp, doc)
means: find all matches of the regular expression in the document
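The same snippet, runnable; h? lets both 'Yahoo' and 'Yaoo' match:

```python
import re

doc = 'YahooYaooYahoooo'
# 'h?' makes the h optional; note the pattern consumes exactly two o's,
# so the trailing 'Yahoooo' yields a plain 'Yahoo' match.
rs = re.findall('Yah?oo', doc)
print(rs)
```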
_______A metacharacter in regular expressions that specifies repeated patterns and also defines lower and upper limits
a. {,}
b. (,)
c. +
d. \
a. {,}
{ } A metacharacter in regular expressions that specifies repeated patterns and also defines lower and upper limits
[a-z]{2,5} means we want at least 2 characters and at most 5 characters
- ab (2 characters) so valid
- abcd (4 characters) so valid
- dldgkd (6 characters) not valid
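The three cases above can be checked with fullmatch, which tests the whole string against the pattern:

```python
import re

# fullmatch checks the ENTIRE string against [a-z]{2,5}
print(bool(re.fullmatch('[a-z]{2,5}', 'ab')))      # True: 2 characters
print(bool(re.fullmatch('[a-z]{2,5}', 'abcd')))    # True: 4 characters
print(bool(re.fullmatch('[a-z]{2,5}', 'dldgkd')))  # False: 6 characters
```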
{0} is the same as what metacharacter?
a. {,}
b. *
c. +
d. \
b. *
{0} = *
{1} is the same as what metacharacter?
a. {,}
b. *
c. +
d. \
c. +
{1} = +
{0,1} is the same as what metacharacter?
a. ?
b. *
c. +
d. \
a. ?
{0,1} = ?
match() means what in regular expression?
a. determines if the RE matches at the beginning of the string
b. scans through a string, looking for any location where this RE matches
c. neither
match()
a. determines if the RE matches at the beginning of the string
search() means what in regular expression?
a. determines if the RE matches at the beginning of the string
b. scans through a string, looking for any location where this RE matches
c. neither
search()
b. scans through a string, looking for any location where this RE matches
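The two cards above side by side; the sample text is my own illustration:

```python
import re

text = 'the cat sat'
# match() only succeeds at the BEGINNING of the string
print(re.match('cat', text))          # None: 'cat' is not at the start
# search() scans the whole string for the first location that matches
print(re.search('cat', text).span())  # (4, 7): found at indices 4-7
```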
______ what regular expression means “logical or” used to join.
a. +
b. ()
c. |
d. *
c. |
| means logical OR, used to join alternatives.
______ what regular expression matches at the end of a string, or any location followed by a newline character?
a. +
b. $
c. |
d. *
b. $
$ matches at the end of a string, or any location followed by a newline character
______ what regular expression makes a group of characters to be treated just like a single character?
ex. want to find repetitions such as thethethethe
p=re.compile(‘(the)+’)
a. +
b. $
c. |
d. ()
d. ()
() is a regular expression construct that makes a group of characters be treated just like a single character.
() makes a group. If you want to find all repetitions of ‘the’, group it as shown below, then search in doc:
(the)+
m = p.search(doc)
(7, 22) thethethethethe
what does this mean?
It is the beginning and end of the match:
the match starts at index 7 and ends at index 22
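A runnable version; the doc string here is hypothetical, padded so the match starts at index 7 and reproduces the card's (7, 22):

```python
import re

p = re.compile('(the)+')            # group 'the', repeated one or more times
doc = 'xxxxxxxthethethethethezzz'   # 7 filler chars, then 'the' five times
m = p.search(doc)
print(m.span())    # (7, 22): match starts at index 7 and ends at index 22
print(m.group())   # 'thethethethethe'
```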
______ splits the string into a list, splitting it wherever the RE matches.
a. split()
b. sub()
c. subn()
a. split()
split()- splits the string into a list, splitting it wherever the RE matches.
ex: if we want to split the string 'abc, f12, 1349,a' wherever the RE (e.g. '\W+') matches
output: ['abc', 'f12', '1349', 'a']
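The split example as runnable code:

```python
import re

# Split on runs of non-word characters (anything other than
# letters, digits, and underscore).
p = re.compile(r'\W+')
print(p.split('abc, f12, 1349,a'))  # ['abc', 'f12', '1349', 'a']
```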
______ finds all substrings where RE matches and replaces them with a different string.
a. split()
b. sub()
c. subn()
b. sub()
(also known as substitute)
sub() - finds all substrings where RE matches and replaces them with a different string.
______Does the same thing as sub() but returns with a new string and the number of replacements.
a. split()
b. sub()
c. subn()
c. subn()
Does the same thing as sub() but returns with a new string and the number of replacements.
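A minimal sketch of sub() next to subn(); the pattern and replacement strings are my own illustration:

```python
import re

p = re.compile('blue|red')
# sub() replaces every match and returns the new string
print(p.sub('colour', 'blue socks and red shoes'))
# subn() does the same but also returns the number of replacements
print(p.subn('colour', 'blue socks and red shoes'))
```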
p = re.compile(‘\W+’)
This is an example of a
a. word tokenizer
b. word spacer
c. w+
d. word compiler
a. word tokenizer
p = re.compile(‘\W+’)
_____ a field that focuses on software’s ability to understand and process human languages
a. NLP (natural language processing)
b. language compiler
c. word tokenizer
a. NLP (natural language processing)
-a field that focuses on software’s ability to understand and process human languages
_____ Splitting the text into tokens, the minimal meaningful units. This can split text into words or sentences, or sentences into words.
a. tokenization
b. parts of speech
c. stemming
a. tokenization
Splitting the text into tokens, the minimal meaningful units. This can split text into words or sentences, or sentences into words.
______ Assigning parts of speech to text
ex. noun, verb, adverb, etc.
a. tokenization
b. parts of speech
c. stemming
b. parts of speech
Assigning parts of speech to text
ex. noun, verb, adverb, etc.
______ process of reducing words to their stem.
ex. walking -> walk
a. tokenization
b. parts of speech
c. stemming
c. stemming
process of reducing words to their stem.
ex. walking -> walk
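A toy suffix-stripping function to illustrate the idea only; real stemmers (e.g. NLTK's PorterStemmer) use much more careful rules:

```python
# Strip a common suffix if the remaining stem would still be long enough.
def toy_stem(word):
    for suffix in ('ing', 'ed', 's'):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

print(toy_stem('walking'))  # 'walk'
print(toy_stem('jumped'))   # 'jump'
print(toy_stem('cats'))     # 'cat'
```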
____ similar to stemming, but operates using word context; for example, it can map “better” to its base form “good”
a. tokenization
b. parts of speech
c. stemming
d. lemmatization
d. lemmatization
similar to stemming, but operates using word context; for example, it can map “better” to its base form “good”
_____ named entity recognition: labels sequences of words that are the names of things.
ex. person, company, or street
a. tokenization
b. NER
c. stemming
b. NER
named entity recognition - labels sequences of words that are the names of things.
ex. person, company, or street
____ analyzes the grammar of the text to extract its syntactic structure
a. tokenization
b. NER
c. stemming
d. parsing
d. parsing
analyzes the grammar of the text to extract its syntactic structure
spaCy - NER, tokenization, etc.
CoreNLP - Stanford's Java-based NLP toolkit
gensim - semantic analysis; emphasizes clarity and efficiency
NLTK - Natural Language Toolkit (mother of all NLP libraries)
____ used for filtering information in web search. Helps avoid SPAM emails by classification.
a. text classification
b. classification
c. nlp classification
a. text classification
text classification - used for filtering information in web search. Helps avoid SPAM emails by classification.
____ identify opinions and sentiments of the audience. Understand emotions of audience via social media.
a. sentiment analysis
b. chatbots
c. classification
d. advertisement
a. sentiment analysis
Sentiment Analysis- identify opinions and sentiments of the audience. Understand emotions of audience via social media.
____ helps in customer support and assistance through low priority tasks. Also used in HR Systems like how many vacation days left.
a. chatbots
b. customer service
c. sentiment analysis
a. chatbots
helps in customer support and assistance through low priority tasks. Also used in HR Systems like how many vacation days left.
_______ offers insights into audience preferences and helps improve customer satisfaction
a. customer service
b. chatbots
c. sentiment analysis
a. customer service
offers insights into audience preferences and helps improve customer satisfaction
____ offers document summarization, machine translation, and speech recognition.
a. customer service
b. chatbots
c. sentiment analysis
d. natural language processing
d. natural language processing
offers document summarization, machine translation, and speech recognition.
Natural Language is any language that has evolved naturally through use and repetition without conscious planning or premeditation
TRUE
FALSE
TRUE
Natural Language is any language that has evolved naturally through use and repetition without conscious planning or premeditation
Natural Language is what humans use to communicate and it has evolved with human evolution
NLP is a science that focuses on :
1 - Grammar
2 - Translation
3 - Speech Recognition
4 - Software's ability to understand and process human language
A. Software’s ability to understand and process human’s language
B. Speech Recognition
C. Grammar
D. Translation
NLP is a science that focuses on :
A. Software’s ability to understand and process human’s language
NLP has evolved as a science to build programs or software capable of understanding human language
NLTK stands for Natural Language Tool Kit
TRUE
FALSE
TRUE
NLTK stands for Natural Language Tool Kit
_____ process of breaking up text into smaller pieces (tokens)
a. tokenization
b. NER
c. stemming
d. parsing
a. tokenization
TOKENIZATION process of breaking up text into smaller pieces (tokens)
____ words that are commonly used. Language specific also. ex: ‘a’, ‘an’, ‘the’
a. stop words
b. tokenization
c. stemming
d. parsing
a. stop words
Stop words - words that are commonly used. Language specific also.
Tokenizing a sentence is assigning ids to each word
FALSE
TRUE
FALSE
Tokenizing a sentence is splitting it into tokens ( words )
POS tagging is the process of assigning tags to tokens (words) like nouns, verbs …
TRUE
FALSE
TRUE
POS tagging is the process of assigning part of speech tags to tokens (words)
Tags include noun, verb, adjective etc…
TF-IDF is used to :
A. Extract keywords or features from a Text
B. Find synonyms
C. Extract the root or lemma of a word
A. Extract keywords or features from a Text
TF-IDF is a technique used to find what are the dominant words or keywords in a text
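A minimal TF-IDF sketch in plain Python, illustrative only; libraries like scikit-learn use more refined weighting and smoothing, and the tiny corpus here is my own example:

```python
import math

docs = [
    ['the', 'cat', 'sat'],
    ['the', 'dog', 'ran'],
    ['the', 'cat', 'ran'],
]

def tf_idf(term, doc, docs):
    tf = doc.count(term) / len(doc)         # term frequency in this document
    df = sum(1 for d in docs if term in d)  # how many documents contain it
    idf = math.log(len(docs) / df)          # inverse document frequency
    return tf * idf

# 'the' appears in every document, so its idf (and tf-idf) is 0: not a keyword.
print(tf_idf('the', docs[0], docs))
# 'sat' appears in only one document, so it scores highest there: a keyword.
print(tf_idf('sat', docs[0], docs))
```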
______ Process of computationally classifying and categorizing opinions expressed in a piece of text.
Helps understand the writers opinion about a topic, event, product etc.
A. Sentiment Analysis
B. Find synonyms
C. Extract the root or lemma of a word
A. Sentiment Analysis
Process of computationally classifying and categorizing opinions expressed in a piece of text.
________is the first layer of the neural network.
______ Allows words with similar meanings to have similar representation.
A. Sentiment Analysis
B. Find synonyms
C. Extract the root or lemma of a word
D. Word Embeddings
D. Word Embeddings
Word Embeddings are the first layer of the neural network.
Word Embeddings allow words with similar meanings to have similar representations.
what does this code mean?
network = Sequential()
This shows that you are using a neural network whose layers are added in sequence.
The closer the sentiment analysis is to 0 what does that mean?
a. more positive result
b. more negative result
b. more negative result
Sentiment analysis closer to 0 is a more negative result.
The closer the sentiment analysis is to 1 what does that mean?
a. more positive result
b. more negative result
a. more positive result
Sentiment analysis closer to 1 is a more positive result.
Sentiment Analysis is a process to classify text into topics
FALSE
TRUE
FALSE
Sentiment Analysis is used to classify text based on the opinion or the sentiment of the writer: Negative or Positive.
It is a best practice to use all the dataset to train models
FALSE
TRUE
FALSE
Dataset must be split into Training and Test data in order to avoid wrong performance calculations.
True or False:
The basic mechanics of machine learning is to make computers act without being explicitly programmed to do so?
True
The basic mechanics of machine learning is to make computers act without being explicitly programmed to do so.
_____ has given us:
-Fraud detection
-Web search
-Self-Driving cars
-Online shopping recommendations
A. Machine Learning
B. Deep Learning
C. Reinforcement Learning
d. None of above
A. Machine Learning
Machine Learning has given us:
-Fraud detection
-Web search
-Self-Driving cars
-Online shopping recommendations
______ helps us take a picture
of what someone else wrote in a board and convert it into text.
ex. scan a doc and want to convert to word document
A. OCR (Optical Character Recognition)
B. Machine Learning
C. Deep Learning
D. Reinforcement Learning
A. OCR (Optical Character Recognition)
OCR (Optical Character Recognition) -
helps us take a picture
of what someone else wrote in a board and convert it into text.
ex. scan a doc and want to convert to word document
_____ has applications such as:
-Facebook news feed
-Self-Driving cars
-Virtual personal assistant
-Email spams
-Online customer support
A. OCR (Optical Character Recognition)
B. Machine Learning
C. Deep Learning
D. Reinforcement Learning
B. Machine Learning
Machine Learning applications include:
-Facebook news feed
-Self-Driving cars
-Virtual personal assistant
-Email spams
-Online customer support
______ Mainly used for classification problems; picks the most significant attribute and then splits on it, creating a tree-like structure
a. Decision Tree
b. Logistic Regression
c. Linear Regression
d. Naive Bayes
a. Decision Tree
Decision Trees - Mainly used for classification problems; pick the most significant attribute and then split on it, creating a tree-like structure
______ Uses data we have learned in the past and applies what is learned to new data. It starts with a dataset and trains a model.
The output is also compared with the correct answer to improve the model.
a. supervised
b. unsupervised learning
a. supervised - Uses data we have learned in the past and applies what is learned to new data. It starts with a dataset and trains a model.
_________ if the dataset is not labeled, categorized, or configured. Finds a hidden pattern or structure in unlabeled data based on similarities.
a. supervised
b. unsupervised learning
unsupervised learning - if the dataset is not labeled, categorized, or configured. Finds a hidden pattern or structure in unlabeled data based on similarities.
_______ statistical approach to find the relationship between variables. Predicts an outcome from input based on the relationship between variables extracted or obtained from the dataset.
a. linear regression
b. logistic regression
a. linear regression
Linear Regression - statistical approach to find the relationship between variables. Predicts an outcome from input based on the relationship between variables extracted or obtained from the dataset.
________ also a statistical method, used to predict a binary outcome (Yes/No, 0/1, True/False) given independent variables; used when the outcome variable is categorical.
Ex. whether an email is spam or not.
a. linear regression
b. logistic regression
b. logistic regression
Logistic Regression - also a statistical method, used to predict a binary outcome (Yes/No, 0/1, True/False) given independent variables; used when the outcome variable is categorical.
Ex. whether an email is spam or not.
________ Useful for large datasets; can outperform even highly sophisticated classification methods. Forms a family of simple probabilistic classifiers. All attributes are assumed independent.
ex. An orange is round, a certain size, and a certain color, but the classifier does not consider these attributes jointly.
A. Decision tree
b Naive Bayes
b Naive Bayes
Naive Bayes - Useful for large datasets; can outperform even highly sophisticated classification methods. Forms a family of simple probabilistic classifiers. All attributes are assumed independent.
ex. An orange is round, a certain size, and a certain color, but the classifier does not consider these attributes jointly.
_____ - process of predicting the class or category of a given input / data. A program learns from the training dataset. It can be bi-class (ex. male or female). A sentiment analyzer is an example of a bi-class classifier.
ex. if person is male or female
or if email is spam or not.
a. classification
b. decision tree
c. naive bayes
a. classification
CLASSIFICATION - process of predicting the class or category of a given input / data. A program learns from the training dataset. It can be bi-class (ex. male or female). A sentiment analyzer is an example of a bi-class classifier.
True or False:
Classification can be
A. Bi-Class (ex.: male or female)
B. Multi-Class (ex.: what type of fruit is in a picture, or what topic an article is talking about)
True:
Classification can be
A. Bi-Class
B. Multi-Class
_______ classifies textual information into categories. We want to know what people are talking about and what their opinions are.
a. classification
b. decision tree
c. naive bayes
d. text classification
d. text classification
Text classification - classifies textual information into categories. We want to know what people are talking about and what their opinions are.
____ used to organize and structure text into classes.
a. classification
b. decision tree
c. naive bayes
d. text classification
d. text classification
Text Classification - used to organize and structure text into classes.
Provide steps in Classification
i. Feature extraction (ex. sentiment analyzer in keras) and transform into math representation in the form of vectors.
ii. Labels: (ex. sunny, machine, learning) represented as 1, 1, 0
iii. Goes into training and text is analyzed.
iv. Model created and tested.
a. 1,2,3,4
b. 4,3,2,1
c. 2,1,3,4
a. 1,2,3,4
Classification Steps:
i. Feature extraction (ex. sentiment analyzer in keras) and transform into math representation in the form of vectors.
ii. Labels: (ex. sunny, machine, learning) represented as 1, 1, 0
iii. Goes into training and text is analyzed.
iv. Model created and tested.
Steps to Pre-Process Dataset for Classification
i. pre-process the data to get Dataset
(use scikit learn)
ii. Get the training and test data subsets
iii. check out the category names
iv. printing a single post
v. extracting features
vi. calculating TF-IDF
a. 1,2,3,4,5,6
b. 2,4,6,1,3,5
c. 6,5,4,3,2,1
d. none of above
a. 1,2,3,4,5,6
Steps to Pre-Process Dataset for Classification
i. pre-process the data to get Dataset
(use scikit learn)
ii. Get the training and test data subsets
iii. check out the category names
iv. printing a single post
v. extracting features
vi. calculating TF-IDF
________ has multinomial and Gaussian variants. The multinomial variant is for multinomial data - used for text classification (ex. word counts for text classification)
a. naive bayes
b. SVM
c. Multinomial
a. naive bayes
The multinomial variant is for multinomial data - used for text classification (ex. word counts for text classification)
_____ for multinomial data - used for text classification (ex. word counts for text classification)
a. naive bayes
b. SVM
c. Multinomial
c. Multinomial
Multinomial is a Naive Bayes classifier for multinomial data - used for text classification (ex. word counts for text classification)
_____ A classifier used for classification or regression problems. Uses hyperplane separation. A discriminative classifier given labeled data: based on the labeled data, it outputs an optimal hyperplane to separate the input data or categorize potential new points.
a. naive bayes
b. SVM (support vector machines)
c. Multinomial
b. SVM (support vector machines)
SVM - A classifier used for classification or regression problems. Uses hyperplane separation. A discriminative classifier given labeled data: based on the labeled data, it outputs an optimal hyperplane to separate the input data or categorize potential new points.
Machine Learning is used in :
A. Recommendation engines
B. Self-Driving cars
C. Fraud Detection
D. All the above
D. All the above
Machine Learning is used in :
-Recommendation engines
-Self-Driving cars
-Fraud Detection
Machine learning can be supervised or unsupervised
TRUE
FALSE
TRUE
Machine Learning is divided into two categories : Supervised and Unsupervised
Text Classification is used to correct the grammar mistakes in a text
FALSE
TRUE
FALSE
Text Classification is used to classify text content based on topic or sentiment per example
_____ An Artificial Intelligence computer program that can hold a conversation with a human using natural language
ex. C3PO in Star Wars
A. Chatbots
B. AI
C. Deeplearning
D. NLP
A. Chatbots
An Artificial Intelligence computer program that can hold a conversation with a human using natural language
ex. C3PO in Star Wars
EX: How is the weather going to be tomorrow
___ a library used to build chatbots. An ML conversational dialogue engine built in Python. Provides automated responses to queries. Easy to use, letting you create a chatbot quickly. Uses ML algorithms to produce responses. It is multilingual and open source, available on GitHub
A. Chatbots
B. AI
C. Deeplearning
D. Chatterbot
D. Chatterbot
CHATTERBOT - a library used to build chatbots. An ML conversational dialogue engine built in Python. Provides automated responses to queries. Easy to use, letting you create a chatbot quickly. Uses ML algorithms to produce responses. It is multilingual and open source, available on GitHub
What is the correct Chatterbot Flow?
i. Input
ii. Process and Apply Adapters
iii. Response
a. 1,2,3
b. 3,2,1
c. 2,1,3
a. 1,2,3
i. Input
ii. Process and Apply Adapters
iii. Response
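The Input -> Process/Apply Adapters -> Response flow can be sketched with a toy rule-based bot. This is a stand-in for illustration, not the real ChatterBot API; the preprocessor only mimics `clean_whitespace` in spirit, and the keyword rules are made up:

```python
def clean_whitespace(statement):
    """Preprocessor step: collapse runs of whitespace in the input."""
    return " ".join(statement.split())

def logic_adapter(statement):
    """Adapter step: pick a canned response by simple keyword matching."""
    rules = {"hello": "Hi there!", "weather": "It should be sunny tomorrow."}
    for keyword, response in rules.items():
        if keyword in statement.lower():
            return response
    return "Sorry, I don't understand."

def chatbot(statement):
    # i. Input -> ii. Process and apply adapters -> iii. Response
    return logic_adapter(clean_whitespace(statement))

print(chatbot("How is the   weather tomorrow?"))  # -> It should be sunny tomorrow.
```

Real ChatterBot chains several logic adapters and picks the highest-confidence response, but the three-stage flow is the same.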
A chatbot is:
A. A program that can hold a conversation
B. HumanDroid
C. A dating app
A. A program that can hold a conversation
A chatbot is a computer program that can hold a human-like conversation
We use ChatterBot preprocessors to modify the input statement that a chatbot receives
TRUE
FALSE
TRUE
Preprocessors are used to modify the input; for example, 'chatterbot.preprocessors.clean_whitespace' removes extra white space
______ are used to modify the input, like 'chatterbot.preprocessors.clean_whitespace', which removes extra white space
A. pre-processors
b. GPU
c. classification
d. NLTK
A. pre-processors
Preprocessors are used to modify the input; for example, 'chatterbot.preprocessors.clean_whitespace' removes extra white space
Chatterbot supports only the English Language
FALSE
TRUE
FALSE
One of ChatterBot's advantages is that it supports multiple languages
How many words in the sentence?
” I always uh do the main um processing, I mean, the uh um data-processing.”
a. 15
b. 10
c. 11
uh and um are considered "DISFLUENCIES" (filled pauses).
Whether they count as words depends on the application you are working on.
a. 15
In this specific case, we are counting space-separated units.
____ this is task dependent and language dependent
a. word
b. vocabulary
c. correctors
a. word
Words are task dependent and language dependent
_____ set of unique words (word types)
a. word
b. vocabulary
c. correctors
b. vocabulary
Vocabulary - set of unique words (word types). Punctuations are not words.
I always uh do the main um processing, I mean, the uh um data-processing.
In NLP vocabulary what is not considered
a. punctuation
b. stop words
c. can’t
a. punctuation
In NLP, vocabulary, PUNCTUATION is NOT CONSIDERED
What is the vocabulary outcome of the following:
“I always uh do the main um processing, I mean, the uh um data-processing”
{I, always, uh, do, the, main, um, processing, mean, data} (unique word types only; repeated words and punctuation are dropped)
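The tokens-vs-vocabulary distinction for this exact sentence can be checked in a few lines of Python; the simple letters-only regex tokenizer is my own choice (it drops punctuation and splits the hyphenated word, matching the space-unit count above):

```python
import re

sentence = ("I always uh do the main um processing, "
            "I mean, the uh um data-processing.")

# Keep runs of letters only: punctuation is dropped, hyphens split words.
tokens = re.findall(r"[A-Za-z]+", sentence)
vocab = set(tokens)

print(len(tokens))  # -> 15 (every occurrence counts as a token)
print(len(vocab))   # -> 10 (unique word types only)
```

Tokens count every occurrence, so 15; the vocabulary collapses repeats (I, the, uh, um, processing each appear twice), leaving 10 types.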
______ a large body of text containing all the available documents
a. corpus
b. vocab
c. stop words
a. corpus
CORPUS - a large body of text containing all the available documents.
__________ the list of words (tokens) in a document, meaning every word occurrence in the document.
a. corpus
b. vocab
c. stop words
d. tokens
d. tokens
Tokens - the list of words in a document, meaning every word occurrence in the document.
What is the token in the following:
“I always uh do the main um processing, I mean, the uh um data-processing?”
a. 11
b. 15
c. 12
b. 15
Every word occurrence is a token, so every occurrence is counted (including repeats).
_______ an up-to-date package for NLP processing. Most modern NLP work uses this package.
a. Python
b. spaCy
c. sci-kit learn
b. spaCy
spaCy - an up-to-date package for NLP processing. Most modern NLP work uses this package.
The Text Processing Flow
i. build the vocabulary (from the corpus, recognize the words you need)
ii. represent different words by word encodings (also called word embeddings)
iii. classification pipeline
a. 1,2,3
b. 3,2,1
c. 2,1,3
a. 1,2,3
The Text Processing Flow
i. build the vocabulary (from the corpus, recognize the words you need)
ii. represent different words by word encodings (also called word embeddings)
iii. classification pipeline
True or False:
Every NLP task requires text normalization:
- Tokenizing words
- Normalizing word formats
- Segmenting sentences
True
Every NLP task requires text normalization:
- Tokenizing words
- Normalizing word formats
- Segmenting sentences
True or False:
In Space based Tokenization,
Many languages (like Chinese, Japanese, Thai) DO NOT use spaces to separate words.
True
In “Space based Tokenization”,
Many languages (like Chinese, Japanese, Thai) DO NOT use spaces to separate words.
i. receive the data
ii. learn from the data what kinds of units should be treated as tokens, whether they are whole words or not.
overall: USE THE DATA to tell us HOW TO TOKENIZE
a. Data Driven Approach
b. Data Tokenization
c. Subword Tokenization
a. Data Driven Approach
i. receive the data
ii. learn from the data what kinds of units should be treated as tokens, whether they are whole words or not.
overall: USE THE DATA to tell us HOW TO TOKENIZE
______ Rather than using whole words, you build the vocabulary from individual characters (all the letters in the corpus), then repeatedly merge the most frequent adjacent pairs into new vocabulary entries.
a. Data Driven Approach
b. Data Tokenization
c. Subword Tokenization
d. Byte Pair Encoding (BPE)
Visual: (A, B, C, D, … a, b, c, d, …)
The most frequent pair (A, B) is merged into 'AB' and added to the vocab. Add this to the corpus:
(A, B, C, D, … a, b, c, d, … AB)
Keep doing this until you have a lot of merges that make words: "k merges"
d. Byte Pair Encoding (BPE)
Rather than using whole words, you build the vocabulary from individual characters (all the letters in the corpus), then repeatedly merge the most frequent adjacent pairs into new vocabulary entries.
Visual: (A, B, C, D, … a, b, c, d, …)
The most frequent pair (A, B) is merged into 'AB' and added to the vocab. Add this to the corpus:
(A, B, C, D, … a, b, c, d, … AB)
Keep doing this until you have a lot of merges that make words: "k merges"
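The BPE merge loop described above can be sketched as a toy implementation (a teaching version, not a production tokenizer; the corpus and k = 2 are made-up examples):

```python
from collections import Counter

def most_frequent_pair(words):
    """Count adjacent symbol pairs across all words; return the most frequent."""
    pairs = Counter()
    for symbols, freq in words.items():
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return max(pairs, key=pairs.get)

def merge_pair(words, pair):
    """Replace every occurrence of the pair with a single merged symbol."""
    merged, new_sym = {}, "".join(pair)
    for symbols, freq in words.items():
        out, i = [], 0
        while i < len(symbols):
            if tuple(symbols[i:i + 2]) == pair:
                out.append(new_sym)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        merged[tuple(out)] = freq
    return merged

# Corpus as words split into characters, with word frequencies
words = {("l", "o", "w"): 5, ("l", "o", "w", "e", "r"): 2, ("n", "e", "w"): 3}
for _ in range(2):  # k = 2 merges: first l+o -> "lo", then lo+w -> "low"
    words = merge_pair(words, most_frequent_pair(words))
print(words)
```

After two merges, "low" has become a single vocabulary symbol because its character pairs were the most frequent in the corpus.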
_______________ smallest meaning-bearing unit of a language
a. morpheme
b. byte pair encoding (BPE)
c. Data Driven Approach
d. Data Tokenization
a. morpheme
morpheme- smallest meaning- bearing unit of a language
_________ the core meaning bearing units
a. stems
b. affixes
c. morpheme
d. word normalization
a. stems
STEMS- the core meaning bearing units
___________ parts that attach to stems, often with grammatical functions.
ex. ING -> ∅ (the empty string; the suffix is deleted)
ex. SSES -> SS
ex. ATIONAL -> ATE (relational -> relate)
a. stems
b. affixes
c. morpheme
d. word normalization
b. affixes
AFFIXES - parts that attach to stems, often with grammatical functions.
ex. ING -> ∅ (the empty string; the suffix is deleted)
ex. SSES -> SS
ex. ATIONAL -> ATE (relational -> relate)
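The three Porter-style suffix rules above can be sketched directly; this toy stemmer implements only those three example rules, nothing more:

```python
def stem(word):
    """Strip one suffix using three Porter-style rules:
    ational -> ate, sses -> ss, ing -> empty string."""
    rules = [("ational", "ate"), ("sses", "ss"), ("ing", "")]
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word

print(stem("relational"))  # -> relate
print(stem("classes"))     # -> class
print(stem("walking"))     # -> walk
```

The full Porter stemmer (available in NLTK) has many more rules and applies them in ordered passes, but each rule works exactly like this.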
__________ very useful in preprocessing. This is critical in chatbot systems and speech recognition systems.
a. sentence segmentation
b. word normalization
c. stemming
d. morpheme
a. sentence segmentation
SENTENCE SEGMENTATION - very useful in preprocessing. This is critical in chatbot systems and speech recognition systems.
Flow of Sentence Segmentation
i. tokenize first
ii. Use rules or ML to classify each period (.) as either (1) part of a word (e.g., an abbreviation) or (2) the end of a sentence
a. 1,2
b. 2,1
a. 1,2
Flow of Sentence Segmentation
i. tokenize first
ii. Use rules or ML to classify each period (.) as either (1) part of a word (e.g., an abbreviation) or (2) the end of a sentence
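The rule-based version of step ii can be sketched as: a period is a sentence boundary unless the token before it is a known abbreviation. The abbreviation list here is an illustrative assumption, not a standard resource:

```python
# Hypothetical abbreviation list for illustration only
ABBREVIATIONS = {"dr", "mr", "mrs", "etc"}

def segment(tokens):
    """Split a token stream into sentences: '.' ends a sentence
    unless the previous token is a known abbreviation."""
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        if tok == "." and (i == 0 or tokens[i - 1].lower() not in ABBREVIATIONS):
            sentences.append(current)
            current = []
    if current:  # trailing material without a final period
        sentences.append(current)
    return sentences

tokens = ["Dr", ".", "Smith", "arrived", ".", "He", "sat", "down", "."]
print(len(segment(tokens)))  # -> 2
```

Note the flow matches the card: we tokenize first, then classify each period; the "." after "Dr" is treated as part of the abbreviation, so only two sentences come out. ML-based segmenters replace the lookup with a learned classifier.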
A model that predicts the probability of a sentence. Also known as a probabilistic model.
a. Language modeling
b. probabilistic modeling
c. machine translation
d. speech recognition
a. Language modeling
A model that predicts the probability of a sentence. Also known as a probabilistic model.
P(high winds tonight) > P(large wind tonight)
this is an ex of
a. speech recognition
b. spelling correction
c. machine translation
c. machine translation
"High winds" is a more probable English phrase than "large winds", so the translation model prefers it.
P(I saw a van) >> P(eyes awe of an)
this is an ex of
a. speech recognition
b. spelling correction
c. machine translation
a. speech recognition
The two word sequences sound the same, but "I saw a van" is far more probable, so the recognizer picks it.
p(about fifteen minutes from) >
p(about fifteen minuets from)
this is an ex of
a. speech recognition
b. spelling correction
c. machine translation
b. spelling correction
Because "minuets" in the second phrase is a misspelling of "minutes"
p(about fifteen minutes from) >
p(about fifteen minuets from)
P(w1,w2,w3,w4)
is an example of
a. probability of a sentence
b. probability of a next word
a. probability of a sentence
P(w1,w2,w3,w4)
P(wn| w1,w2…wn-1)
is an example of
a. probability of a sentence
b. probability of a next word
b. probability of a next word
P(wn| w1,w2…wn-1)
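The two quantities connect via the chain rule: P(w1, …, wn) is the product of P(wi | w1, …, wi-1) over all positions. A bigram approximation, which conditions each word only on the previous one, can be sketched as follows (the toy corpus and function names are my own, and start/end-of-sentence markers are omitted for brevity):

```python
from collections import Counter

corpus = "the cat sat . the cat ran . the dog sat .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p_next(word, prev):
    """Bigram estimate of P(word | prev) = count(prev, word) / count(prev)."""
    return bigrams[(prev, word)] / unigrams[prev]

def p_sentence(words):
    """Chain rule with the bigram approximation:
    P(w1..wn) ~= P(w1) * product of P(wi | wi-1)."""
    p = unigrams[words[0]] / len(corpus)
    for prev, word in zip(words, words[1:]):
        p *= p_next(word, prev)
    return p

# "the cat" is more probable than "the dog" in this toy corpus
print(p_next("cat", "the") > p_next("dog", "the"))  # -> True
```

This is exactly the pattern behind the flashcard examples: the model scores competing word sequences and the application (translation, recognition, correction) picks the higher-probability one.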