Building Features from Text Data in Microsoft Azure Flashcards

Question 1

Q

Which of the following must be downloaded into your Natural Language Toolkit (NLTK) environment to use the NLTK defined stopwords?

stem

stopwords

punkt

lemma

Answer

A

stopwords

Question 2

Q

Which class provides a powerful (but simple) means of analyzing frequency distributions in words?

PunktSentenceTokenizer

nltk.probability.FreqDist

pandas.DataFrame

numpy.array

Answer

A

nltk.probability.FreqDist
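
As a minimal sketch of what this class offers, the snippet below counts a small, made-up token list (the sample sentence is an assumption for illustration, not part of the course material). It uses only `FreqDist` indexing, `N()`, and `most_common()`, and needs no NLTK data downloads.

```python
from nltk.probability import FreqDist

# Hypothetical sample tokens, already lowercased and split on whitespace.
tokens = "the cat sat on the mat and the dog sat too".split()

# FreqDist counts how often each token occurs.
fdist = FreqDist(tokens)

the_count = fdist["the"]        # raw count of a single token
total_tokens = fdist.N()        # total number of tokens counted
top_two = fdist.most_common(2)  # the two most frequent tokens with counts

print(the_count, total_tokens, top_two)
```

Because `FreqDist` behaves like a counter keyed by token, the same object can feed later steps such as frequency filtering.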

Question 3

Q

What can remove words based on their count of recurrence in a document or corpus?

Lemmatization

Frequency filtering

Stopword removal

Tokenization

Answer

A

Frequency filtering
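
One way to picture frequency filtering is the standard-library sketch below. The function name `filter_by_frequency`, its thresholds, and the sample sentence are all hypothetical choices for illustration; real pipelines often get the same effect through library options instead.

```python
from collections import Counter

def filter_by_frequency(tokens, min_count=1, max_count=None):
    """Keep tokens whose count in `tokens` lies within [min_count, max_count]."""
    counts = Counter(tokens)
    return [
        t for t in tokens
        if counts[t] >= min_count
        and (max_count is None or counts[t] <= max_count)
    ]

# Hypothetical sample tokens.
tokens = "the cat sat on the mat and the dog sat too".split()

# Drop rare words: keep only tokens that occur at least twice.
rare_removed = filter_by_frequency(tokens, min_count=2)

# Drop very common words: keep only tokens that occur at most twice.
common_removed = filter_by_frequency(tokens, max_count=2)

print(rare_removed)
print(common_removed)
```

Unlike stopword removal, which relies on a fixed word list, this approach decides what to drop purely from the counts observed in the data.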

Question 4

Q

To use the WordNetLemmatizer class, make sure that you have downloaded which Natural Language Toolkit (NLTK) component?

punkt

tagsets

wordnet

stopwords

Answer

A

wordnet

Question 5

Q

Which of the following represents the complete set of words represented in an encoding?

Vocabulary

Document

Feature

Corpus

Answer

A

Vocabulary
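
To make the term concrete, here is a small standard-library sketch of a count encoding built over a vocabulary. The two-document corpus and the helper name `encode` are assumptions for illustration only.

```python
# Hypothetical two-document corpus.
corpus = ["the cat sat", "the dog sat down"]

# The vocabulary is every distinct word the encoding can represent,
# each mapped to a fixed column index.
distinct_words = sorted({word for doc in corpus for word in doc.split()})
vocabulary = {word: index for index, word in enumerate(distinct_words)}

def encode(doc):
    """Encode a document as word counts over the vocabulary."""
    vector = [0] * len(vocabulary)
    for word in doc.split():
        if word in vocabulary:  # words outside the vocabulary are ignored
            vector[vocabulary[word]] += 1
    return vector

print(vocabulary)
print(encode("the cat sat"))
print(encode("the bird sat"))  # "bird" is not in the vocabulary
```

The second call shows the practical consequence: a word that is not in the vocabulary simply has no column in the encoded vector.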

Question 6

Q

HashingVectorizer builds on FeatureHasher by providing which capability?

Tokenization of documents

Word embeddings

Parts-of-speech tagging

Locality-sensitive hashing

Answer

A

Tokenization of documents
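
The difference can be sketched with scikit-learn as follows. The sample documents and the choice of `n_features=16` are assumptions made to keep the output small; the point is only that `FeatureHasher` is given pre-split tokens while `HashingVectorizer` is given raw strings.

```python
from sklearn.feature_extraction import FeatureHasher
from sklearn.feature_extraction.text import HashingVectorizer

# Hypothetical sample documents.
docs = ["the cat sat on the mat", "the dog sat too"]

# FeatureHasher expects features that are already extracted,
# here one list of string tokens per document.
hasher = FeatureHasher(n_features=16, input_type="string")
hashed_tokens = hasher.transform([doc.split() for doc in docs])

# HashingVectorizer takes the raw strings and tokenizes them itself
# before hashing.
vectorizer = HashingVectorizer(n_features=16)
hashed_docs = vectorizer.transform(docs)

# The analyzer exposes the preprocessing and tokenization step on its own.
analyzed = vectorizer.build_analyzer()("The cat sat on the mat")

print(hashed_tokens.shape, hashed_docs.shape)
print(analyzed)
```

Both transforms return sparse matrices with one row per document and `n_features` columns; neither stores a vocabulary.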

Question 7

Q

What is the process of breaking or splitting text into smaller meaningful components?

Stemming

Tokenization

Stopword removal

Lemmatization

Answer

A

Tokenization
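
As a rough illustration of the idea, the sketch below splits a made-up string into sentences and words with regular expressions. This is a deliberate simplification and not how NLTK's tokenizers work; the patterns are assumptions that only suit simple, well-punctuated text.

```python
import re

# Hypothetical sample text.
text = "Tokenization splits text. It yields smaller units!"

# Sentence-level tokens: split on whitespace that follows ., ! or ?
sentences = re.split(r"(?<=[.!?])\s+", text)

# Word-level tokens: runs of letters, digits, or underscores.
words = re.findall(r"\w+", text)

print(sentences)
print(words)
```

Naive patterns like these break on abbreviations and similar cases, which is one reason to prefer a dedicated tokenizer for real text.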

Question 8

Q

Which tokenizer in Natural Language Toolkit (NLTK) can convert text into a sequence of sentences?

RegexTokenizer

PunktSentenceTokenizer

TreebankWordTokenizer

WhitespaceTokenizer

Answer

A

PunktSentenceTokenizer
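
A minimal sketch of sentence tokenization with this class is shown below. It assumes an instance created without training text, which falls back to default parameters instead of the pre-trained English model, so no download is attempted here; the sample text is also made up.

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

# Hypothetical sample text with simple sentence boundaries.
text = "Text features start with tokens. Sentences come first. Words follow."

# Assumption: no training text is passed, so default parameters are used
# rather than the pre-trained punkt model.
tokenizer = PunktSentenceTokenizer()
sentences = tokenizer.tokenize(text)

for sentence in sentences:
    print(sentence)
```

An untrained instance knows no abbreviations, so text containing forms such as titles or initials is better handled by the pre-trained model.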

Question 9

Q

Which of the following is a process of attempting to reduce a word to its base by removing inflection, but which might result in a nonsense word?

Stemming

Lemmatization

Stopword removal

Tokenization

Answer

A

Stemming
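
The "nonsense word" effect described in this question can be seen in a short sketch using NLTK's `PorterStemmer`, which needs no data download. The word list is an assumption chosen for illustration.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

# Hypothetical sample words.
words = ["studies", "running", "ponies"]

# Stemming strips suffixes by rule, without checking a dictionary.
stems = [stemmer.stem(word) for word in words]

for word, stem in zip(words, stems):
    print(word, "->", stem)
```

Here "running" becomes "run", but "studies" and "ponies" become "studi" and "poni", which are not dictionary words.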

Question 10

Q

To use sent_tokenize or PunktSentenceTokenizer, which of the following must be downloaded into your Natural Language Toolkit (NLTK) environment?

stem

lemma

stopwords

punkt

Answer

A

punkt