Building Features from Text Data in Microsoft Azure Flashcards
Which of the following must be downloaded into your Natural Language Toolkit (NLTK) environment to use the NLTK-defined stopwords?
stem
stopwords
punkt
lemma
stopwords
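Note: a minimal sketch of how the stopwords resource might be loaded and applied. The helper names load_english_stopwords and remove_stopwords are hypothetical, and the download step assumes network access is available the first time it runs.

```python
import nltk
from nltk.corpus import stopwords


def load_english_stopwords():
    """Return NLTK's English stopword list, downloading the corpus if it is missing."""
    try:
        return set(stopwords.words("english"))
    except LookupError:
        # The corpus is not installed yet; fetching it needs network access.
        nltk.download("stopwords", quiet=True)
        return set(stopwords.words("english"))


def remove_stopwords(tokens, stopword_set):
    """Drop every token whose lowercase form appears in stopword_set."""
    return [token for token in tokens if token.lower() not in stopword_set]
```

Calling remove_stopwords(tokens, load_english_stopwords()) would then filter a token list against the NLTK list; any other set of words can be passed in the same way.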
Which class provides a powerful (but simple) means of analyzing the frequency distribution of words?
PunktSentenceTokenizer
nltk.probability.FreqDist
pandas.DataFrame
numpy.array
nltk.probability.FreqDist
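Note: a short sketch of FreqDist on a toy token list. The sample sentence is made up for illustration, and whitespace splitting stands in for a real tokenizer.

```python
from nltk.probability import FreqDist

# Toy token list; str.split() is a stand-in for a proper tokenizer.
tokens = "the cat sat on the mat and the dog sat too".split()

freq = FreqDist(tokens)

top_two = freq.most_common(2)   # most frequent tokens with their counts
the_count = freq["the"]         # count of a single token
total = freq.N()                # total number of tokens counted

print(top_two, the_count, total)
```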
What can remove words based on how often they occur in a document or corpus?
Lemmatization
Frequency filtering
Stopword removal
Tokenization
Frequency filtering
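Note: frequency filtering can be sketched with the standard library alone. The function name filter_by_frequency and its min_count / max_count thresholds are assumptions chosen for illustration, not part of NLTK.

```python
from collections import Counter


def filter_by_frequency(tokens, min_count=1, max_count=None):
    """Keep only tokens whose corpus count lies within [min_count, max_count]."""
    counts = Counter(tokens)
    return [
        token
        for token in tokens
        if counts[token] >= min_count
        and (max_count is None or counts[token] <= max_count)
    ]


tokens = "the cat sat on the mat and the dog sat too".split()

# Drop words that occur only once.
without_rare = filter_by_frequency(tokens, min_count=2)

# Additionally drop the most common word by capping the count at 2.
mid_frequency = filter_by_frequency(tokens, min_count=2, max_count=2)

print(without_rare)
print(mid_frequency)
```

Dropping very rare words shrinks the vocabulary, while an upper cap removes words so common that they carry little signal.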
To use the WordNetLemmatizer class, make sure that you have downloaded which Natural Language Toolkit (NLTK) component?
punkt
tagsets
wordnet
stopwords
wordnet
Which of the following is the complete set of words represented in an encoding?
Vocabulary
Document
Feature
Corpus
Vocabulary
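Note: a minimal sketch of how a vocabulary relates to an encoding, using a hand-rolled bag-of-words count vector. The two-document corpus and the encode helper are hypothetical.

```python
corpus = ["the cat sat", "the dog sat down"]

# The vocabulary is the full set of distinct words the encoding knows about.
vocabulary = sorted({token for doc in corpus for token in doc.split()})
index = {token: position for position, token in enumerate(vocabulary)}


def encode(document):
    """Return a count vector with one slot per vocabulary word."""
    vector = [0] * len(vocabulary)
    for token in document.split():
        if token in index:          # words outside the vocabulary are ignored
            vector[index[token]] += 1
    return vector


print(vocabulary)
print(encode("the cat sat"))
print(encode("the bird sat"))       # "bird" is not in the vocabulary
```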
HashingVectorizer builds on FeatureHasher by providing which capability?
Tokenization of documents
Word embeddings
Parts-of-speech tagging
Locality-sensitive hashing
Tokenization of documents
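Note: the difference can be sketched with scikit-learn, assuming it is installed. FeatureHasher is given token lists prepared by hand, while HashingVectorizer takes the raw strings and tokenizes them itself. The value n_features=16 is an arbitrary small size for illustration.

```python
from sklearn.feature_extraction import FeatureHasher
from sklearn.feature_extraction.text import HashingVectorizer

docs = ["the cat sat on the mat", "the dog sat too"]

# FeatureHasher: the caller must tokenize first (here with str.split()).
hasher = FeatureHasher(n_features=16, input_type="string")
hashed_tokens = hasher.transform([doc.split() for doc in docs])

# HashingVectorizer: raw documents go in; tokenization happens internally.
vectorizer = HashingVectorizer(n_features=16)
hashed_docs = vectorizer.transform(docs)

print(hashed_tokens.shape, hashed_docs.shape)
```

Both results are sparse matrices with one row per document and 16 hashed feature columns; neither object stores a vocabulary.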
What is the process of breaking or splitting text into smaller meaningful components?
Stemming
Tokenization
Stopword removal
Lemmatization
Tokenization
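Note: a small sketch comparing two NLTK tokenizers that need no downloaded resources. The sample sentence and the \w+ pattern are assumptions picked to show how the choice of tokenizer changes the pieces.

```python
from nltk.tokenize import RegexpTokenizer, WhitespaceTokenizer

text = "Tokenization splits text, doesn't it?"

# Split on whitespace only: punctuation stays attached to words.
whitespace_tokens = WhitespaceTokenizer().tokenize(text)

# Keep runs of word characters: punctuation is dropped and "doesn't" is split.
word_tokens = RegexpTokenizer(r"\w+").tokenize(text)

print(whitespace_tokens)
print(word_tokens)
```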
Which tokenizer in Natural Language Toolkit (NLTK) can convert text into a sequence of sentences?
RegexpTokenizer
PunktSentenceTokenizer
TreebankWordTokenizer
WhitespaceTokenizer
PunktSentenceTokenizer
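Note: a sketch of sentence splitting with PunktSentenceTokenizer. To stay runnable without downloads, this assumes an untrained instance with default parameters, which knows no abbreviations; the pretrained English model used by sent_tokenize needs the punkt resource (recent NLTK releases may ask for punkt_tab instead).

```python
from nltk.tokenize.punkt import PunktSentenceTokenizer

text = (
    "NLTK ships several tokenizers. "
    "Punkt splits text into sentences. "
    "It is unsupervised."
)

# Untrained tokenizer: default parameters, no learned abbreviations.
tokenizer = PunktSentenceTokenizer()
sentences = tokenizer.tokenize(text)

for sentence in sentences:
    print(sentence)
```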
Which of the following is a process that attempts to reduce a word to its base form by removing inflections, but might produce a nonsense word?
Stemming
Lemmatization
Stopword removal
Tokenization
Stemming
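Note: a brief sketch using NLTK's PorterStemmer, which needs no downloaded data. The word list is made up; it includes a case where the stem is not a dictionary word.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

words = ["running", "studies", "cats"]
stems = {word: stemmer.stem(word) for word in words}

# "studies" becomes "studi", which is not a real English word.
print(stems)
```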
To use sent_tokenize or PunktSentenceTokenizer, which of the following must be downloaded into your Natural Language Toolkit (NLTK) environment?
stem
lemma
stopwords
punkt
punkt