Final Exam Flashcards
True or False, Topic Modeling is an unsupervised learning technique
True
True or False, Singular Value Decomposition (SVD) aims to address skewed frequency of terms
False
True or False, If a model performs indistinguishably from a random value, its AUC will be closer to zero
False
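Note on the card above: a model that ranks cases no better than chance has an AUC near 0.5, not near 0. An AUC near 0 means the ranking is almost perfectly inverted. The sketch below is illustrative only; the helper name `auc` and the simulated data are assumptions, not course material. It computes AUC as the share of positive/negative pairs in which the positive case gets the higher score.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win. This is a plain pairwise computation,
    fine for small samples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(0)                                 # fixed seed for repeatability
labels = [rng.randint(0, 1) for _ in range(1000)]      # simulated binary target

random_scores = [rng.random() for _ in labels]         # scores unrelated to the target
perfect_scores = [float(y) for y in labels]            # scores that separate the classes
inverted_scores = [1.0 - y for y in labels]            # scores that rank everything backwards

auc_random = auc(labels, random_scores)
auc_perfect = auc(labels, perfect_scores)
auc_inverted = auc(labels, inverted_scores)

print("random:", round(auc_random, 3))
print("perfect:", auc_perfect)
print("inverted:", auc_inverted)
```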
True or False, When it is available, mutual information weighting ensures better predictive performance than the no-weight option in SAS Enterprise Miner
False
What is the incorrect answer about weightings in text filtering?
a) Term weights are consistent across documents
b) Inverse document frequency depends on the distribution of terms across documents
c) Log transformation for local weights reduces the impact of term frequency more than binary and linear options
d) Mutual information requires a categorical target variable
Inverse document frequency depends on the distribution of terms across documents
Zipf’s law can be interpreted as follows: “The product of the frequency of words (f) and their rank is approximately constant.” Let a be the product of the frequency and rank. What is the incorrect answer?
a) ln(f) = ln(a) - ln(r)
b) The frequency of the terms exponentially decreases with rank
c) Hypothetically, the second most prevalent word appears twice as frequently as the fourth most frequent word.
d) Topmost frequent words are likely to be good discriminators
Topmost frequent words are likely to be good discriminators
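Note on the card above: the relationships in options a) and c) follow directly from f × r ≈ a. The sketch below checks them numerically under an idealized Zipf distribution; the constant `a = 1000.0` and the function name `zipf_frequency` are assumptions made for illustration.

```python
import math

A = 1000.0  # hypothetical value of the constant a = f * r

def zipf_frequency(rank, a=A):
    """Idealized Zipf frequency: frequency is inversely proportional to rank."""
    return a / rank

freqs = {rank: zipf_frequency(rank) for rank in range(1, 9)}

# The product of frequency and rank stays (approximately) constant.
products = [freqs[rank] * rank for rank in freqs]

# The second-ranked word appears twice as often as the fourth-ranked word.
ratio_2_to_4 = freqs[2] / freqs[4]

# Taking logs of f = a / r gives ln(f) = ln(a) - ln(r).
log_form_holds = all(
    math.isclose(math.log(freqs[rank]), math.log(A) - math.log(rank))
    for rank in freqs
)

print(ratio_2_to_4, log_form_holds)
```

The most frequent words under this law are typically function words that occur in nearly every document, which is why they tend to discriminate poorly between documents.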
Quiz 1, Question 9, see slide
Correct!
Write a text filter in SAS Enterprise Miner to return all documents having the term “White House” and not including “Canada.”
“White House” -Canada
Provide a possible situation where you might prefer interpretability over predictive power.
In situations where you are presenting to executives or to an operational or business audience.
Quiz 1, Question 12, see slide on Lecture 5
Correct!
True or False, When you’re interested in a small set of terms in text mining, specifying a stop list will be more effective than specifying a start list
False
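Note on the card above: a start list names the terms to keep, while a stop list names the terms to drop. With only a few terms of interest, the start list stays short, whereas an equivalent stop list has to enumerate everything else. A minimal sketch in plain Python (not SAS Enterprise Miner syntax; the function names and the sample sentence are made up for illustration):

```python
def apply_start_list(tokens, start_list):
    """Keep only the tokens that appear on the start list."""
    return [t for t in tokens if t in start_list]

def apply_stop_list(tokens, stop_list):
    """Drop every token that appears on the stop list."""
    return [t for t in tokens if t not in stop_list]

tokens = "the senate passed the budget bill after a long debate".split()
terms_of_interest = {"senate", "budget"}

# Start list: two entries are enough.
kept_by_start = apply_start_list(tokens, terms_of_interest)

# Stop list: to get the same result, every other distinct term must be listed.
needed_stop_list = set(tokens) - terms_of_interest
kept_by_stop = apply_stop_list(tokens, needed_stop_list)

print(kept_by_start, len(terms_of_interest), len(needed_stop_list))
```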
True or False, The skip-gram model aims to predict context words using a target word
True
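Note on the card above: skip-gram training data consists of (target, context) pairs, where each context word lies within a fixed window around the target. The sketch below only generates those pairs and does not train a model; the function name `skipgram_pairs` and the sample sentence are assumptions for illustration.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for every token and its neighbors in the window."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

tokens = "the quick brown fox jumps".split()
pairs = skipgram_pairs(tokens, window=1)

for target, context in pairs:
    print(target, "->", context)
```

The continuous bag-of-words (CBOW) variant reverses the direction: it predicts the target word from its context words.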
True or False, In a long short-term memory (LSTM) model, you determine how much information from previous hidden states and the current state information should be retained through a forget gate
True
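Note on the card above: in the standard LSTM formulation, the forget gate is a sigmoid computed from the previous hidden state and the current input, and its output (between 0 and 1) scales how much of the previous cell state is carried forward. The scalar sketch below uses hand-picked, hypothetical parameter values purely to show the effect of the gate; real models learn vector-valued weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One scalar LSTM step. p holds input weights (w_*), recurrent weights (u_*), biases (b_*)."""
    f_t = sigmoid(p["w_f"] * x_t + p["u_f"] * h_prev + p["b_f"])    # forget gate
    i_t = sigmoid(p["w_i"] * x_t + p["u_i"] * h_prev + p["b_i"])    # input gate
    g_t = math.tanh(p["w_g"] * x_t + p["u_g"] * h_prev + p["b_g"])  # candidate cell value
    o_t = sigmoid(p["w_o"] * x_t + p["u_o"] * h_prev + p["b_o"])    # output gate
    c_t = f_t * c_prev + i_t * g_t   # forget gate scales the old cell state
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t, f_t

keys = [f"{kind}_{gate}" for gate in "figo" for kind in ("w", "u", "b")]
base = dict.fromkeys(keys, 0.0)

# Hypothetical settings: the input gate is held nearly closed so only the forget gate matters.
retain_params = {**base, "b_f": 10.0, "b_i": -10.0}   # forget gate close to 1
forget_params = {**base, "b_f": -10.0, "b_i": -10.0}  # forget gate close to 0

_, c_retained, f_retain = lstm_step(x_t=1.0, h_prev=0.5, c_prev=2.0, p=retain_params)
_, c_forgotten, f_forget = lstm_step(x_t=1.0, h_prev=0.5, c_prev=2.0, p=forget_params)

print(round(f_retain, 4), round(c_retained, 4))
print(round(f_forget, 4), round(c_forgotten, 4))
```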
True or False, The Bidirectional Encoder Representations from Transformers (BERT) model has both an encoder and a decoder
False
True or False, In training machine learning algorithms, you can overcome high bias by collecting a large number of data points
False
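Note on the card above: high bias (underfitting) comes from a model that is too simple for the pattern in the data, so adding more rows does not remove the error. The sketch below fits a straight line to points from y = x² and compares the training error for a small and a large sample. The data, sample sizes, and function names are assumptions chosen for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def training_mse(n):
    """Training error of a straight line fit to n evenly spaced points on y = x^2."""
    xs = [-1 + 2 * i / (n - 1) for i in range(n)]
    ys = [x * x for x in xs]          # the true relationship is quadratic
    slope, intercept = fit_line(xs, ys)
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / n

mse_small = training_mse(20)
mse_large = training_mse(20000)

# The error does not shrink toward zero as the sample grows.
print(round(mse_small, 4), round(mse_large, 4))
```

Reducing high bias generally calls for a more flexible model or better features; collecting more data mainly helps with high variance.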