Final Exam Flashcards
True or False, Topic Modeling is an unsupervised learning technique
True
True or False, Singular Value Decomposition (SVD) aims to address skewed frequency of terms
False
True or False, If a model performs indistinguishably from a random classifier, its AUC will be closer to zero
False (a model indistinguishable from random guessing has an AUC near 0.5)
True or False, When it is available, mutual information weighting guarantees better predictive performance than the no-weight option in SAS Enterprise Miner
False
What is the incorrect answer about weightings in text filtering?
a) Term weights are consistent across documents
b) Inverse document frequency depends on the distribution of terms across documents
c) Log transformation for local weights reduces the impact of term frequency more than binary and linear options
d) Mutual information requires a categorical target variable
a) Term weights are consistent across documents (they include a per-document local component, so they vary across documents; IDF really does depend on the distribution of terms)
Zipf’s law can be interpreted as follows: “The product of the frequency of words (f) and their rank is approximately constant.” Let a be the product of the frequency and rank. What is the incorrect answer?
a) ln(f) = ln(a) - ln(r)
b) The frequency of the terms exponentially decreases with rank
c) Hypothetically, the second most prevalent word appears twice as frequently as the fourth most frequent word.
d) Topmost frequent words are likely to be good discriminators
d) Topmost frequent words are likely to be good discriminators (they are mostly common function words)
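The arithmetic behind choices a) and c) can be checked with a tiny sketch (idealized Zipf; a = 1000 is an arbitrary constant):

```python
import math

# Idealized Zipf's law: f(r) = a / r, so f(r) * r == a for every rank r.
a = 1000  # hypothetical constant (product of frequency and rank)

def freq(rank):
    """Frequency of the word at a given rank under ideal Zipf."""
    return a / rank

# Choice c): the 2nd most frequent word appears twice as often as the 4th.
print(freq(2) / freq(4))  # 2.0

# Choice a): taking logs of f = a / r gives ln(f) = ln(a) - ln(r).
assert math.isclose(math.log(freq(7)), math.log(a) - math.log(7))
```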
Quiz 1, Question 9, see slide
Correct!
Write a text filter in SAS Enterprise Miner to return all documents having the term “White House” and not including “Canada.”
“White House” -Canada
In what situation might you prefer interpretability over predictive power?
When presenting to executives or to an operational or business audience.
Quiz 1, Question 12, see slide on Lecture 5
Correct!
True or False - When you’re interested in a small set of terms in text mining, specifying a stop list will be more effective than specifying a start list
False
True or False, The skip-gram model aims to predict context words using a target word
True
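A minimal sketch (hypothetical sentence, window size 1) of how skip-gram forms its training data; the model is then trained to predict each context word from the target word:

```python
# Skip-gram: for each target word, every context word within the window
# yields one (target, context) training pair.
def skipgram_pairs(tokens, window=2):
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs

sentence = "the cat sat on the mat".split()
print(skipgram_pairs(sentence, window=1))
# includes ('cat', 'the'), ('cat', 'sat'), ...
```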
True or False, In a long short-term memory (LSTM) model, you determine how much information from previous hidden states and the current state information should be retained through a forget gate
True
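A toy sketch of the forget gate (hypothetical small sizes, random weights): the sigmoid keeps each gate value in (0, 1), and that value scales how much of the previous cell state is retained.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
hidden, n_in = 3, 2                                          # hypothetical sizes
concat = [random.gauss(0, 1) for _ in range(hidden + n_in)]  # [h_{t-1}; x_t]
W_f = [[random.gauss(0, 1) for _ in range(hidden + n_in)]
       for _ in range(hidden)]                               # forget-gate weights
b_f = [0.0] * hidden                                         # forget-gate bias
c_prev = [random.gauss(0, 1) for _ in range(hidden)]         # previous cell state

# Forget gate: f_t = sigmoid(W_f . [h_{t-1}; x_t] + b_f); each entry in (0, 1)
# decides how much of the corresponding cell-state entry is kept.
f_t = [sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
       for row, b in zip(W_f, b_f)]
c_kept = [f * c for f, c in zip(f_t, c_prev)]                # retained portion
```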
True or False, The Bidirectional Encoder Representations from Transformers (BERT) model has both an encoder and a decoder
False
True or False, In training machine learning algorithms, you can overcome high bias by collecting a large number of data points
False (more data helps with high variance; high bias calls for a more complex model)
True or False, Explainable machine learning indicates that your model can be understood by a human without further technical support
False
True or False, This is a set of co-occurrence probabilities for the target words “ice” and “steam” in the GloVe model (see slide). According to the results, the term “gas” is more effective in distinguishing between “ice” and “steam” than the term “water”
True
Choose the incorrect answer about embedding models:
A. Bag of words can be considered a special case of the n-gram model
B. A bigram model considers one previous word to predict a word’s probability
C. GloVe learns embedding from global context via a word-word co-occurrence matrix
D. TF-IDF can handle unseen words by leveraging the context in which they appear in the document corpus
D.
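A minimal illustration of why choice D is incorrect (toy two-document corpus, simplified IDF): TF-IDF is vocabulary-bound, so a word absent from the corpus has no weight at all, and no surrounding context can rescue it.

```python
import math

docs = [["ice", "is", "cold"], ["steam", "is", "hot"]]  # toy corpus
vocab = {t for d in docs for t in d}

def tfidf(term, doc, docs):
    """Plain TF-IDF; returns None for out-of-vocabulary terms."""
    if term not in vocab:        # unseen word: no entry, no context-based fallback
        return None
    tf = doc.count(term) / len(doc)
    df = sum(term in d for d in docs)
    idf = math.log(len(docs) / df)
    return tf * idf

print(tfidf("ice", docs[0], docs))     # positive weight (appears in one document)
print(tfidf("plasma", docs[0], docs))  # None: TF-IDF cannot handle unseen words
```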
Choose the incorrect answer about the structure of deep learning models.
A. Bias plays a similar role to that of y-intercept in the linear equation
B. Weights are used to introduce non-linearity in the model
C. Inputs can be viewed as features or attributes in a data set
D. The work of the summation function is to bind the weights and inputs together and calculate their sum
B.
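A one-neuron sketch (hypothetical weights and inputs) showing why B is the incorrect statement: the weighted sum plus bias is purely linear, so the non-linearity comes from the activation function, not the weights.

```python
import math

def neuron(inputs, weights, bias):
    # Summation binds weights and inputs (choice D); bias acts like the
    # y-intercept of a linear equation (choice A).  Up to this point the
    # computation is linear -- the activation below adds the non-linearity,
    # not the weights (so choice B is incorrect).
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation

print(neuron([1.0, 2.0], [0.5, -0.25], 0.1))  # ~0.525, i.e. sigmoid(0.1)
```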
Choose the incorrect answer about Recurrent Neural Networks (RNNs) and their extensions.
A. RNN retains a memory, which is a distinct characteristic from a basic neural network
B. RNNs are more suitable for handling spatial data rather than sequential data
C. The long short-term memory model is intended to overcome the vanishing gradient problem of RNNs
D. Unlike RNNs, the transformer can effectively deal with dependencies between terms with long distances in the input sequence due to the attention mechanism.
B.
Choose the incorrect answer about the Transformer model and its extensions.
A. An encoder processes the input, while a decoder generates the output
B. Parallelization enables the transformer to train faster than recurrent neural networks.
C. Unlike the original transformer, BERT introduced positional encodings to maintain word order information.
D. Pre-training and fine-tuning enable data scientists with limited computing resources to build a high-performing model
C.
Choose the incorrect answer about fairness issues in machine learning
A. Machine learning algorithms can generate discriminatory outcomes even without the developer’s intention.
B. The high accuracy of a model guarantees fair algorithms to users.
C. Algorithms can learn bias embedded in their training dataset
D. Even if a predictive model for advertising didn’t learn human bias, its advertising outcomes can be biased due to market mechanisms
B.
Choose the incorrect answer about interpretable and explainable machine learning
A. Higher performing models tend to be less interpretable
B. The goal of permutation importance technique is to understand how the model works.
C. Linear regression is considered one of the interpretable machine learning models
D. The objective of shuffling values in permutation importance is to remove the contribution of the independent variable
B.
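A minimal sketch of permutation importance (toy data and a hand-coded "model", both hypothetical), matching choice D: shuffling a feature's values removes its contribution, and the resulting drop in accuracy measures how much the model relied on it.

```python
import random

random.seed(42)
# Toy data: the label depends only on feature 0; feature 1 is pure noise.
X = [[random.random(), random.random()] for _ in range(200)]
y = [1 if row[0] > 0.5 else 0 for row in X]

def model(row):                      # a "trained" model that uses only feature 0
    return 1 if row[0] > 0.5 else 0

def accuracy(X, y):
    return sum(model(r) == t for r, t in zip(X, y)) / len(y)

base = accuracy(X, y)                # 1.0 on this toy data

def permutation_importance(feature):
    col = [row[feature] for row in X]
    random.shuffle(col)              # break the feature-label association
    Xp = [row[:feature] + [v] + row[feature + 1:] for row, v in zip(X, col)]
    return base - accuracy(Xp, y)    # accuracy drop after shuffling

print(permutation_importance(0))     # large drop: feature 0 matters
print(permutation_importance(1))     # zero drop: feature 1 is ignored
```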