Experimental Design for Data Analysis Flashcards
Which of the following are valid hyperparameters for decision trees?
A: Depth of a tree
B: Min samples per node
C: Max samples per node
D: Length of a tree
B and C only
A and B only
C and D only
A and C only
A and B only
What kind of machine learning model can predict whether an email is spam or ham (not spam)?
Dimensionality reduction
Classification
Regression
Clustering
Classification
What is a benefit of Azure Machine Learning Studio?
It allows for the building and training of machine learning models with no code.
It includes pre-trained machine learning models for all to use.
It includes powerful APIs for common machine learning problems.
It allows for the building and training of machine learning models with no code.
If you want to prototype your machine learning models in Python on Azure, what service would you choose?
Azure Machine Learning Studio
Azure APIs
Azure Notebooks
Azure Machine Learning Service
Azure Notebooks
How is a confusion matrix constructed?
First row = accuracy, recall; Second row = precision, F1-score
Rows = predicted labels; columns = actual labels; cell values = harmonic mean of instance counts for corresponding pair of actual and predicted labels
Rows = actual labels; columns = predicted labels; cell values = instance counts for corresponding pair of actual and predicted labels
Rows = actual labels; columns = predicted labels; cell values = instance counts for corresponding pair of actual and predicted labels
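The layout in this answer can be sketched in a few lines of plain Python. The function name `confusion_matrix` and the spam/ham toy data below are made up for illustration; this is a minimal sketch of the counting rule, not any particular library's implementation.

```python
from collections import Counter

def confusion_matrix(actual, predicted, labels):
    """Rows = actual labels, columns = predicted labels, cells = instance counts."""
    # Count how often each (actual, predicted) pair occurs.
    counts = Counter(zip(actual, predicted))
    # Counter returns 0 for pairs that never occur.
    return [[counts[(a, p)] for p in labels] for a in labels]

# Hypothetical toy data (assumption, not from the flashcards).
actual    = ["spam", "spam", "ham", "ham", "ham",  "spam"]
predicted = ["spam", "ham",  "ham", "ham", "spam", "spam"]
labels = ["ham", "spam"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(f"actual={label}: {row}")
```

Each row sums to the number of instances that truly belong to that class, and the diagonal holds the correctly classified instances.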
How is the accuracy of a classifier calculated?
TP/(TP + FN) where TP = number of true positives and FN = number of false negatives
Sum of all diagonal elements from confusion matrix; divide by sum of all elements in confusion matrix
TP/(TP + FP) where TP = number of true positives and FP = number of false positives
Average of all diagonal elements from confusion matrix; divide by average of all elements in confusion matrix
Sum of all diagonal elements from confusion matrix; divide by sum of all elements in confusion matrix
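As a rough illustration of this formula, the sketch below computes accuracy from a confusion matrix stored as a nested list. The helper name `accuracy` and the example counts are assumptions chosen for the example.

```python
def accuracy(matrix):
    """Accuracy = sum of diagonal cells / sum of all cells."""
    # Diagonal cells are the instances whose predicted label equals the actual label.
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical 2x2 confusion matrix (rows = actual, columns = predicted).
matrix = [[50, 10],
          [5, 35]]

acc = accuracy(matrix)
print(f"accuracy = {acc:.2f}")
```

The same function works for more than two classes, since it only relies on the diagonal and the overall total.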
What is the best definition of hyperparameters in a machine learning model?
Model configuration parameters that learn from test data
Model inputs that train models
Model parameters that learn from training data
Design parameters of the machine learning algorithm that stay constant during training
Design parameters of the machine learning algorithm that stay constant during training
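One way to see the distinction is a tiny gradient-descent fit, sketched below under simple assumptions. The function `fit_slope` and its data are hypothetical: `learning_rate` and `n_epochs` are hyperparameters fixed before training, while the weight `w` is a model parameter learned from the training data.

```python
def fit_slope(xs, ys, learning_rate=0.01, n_epochs=500):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0  # model parameter: updated during training
    n = len(xs)
    for _ in range(n_epochs):  # n_epochs: hyperparameter, fixed up front
        # Gradient of mean((w*x - y)^2) with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad  # learning_rate: hyperparameter, never updated
    return w

# Hypothetical training data generated from y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = fit_slope(xs, ys)
print(f"learned w = {w:.4f}")
```

Changing `learning_rate` or `n_epochs` changes how training proceeds, but neither value is adjusted by the training loop itself.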
If you want to ensure that grouped data does not cross validation-fold boundaries, what kind of cross validation would you choose?
Singular cross validation
Stratified k-fold
Repeated k-fold
Group k-fold
K-fold
Group k-fold
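The idea behind group k-fold can be sketched in plain Python: every record that shares a group ID lands in the same fold. The function `group_k_fold` and the patient-style group IDs are illustrative assumptions. This simplified version assigns groups round-robin; library implementations such as scikit-learn's `GroupKFold` also try to balance fold sizes.

```python
def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) pairs so that no group spans both sides."""
    unique_groups = sorted(set(groups))
    # Assign each whole group to exactly one fold (simple round-robin).
    fold_of_group = {g: i % n_splits for i, g in enumerate(unique_groups)}
    for fold in range(n_splits):
        test_idx = [i for i, g in enumerate(groups) if fold_of_group[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of_group[g] != fold]
        yield train_idx, test_idx

# Hypothetical group IDs, e.g. several records per patient.
groups = ["p1", "p1", "p2", "p2", "p3", "p3", "p4"]

splits = list(group_k_fold(groups, n_splits=2))
for train_idx, test_idx in splits:
    print("train:", train_idx, "test:", test_idx)
```

Because a group is never split, the model is always validated on groups it did not see during training.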
If you want to ensure that each fold of your validation data has a similar proportion of records from each category or class, what kind of cross validation would you choose?
Repeated k-fold
K-fold
Singular cross validation
Group k-fold
Stratified k-fold
Stratified k-fold
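A minimal sketch of stratification, assuming a made-up helper `stratified_folds` and toy spam/ham labels: the indices of each class are dealt out across the folds in turn, so every fold ends up with roughly the same class mix. There is no shuffling here, which a real workflow would usually add.

```python
from collections import Counter, defaultdict

def stratified_folds(labels, n_splits):
    """Return n_splits lists of indices with similar class proportions."""
    # Collect the indices belonging to each class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    # Deal each class's indices across the folds like cards.
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return folds

# Hypothetical imbalanced labels: 6 ham, 3 spam.
labels = ["ham"] * 6 + ["spam"] * 3

folds = stratified_folds(labels, n_splits=3)
class_counts = [Counter(labels[i] for i in fold) for fold in folds]
for fold, counts in zip(folds, class_counts):
    print(fold, dict(counts))
```

With plain k-fold on the same ordered data, one fold could end up containing all of the spam records; stratification avoids that.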
Which test looks across multiple samples, compares their means, and computes one test statistic and one p-value?
Paired difference test
Analysis of variance (ANOVA)
T-test
Linear regression
Analysis of variance (ANOVA)
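To make the "one test statistic" concrete, the sketch below computes the one-way ANOVA F statistic by hand for three hypothetical samples. The function name `one_way_anova_f` and the data are assumptions. Turning F into a p-value requires the F distribution, which is omitted here; `scipy.stats.f_oneway` returns both values.

```python
def one_way_anova_f(*samples):
    """F = (between-group mean square) / (within-group mean square)."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)
    df_between = k - 1
    df_within = n_total - k
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical samples (assumption, not from the flashcards).
a = [1.0, 2.0, 3.0]
b = [2.0, 3.0, 4.0]
c = [5.0, 6.0, 7.0]

f_stat = one_way_anova_f(a, b, c)
print(f"F = {f_stat:.2f}")
```

A large F means the group means differ more than the within-group scatter would suggest, which is what the single p-value then quantifies.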