Experimental Design for Data Analysis Flashcards

1
Q

Which of the following are valid hyperparameters for decision trees?

A: Depth of a tree

B: Min samples per node

C: Max samples per node

D: Length of a tree

B and C only

A and B only

C and D only

A and C only

A

A and B only
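
As a rough illustration of options A and B, here is a minimal sketch that assumes scikit-learn is installed. The dataset (iris) and the specific values are arbitrary choices for the example, and "min samples per node" is mapped here to scikit-learn's `min_samples_leaf` (`min_samples_split` is a related setting).

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hyperparameters are chosen before fitting; split thresholds are learned.
tree = DecisionTreeClassifier(
    max_depth=3,         # option A: depth of the tree
    min_samples_leaf=5,  # option B: minimum samples per (leaf) node
    random_state=0,
)
tree.fit(X, y)

# The fitted tree can be shallower than max_depth, but never deeper.
fitted_depth = tree.get_depth()
```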

2
Q

What kind of machine learning model can predict whether an email is spam or ham?

Dimensionality reduction

Classification

Regression

Clustering

A

Classification

3
Q

What is a benefit of Azure Machine Learning Studio?

It allows for the building and training of machine learning models with no code.

It includes pre-trained machine learning models for all to use.

It includes powerful APIs for common machine learning problems.

A

It allows for the building and training of machine learning models with no code.

4
Q

If you want to prototype your machine learning models in Python on Azure, what service would you choose?

Azure Machine Learning Studio

Azure APIs

Azure Notebooks

Azure Machine Learning Service

A

Azure Notebooks

5
Q

How is a confusion matrix constructed?

First row = accuracy, recall; Second row = precision, F1-score

Rows = predicted labels; columns = actual labels; cell values = harmonic mean of instance counts for corresponding pair of actual and predicted labels

Rows = actual labels; columns = predicted labels; cell values = instance counts for corresponding pair of actual and predicted labels

A

Rows = actual labels; columns = predicted labels; cell values = instance counts for corresponding pair of actual and predicted labels
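
The layout in this answer can be sketched in plain Python. The function name `confusion_matrix` and the spam/ham sample data are made up for illustration; this is one possible construction, not a reference implementation.

```python
def confusion_matrix(actual, predicted, labels):
    """Rows = actual labels, columns = predicted labels, cells = counts."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

# Hypothetical labels for six emails.
actual    = ["spam", "spam", "ham", "ham", "ham",  "spam"]
predicted = ["spam", "ham",  "ham", "ham", "spam", "spam"]

cm = confusion_matrix(actual, predicted, labels=["ham", "spam"])
# cm[0] is the "actual ham" row, cm[1] is the "actual spam" row.
```

With this data, two ham and two spam emails fall on the diagonal (correct), and one email of each class is misclassified.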

6
Q

How is the accuracy of a classifier calculated?

TP/(TP + FN) where TP = number of true positives and FN = number of false negatives

Sum of all diagonal elements from confusion matrix; divide by sum of all elements in confusion matrix

TP/(TP + FP) where TP = number of true positives and FP = number of false positives

Average of all diagonal elements from confusion matrix; divide by average of all elements in confusion matrix

A

Sum of all diagonal elements from confusion matrix; divide by sum of all elements in confusion matrix
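
A small sketch of this calculation, assuming a confusion matrix with rows as actual labels and columns as predicted labels. The counts are invented, and the positive class is assumed to be index 1. Recall and precision are included only to contrast them with the TP/(TP + FN) and TP/(TP + FP) options.

```python
def accuracy(matrix):
    """Sum of diagonal elements divided by the sum of all elements."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical binary confusion matrix: index 0 = negative, 1 = positive.
cm = [[50, 10],
      [5, 35]]

acc = accuracy(cm)           # (50 + 35) / 100

tp, fn, fp = cm[1][1], cm[1][0], cm[0][1]
recall = tp / (tp + fn)      # TP / (TP + FN), not accuracy
precision = tp / (tp + fp)   # TP / (TP + FP), not accuracy
```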

7
Q

What is the best definition of hyperparameters in a machine learning model?

Model configuration parameters that learn from test data

Model inputs that train models

Model parameters that learn from training data

Design parameters of the machine learning algorithm that stay constant during training

A

Design parameters of the machine learning algorithm that stay constant during training
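
To make the distinction concrete, here is a minimal sketch of gradient descent on a one-parameter model y = w * x. The function name `fit_slope`, the data, and the default values are assumptions for the example. `learning_rate` and `epochs` are hyperparameters (set up front and never changed by training), while `w` is a model parameter learned from the training data.

```python
def fit_slope(xs, ys, learning_rate=0.01, epochs=500):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0  # model parameter: updated during training
    n = len(xs)
    for _ in range(epochs):  # epochs: hyperparameter, fixed
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad  # learning_rate: hyperparameter, fixed
    return w

# Hypothetical training data generated from y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = fit_slope(xs, ys)
```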

8
Q

If you want to ensure that grouped data does not cross validation-fold boundaries, what kind of cross validation would you choose?

Singular cross validation

Stratified k-fold

Repeated k-fold

Group k-fold

K-fold

A

Group k-fold
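
The idea can be sketched in plain Python: whole groups, not individual records, are assigned to folds. The helper `group_k_fold` and the round-robin assignment are simplifications assumed for this example. scikit-learn offers `GroupKFold` in `sklearn.model_selection` for real use.

```python
def group_k_fold(groups, n_splits):
    """Yield (train_idx, test_idx) so that no group spans both sides."""
    unique_groups = sorted(set(groups))
    # Assign each whole group to one fold (round-robin, for simplicity).
    fold_of_group = {g: i % n_splits for i, g in enumerate(unique_groups)}
    for fold in range(n_splits):
        test_idx = [i for i, g in enumerate(groups) if fold_of_group[g] == fold]
        train_idx = [i for i, g in enumerate(groups) if fold_of_group[g] != fold]
        yield train_idx, test_idx

# Hypothetical records grouped by patient ID.
groups = ["p1", "p1", "p2", "p2", "p3", "p3", "p4"]
splits = list(group_k_fold(groups, n_splits=2))
```

Each patient's records land entirely in either the training side or the test side of a split, which avoids leakage between related records.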

9
Q

If you want to ensure that each fold of your validation data has similar representations of records of each category or class, what kind of cross validation would you choose?

Repeated k-fold

K-fold

Singular cross validation

Group k-fold

Stratified k-fold

A

Stratified k-fold
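
A minimal plain-Python sketch of stratification: indices are dealt out class by class so every fold receives a similar share of each class. The helper `stratified_k_fold` and the sample labels are assumptions for the example, and no shuffling is done. scikit-learn provides `StratifiedKFold` in `sklearn.model_selection` for real use.

```python
from collections import defaultdict

def stratified_k_fold(labels, n_splits):
    """Yield (train_idx, test_idx) with class proportions kept per fold."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    folds = [[] for _ in range(n_splits)]
    # Deal each class's indices round-robin across the folds.
    for indices in by_class.values():
        for position, i in enumerate(indices):
            folds[position % n_splits].append(i)

    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train_idx, test_idx

# Hypothetical imbalanced labels: 8 ham, 4 spam.
labels = ["ham"] * 8 + ["spam"] * 4
splits = list(stratified_k_fold(labels, n_splits=4))
```

Here every test fold holds two ham and one spam record, matching the 2:1 ratio of the full dataset.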

10
Q

Which test looks across multiple samples, compares their means, and computes one test statistic and one p-value?

Paired difference test

Analysis of variance (ANOVA)

T-test

Linear regression

A

Analysis of variance (ANOVA)
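
As a sketch of how one-way ANOVA reduces several samples to a single test statistic, the F statistic can be computed by hand. The function name `one_way_anova_f` and the three samples are invented for illustration. The p-value is then read from the F distribution with the returned degrees of freedom, which is omitted here; `scipy.stats.f_oneway` returns both the statistic and the p-value.

```python
def one_way_anova_f(*samples):
    """Return (F statistic, df_between, df_within) for one-way ANOVA."""
    k = len(samples)
    n_total = sum(len(s) for s in samples)
    grand_mean = sum(sum(s) for s in samples) / n_total
    means = [sum(s) / len(s) for s in samples]

    # Variation of the sample means around the grand mean.
    ss_between = sum(len(s) * (m - grand_mean) ** 2 for s, m in zip(samples, means))
    # Variation of observations around their own sample mean.
    ss_within = sum((x - m) ** 2 for s, m in zip(samples, means) for x in s)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Three hypothetical samples with means 2, 3, and 6.
a = [1.0, 2.0, 3.0]
b = [2.0, 3.0, 4.0]
c = [5.0, 6.0, 7.0]

f_stat, df_b, df_w = one_way_anova_f(a, b, c)
```

A large F statistic means the sample means differ by more than the within-sample spread would suggest.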
