Preparing Data for Feature Engineering and Machine Learning in Microsoft Azure Flashcards

Question 1

Q

What issue are you most likely to face with a credit card fraud detection dataset?

Problem of outliers

Problem of high-dimensionality

Problem of imbalanced data

Multicollinearity problem

Answer

A

Problem of imbalanced data
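To see why imbalance matters, here is a minimal sketch on made-up labels. The 1% fraud rate and the always-"legitimate" predictor are assumptions for illustration, not figures from any real dataset. It shows that plain accuracy can look excellent while every fraud case is missed.

```python
# Hypothetical toy labels: 1 = fraud, 0 = legitimate (assumed 1% fraud rate).
labels = [1] * 10 + [0] * 990

# A trivial "model" that always predicts the majority class.
predictions = [0] * len(labels)

# Accuracy: share of predictions that match the label.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall on the fraud class: share of actual fraud cases that were caught.
true_positives = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))
recall = true_positives / sum(labels)

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
```

With data like this, metrics such as recall or precision on the minority class are usually more informative than accuracy.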

Question 2

Q

What happens when we increase the amount of data for a machine learning problem?

A. The training accuracy increases, test accuracy decreases

B. The training accuracy increases, test accuracy increases

C. The training accuracy decreases, test accuracy decreases

D. The training accuracy decreases, test accuracy increases

Answer

A

D. The training accuracy decreases, test accuracy increases

Question 3

Q

Under which missingness assumption can you safely delete the records with missing values?

Missing at Random

Missing Completely at Random

Either of MCAR or MAR

Missing not at Random

Answer

A

Missing Completely at Random
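Deleting incomplete records (listwise deletion) can be sketched as below. The records and field names are hypothetical. The code only performs the deletion; whether the data are actually MCAR is an assumption that has to be justified separately.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45, "income": None},
    {"age": 29, "income": 48000},
]

# Listwise deletion: keep only rows in which every field is observed.
complete = [r for r in records if all(v is not None for v in r.values())]

print(len(records), "rows before,", len(complete), "rows after")
```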

Question 4

Q

Which is the best method to use to handle missing data if the feature has outliers?

Mode imputation

Mean imputation

Listwise deletion

Median imputation

Answer

A

Median imputation
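A small sketch of why the median is preferred here, using made-up values with one extreme outlier. The mean is pulled far from the typical values, while the median is not.

```python
from statistics import mean, median

# Hypothetical feature values; None is the missing entry, 1000 is an outlier.
values = [10, 12, 11, 13, None, 1000]

observed = [v for v in values if v is not None]

mean_fill = mean(observed)      # dragged upward by the outlier
median_fill = median(observed)  # stays near the typical values

# Median imputation: replace each missing entry with the median.
imputed = [median_fill if v is None else v for v in values]

print("mean fill:", mean_fill, "median fill:", median_fill)
```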

Question 5

Q

Which of the following Machine Learning models does not have any target value?

Clustering

Anomaly detection

Regression

Classification

Answer

A

Clustering

Question 6

Q

Which of the following machine learning models has a continuous target value?

Regression

Classification

Anomaly detection

Clustering

Answer

A

Regression

Question 7

Q

Which of the following is the BEST way to create features for high-cardinality categorical data?

One-hot encoding

Learning with counts

Dummy coding

Binning

Answer

A

Learning with counts
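The idea behind count-based features can be sketched as follows: instead of one column per category value, each value is replaced by a few statistics of how often it co-occurs with each class. This is an illustrative simplification, not the implementation used by any Azure module; the `merchant` column, the rows, and the `count_features` helper are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical training rows: (merchant id, label), with 1 = fraud.
rows = [("m1", 1), ("m1", 0), ("m1", 0), ("m2", 1), ("m2", 1), ("m3", 0)]

# Count how often each category value appears with each class label.
counts = defaultdict(Counter)
for merchant, label in rows:
    counts[merchant][label] += 1

def count_features(merchant):
    """Return (positive count, negative count, positive rate) for a value."""
    c = counts.get(merchant, Counter())
    pos, neg = c[1], c[0]
    total = pos + neg
    rate = pos / total if total else 0.0  # unseen values fall back to 0.0
    return (pos, neg, rate)

print(count_features("m1"))
```

However many distinct merchants exist, each row gets the same three numeric features, which is what keeps the feature space small compared with one-hot encoding.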

Question 8

Q

Which of the following is a disadvantage of linear models?

They run slower

They are not scalable

They may not give accurate predictions

They are harder to train

Answer

A

They may not give accurate predictions

Question 9

Q

What is TRUE about leave-one-out cross-validation?

It produces low bias and high variance models

It produces low bias and low variance models

It produces high bias and low variance models

It produces high bias and high variance models

Answer

A

It produces low bias and high variance models
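A minimal sketch of how leave-one-out splits are formed; the helper name `leave_one_out` is made up for illustration. Each model trains on all but one sample, which is the usual argument for low bias. The training sets overlap almost entirely and each test set is a single point, which is the usual argument for high variance.

```python
def leave_one_out(n_samples):
    """Yield (train_indices, test_indices) with exactly one held-out sample."""
    for i in range(n_samples):
        train = [j for j in range(n_samples) if j != i]
        yield train, [i]

splits = list(leave_one_out(4))

for train, test in splits:
    print("train:", train, "test:", test)
```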

Question 10

Q

Suppose you need to create 7 folds for K-fold cross-validation. How would you do it?

Use Partition and Sample module with ‘Assign to folds’ mode

Use Partition and Sample module with ‘Pick folds’ mode

Use Split data module to assign folds

Use the Cross-validate model module

Answer

A

Use Partition and Sample module with ‘Assign to folds’ mode
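Conceptually, assigning rows to folds amounts to giving every row a fold number. The sketch below is a plain-Python illustration of that idea, not the module's actual implementation; the `assign_folds` helper, the seed, and the row count are assumptions.

```python
import random

def assign_folds(n_rows, n_folds=7, seed=0):
    """Shuffle row indices, then deal them into folds round-robin."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    fold_of = [0] * n_rows
    for position, row in enumerate(indices):
        fold_of[row] = position % n_folds
    return fold_of

# 21 hypothetical rows split into 7 folds of equal size.
folds = assign_folds(21)
print(folds)
```

Each fold then serves once as the validation set while the remaining six are used for training.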

(10 cards)