Introduction to Machine Learning Flashcards

1
Q

If the business case is fraud detection, which is the correct Objective to choose in Vertex AI?

Segmentation

Forecasting

Regression/Classification

Clustering

A

Regression/Classification

1
Q

Which of the following metrics can be used to find a suitable balance between precision and recall in a model?

F1 Score

ROC AUC

Log Loss

PR AUC

A

F1 Score. This is the harmonic mean of precision and recall. F1 is a useful metric if you’re looking for a balance between precision and recall and there’s an uneven class distribution.
The other options measure different things:
PR AUC is the area under the precision-recall (PR) curve.
ROC AUC is the area under the receiver operating characteristic (ROC) curve.
Log loss is the cross-entropy between the model predictions and the target values.
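
As a rough illustration of the harmonic mean in the F1 definition above, here is a minimal sketch that computes precision, recall and F1 from confusion-matrix counts. The function name and the example counts are hypothetical, chosen only for illustration.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical fraud classifier: 8 true positives, 2 false positives,
# 8 false negatives (made-up counts).
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(precision, recall, f1)
```

Because the harmonic mean is pulled toward the smaller of the two values, F1 here falls below the simple average of precision and recall.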

2
Q

MAE, MAPE, RMSE, RMSLE and R2 are all available as test examples in the Evaluate section of Vertex AI and are common examples of what type of metric?

Decision Trees Progression Metrics

Linear Regression Metrics

Clustering Regression Metrics

Forecasting Regression Metrics

A

Linear Regression Metrics
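
For reference, the five metrics named in the question can be sketched with their standard textbook definitions. This is an assumption-laden sketch: the exact formulas Vertex AI uses may differ in detail, and the function name and sample values are hypothetical. MAPE assumes non-zero targets and RMSLE assumes values greater than -1.

```python
import math


def regression_metrics(y_true, y_pred):
    """Textbook MAE, MAPE, RMSE, RMSLE and R2 for paired lists of numbers."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    # MAPE as a percentage; undefined when a target is zero.
    mape = 100 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    rmsle = math.sqrt(
        sum((math.log1p(p) - math.log1p(t)) ** 2
            for t, p in zip(y_true, y_pred)) / n
    )
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "RMSLE": rmsle, "R2": r2}


# Made-up targets and predictions.
metrics = regression_metrics([100.0, 200.0, 300.0], [110.0, 190.0, 330.0])
print(metrics)
```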

3
Q

If a dataset is presented in a Comma Separated Values (CSV) file, which is the correct data type to choose in Vertex AI?

Video

Tabular

Text

Image

A

Tabular

4
Q

For a user who can use SQL, has little Machine Learning experience and wants a ‘Low-Code’ solution, which Machine Learning framework should they use?

Python

BigQuery ML

AutoML

Scikit-Learn

A

BigQuery ML

5
Q

What is the default setting in AutoML Tables for the data split in model evaluation?

80% Training, 10% Validation, 10% Testing

80% Training, 15% Validation, 5% Testing

80% Training, 5% Validation, 15% Testing

70% Training, 20% Validation, 10% Testing

A

80% Training, 10% Validation, 10% Testing

6
Q

What does the Feature Importance attribution in Vertex AI display?

How much each feature impacts the model, expressed as a decimal

How much each feature impacts the model, expressed as a ratio

How much each feature impacts the model, expressed as a ranked list

How much each feature impacts the model, expressed as a percentage

A

How much each feature impacts the model, expressed as a percentage

7
Q

Which of the following are stages of the Machine Learning workflow that can be managed with Vertex AI?

All of the options.

Train an ML model on your data.

Deploy your trained model to an endpoint for serving predictions.

Create a dataset and upload data.

A

All of the options.

8
Q

What is the main benefit of using an automated Machine Learning workflow?

It reduces the time it takes to develop trained models and assess their performance.

It makes the model run faster.

It deploys the model into production.

It makes the model perform better.

A

It reduces the time it takes to develop trained models and assess their performance.

9
Q

Which of the following are advantages of BigQuery ML when compared to Python based ML frameworks?

All of the options

BigQuery ML automates multiple steps in the ML workflow

BigQuery ML custom models can be created without the use of multiple tools

Moving and formatting large amounts of data takes longer with Python based models compared to model training in BigQuery

A

All of the options

10
Q

Which of these BigQuery supported classification models is most relevant for predicting binary results, such as True/False?

DNN Classifier (TensorFlow)

AutoML Tables

XGBoost

Logistic Regression

A

Logistic Regression

11
Q

Where labels are not available, for example where customer segmentation is required, which of the following BigQuery supported models is useful?

Time Series Anomaly Detection

Recommendation - Matrix Factorization

Time Series Forecasting

K-Means Clustering

A

K-Means Clustering

12
Q

For Classification or Regression problems with decision trees, which of the following models is most relevant?

XGBoost

AutoML Tables

Wide and Deep NNs

Linear Regression

A

XGBoost

13
Q

What are the 3 key steps for creating a Recommendation System with BigQuery ML?

Prepare training data in BigQuery, specify the model options in BigQuery ML, export the predictions to Google Analytics

Import training data to BigQuery, train a recommendation system with BigQuery ML, tune the hyperparameters

Prepare training data in BigQuery, train a recommendation system with BigQuery ML, use the predicted recommendations in production

Prepare training data in BigQuery, select a recommendation system from BigQuery ML, deploy and test the model

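A

Prepare training data in BigQuery, train a recommendation system with BigQuery ML, use the predicted recommendations in production
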
14
Q

Which of the following loss functions is used for classification problems?

MSE

Cross entropy

None of the options are correct.

Both MSE & Cross entropy

A

Cross entropy
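
To make the answer concrete, here is a minimal sketch of binary cross entropy (log loss) for a two-class problem. The function name, the clipping constant `eps` and the sample probabilities are assumptions made for illustration.

```python
import math


def binary_cross_entropy(y_true, p_pred, eps=1e-15):
    """Mean binary cross entropy between 0/1 labels and predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        # Clip probabilities so that log(0) is never evaluated.
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


# Confident and correct predictions versus confident and wrong ones.
confident_right = binary_cross_entropy([1, 0], [0.9, 0.1])
confident_wrong = binary_cross_entropy([1, 0], [0.1, 0.9])
print(confident_right, confident_wrong)
```

The loss grows sharply when the model assigns high probability to the wrong class, which is what makes it suitable for classification.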

15
Q

Which of the following gradient descent methods computes the gradient over the entire dataset at each step?

Batch gradient descent

Gradient descent

None of the options are correct.

Mini-batch gradient descent

A

Batch gradient descent
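
A minimal sketch of batch gradient descent, assuming a one-feature linear model and mean squared error. Every update uses all examples, which is what distinguishes it from the mini-batch variant. The function name, learning rate, step count and data are hypothetical.

```python
def batch_gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = m*x + b by minimising mean squared error over the full dataset."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients are averaged over every example in the dataset.
        grad_m = sum(2 * (m * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (m * x + b - y) for x, y in zip(xs, ys)) / n
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


# Made-up data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
m, b = batch_gradient_descent(xs, ys)
print(m, b)
```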

16
Q

What are the basic steps in an ML workflow (or process)?

Collect data

Check for anomalies, missing data and clean the data

All options are correct.

Perform statistical analysis and initial visualization

A

All options are correct

17
Q

For the formula used to model the relationship, i.e. y = mx + b, what does ‘m’ stand for?

It captures the amount of change we’ve observed in our label in response to a small change in our feature.

It refers to a bias term which can be used for regression.

None of the options are correct.

It refers to a bias term which can be used for regression and it captures the amount of change we’ve observed in our label in response to a small change in our feature.

A

It captures the amount of change we’ve observed in our label in response to a small change in our feature.
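
A small sketch of the two roles in y = mx + b, using made-up values for m and b: the slope m is the change in the label per unit change in the feature, and b is the bias, the prediction when the feature is zero.

```python
def predict(x, m, b):
    """Linear model: label = slope * feature + bias."""
    return m * x + b


m, b = 2.0, 1.0  # hypothetical slope and bias

# Increasing the feature by one unit changes the prediction by exactly m.
change_per_unit = predict(4.0, m, b) - predict(3.0, m, b)

# The bias is the prediction when the feature is zero.
value_at_zero = predict(0.0, m, b)
print(change_per_unit, value_at_zero)
```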

18
Q

Which of the following are benefits of Performance metrics over loss functions?

Performance metrics are easier to understand.

Performance metrics are directly connected to business goals.

None of the options are correct.

Performance metrics are easier to understand and are directly connected to business goals.

A

Performance metrics are easier to understand and are directly connected to business goals.

19
Q

Which of the following allows you to create repeatable samples of your data?

Use the first few digits of a hash function on the field that you’re using to split or bucketize your data.

Use the last few digits of a hash function on the field that you’re using to split or bucketize your data.

Use the first few digits or the last few digits of a hash function on the field that you’re using to split or bucketize your data.

None of the options are correct.

A

Use the last few digits of a hash function on the field that you’re using to split or bucketize your data.

20
Q

How do you decide when to stop training a model?

When your loss metrics start to decrease

When your loss metrics start to increase

When your loss metrics start to both increase and decrease

None of the options are correct

A

When your loss metrics start to increase
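
One common way to act on this rule is early stopping. The sketch below assumes the loss being watched is measured on a validation set and that training stops after `patience` consecutive epochs without improvement. The function name, the `patience` parameter and the loss values are hypothetical.

```python
def early_stopping_epoch(validation_losses, patience=2):
    """Return the index of the best epoch seen before early stopping triggers."""
    best_loss = float("inf")
    best_epoch = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_loss = loss
            best_epoch = epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # the loss has started to increase, so stop training
    return best_epoch


# Made-up validation losses: they improve, then start to rise.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61]
best = early_stopping_epoch(losses)
print(best)
```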

21
Q

Which of the following allows you to split the dataset based upon a field in your data?

BUCKETIZE, an open-source hashing algorithm that is implemented in BigQuery SQL.

FARM_FINGERPRINT, an open-source hashing algorithm that is implemented in BigQuery SQL.

ML_FEATURE FINGERPRINT, an open-source hashing algorithm that is implemented in BigQuery SQL.

None of the options are correct.

A

FARM_FINGERPRINT, an open-source hashing algorithm that is implemented in BigQuery SQL.
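
The idea of a repeatable, hash-based split can be sketched outside BigQuery as well. The example below is an assumption-heavy stand-in: it uses MD5 from Python's standard library rather than FARM_FINGERPRINT, and the function name, the percentages and the date field are hypothetical. Taking the hash modulo 100 plays the role of keeping the last digits of the hash.

```python
import hashlib


def assign_split(key, train_pct=80, valid_pct=10):
    """Deterministically map a field value to train, validation or test."""
    # MD5 is a stand-in here, not the hash BigQuery uses.
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # a stable bucket in 0..99
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + valid_pct:
        return "validation"
    return "test"


# Hypothetical split field: a date column.
dates = [f"2024-01-{day:02d}" for day in range(1, 31)]
splits = {d: assign_split(d) for d in dates}
print(splits["2024-01-15"])
```

Because the assignment depends only on the field value, rows with the same value always land in the same split, no matter how often the query is rerun.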

22
Q

Which is the best way to assess the quality of a model?

Observing how well a model performs against a new dataset that it hasn’t seen before and observing how well a model performs against an existing known dataset.

Observing how well a model performs against an existing known dataset.

Observing how well a model performs against a new dataset that it hasn’t seen before.

None of the options are correct

A

Observing how well a model performs against a new dataset that it hasn’t seen before.

23
Q

Which of the following actions can you perform on your model when it is trained and validated?

You can evaluate it once, and only once, against the independent test dataset.

You can evaluate it multiple times against the dependent test dataset.

You can evaluate it once, and only once, against the dependent test dataset.

You can evaluate it multiple times against the independent test dataset.

A

You can evaluate it once, and only once, against the independent test dataset.

24
Q

Which of the following are categories of data quality tools?

Both ‘Cleaning tools’ and ‘Monitoring tools’

Monitoring tools

Cleaning tools

None of the options

A

Both ‘Cleaning tools’ and ‘Monitoring tools’

25
Q

What are the features of low data quality?

Duplicated data

Incomplete data

Unreliable info

All of the options

A

All of the options

26
Q

What are the objectives of exploratory data analysis?

Uncover a parsimonious model, one which explains the data with a minimum number of predictor variables.

Gain maximum insight into the data set and its underlying structure.

Check for missing data and other mistakes.

All of the options

A

All of the options

27
Q

Exploratory Data Analysis is mainly performed using the following methods:

Both Univariate and Bivariate

Bivariate

Univariate

None of the options

A

Both Univariate and Bivariate

28
Q

Which of the following is not a component of Exploratory Data Analysis?

Statistical Analysis and Clustering

Anomaly Detection

Accounting and Summarizing

Hyperparameter tuning

A

Hyperparameter tuning

29
Q

Why is regularization important in logistic regression?

Finds errors in the algorithm

Keeps training time down by regulating the time allowed

Avoids overfitting

Encourages the use of large weights

A

Avoids overfitting

30
Q

Which of the following machine learning models have labels, or in other words, the correct answers to whatever it is that we want to learn to predict?

Reinforcement Model

Supervised Model

Unsupervised Model

None of the options

A

Supervised Model

31
Q

Which model would you use if your problem required a discrete number of values or classes?

Supervised Model

Unsupervised Model

Regression Model

Classification Model

A

Classification Model

32
Q

To predict the continuous value of our label, which of the following algorithms is used?

Unsupervised

Regression

Classification

None of the options

A

Regression

33
Q

What is the most essential metric a regression model uses?

Both ‘Mean squared error as their loss function’ & ‘Cross entropy’

Cross entropy

Mean squared error as their loss function

None of the options

A

Mean squared error as their loss function