Production Machine Learning Systems Flashcards

Question 1

Q

Which type of training do you use if your data set doesn’t change over time?

Online training

Dynamic training

Real-time training

Static training

Answer

A

Static training

Question 2

Q

In the featurestore, the timestamps are an attribute of the feature values, not a separate resource type.

False

True

Answer

A

True

Question 3

Q

When you use the data to train a model, Vertex AI examines the source data type and feature values and infers how it will use that feature in model training. This is called the ________________ for that feature.

Transformation

Transmutation

Translation

Duplication

Answer

A

Transformation

Question 4

Q

Match the three types of data ingest with an appropriate source of training data.

Streaming (BigQuery), structured batch (Pub/Sub), unstructured batch (Cloud Storage)

Streaming batch (Dataflow), structured batch (BigQuery), stochastic (App Engine)

Streaming (Pub/Sub), structured batch (BigQuery), unstructured batch (Cloud Storage)

Answer

A

Streaming (Pub/Sub), structured batch (BigQuery), unstructured batch (Cloud Storage)

Question 5

Q

What is the responsibility of model evaluation and validation components?

To ensure that the models are not good after moving them into a staging environment.

To ensure that the models are not good before moving them into a staging environment.

To ensure that the models are good after moving them into a production/staging environment.

To ensure that the models are good before moving them into a production/staging environment.

Answer

A

To ensure that the models are good before moving them into a production/staging environment.
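
As a rough illustration of such a gate, the sketch below promotes a candidate model only when its offline metric clears a minimum and does not regress against the current baseline. The function name `should_promote`, the choice of AUC as the metric, and both thresholds are hypothetical; they are not taken from the card or from any particular platform.

```python
def should_promote(candidate_auc, baseline_auc, min_auc=0.80, max_regression=0.01):
    """Return True only if the candidate may leave the evaluation stage.

    The metric and both thresholds are illustrative assumptions.
    """
    meets_floor = candidate_auc >= min_auc                          # absolute quality bar
    no_regression = candidate_auc >= baseline_auc - max_regression  # relative to current model
    return meets_floor and no_regression


# A candidate that clears both checks, and one that misses the floor.
print(should_promote(candidate_auc=0.86, baseline_auc=0.84))
print(should_promote(candidate_auc=0.78, baseline_auc=0.84))
```

The point of the design is ordering: the check runs before deployment, so a weak model never reaches staging or production.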

Question 6

Q

What percent of system code does the ML model account for?

90%

25%

50%

5%

Answer

A

5%

Question 7

Q

Vertex AI has a unified data preparation tool that supports image, tabular, text, and video content. Where are uploaded datasets stored in Vertex AI?

A Google Cloud Storage bucket that acts as an output for AutoML, custom training jobs, and serialized training jobs.

A Google Cloud database that acts as an input for both AutoML and custom training jobs.

A Google Cloud Storage bucket that acts as an input for both AutoML and custom training jobs.

A Google Cloud database that acts as an output for both AutoML and custom training jobs.

Answer

A

A Google Cloud Storage bucket that acts as an input for both AutoML and custom training jobs.

Question 8

Q

Which type of online prediction logging writes the stderr and stdout streams from your prediction nodes to Cloud Logging and can be useful for debugging?

Cloud logging

Container logging

Access logging

Request-response logging

Answer

A

Container logging

Question 9

Q

Suppose you are building an ML-based system to predict the likelihood that a customer will leave a positive review. The user interface that customers leave reviews on changed a few months ago, but you don’t know about this. Which of these is a potential consequence of mismanaging this data dependency?

Change in ability of model to be part of a streaming ingest

Losses in prediction quality

Change in model serving signature

Answer

A

Losses in prediction quality

Question 10

Q

Which of the following models are susceptible to a feedback loop? Check all that apply.

A book-recommendation model that suggests novels its users may like based on their popularity (i.e., the number of times the books have been purchased).

A traffic-forecasting model that predicts congestion at highway exits near the beach, using beach crowd size as one of its features.

A housing-value model that predicts house prices, using size (area in square meters), number of bedrooms, and geographic location as features.

A face-attributes model that detects whether a person is smiling in a photo, which is regularly trained on a database of stock photography that is automatically updated monthly.

An election-results model that forecasts the winner of a mayoral race by surveying 2% of voters after the polls have closed.

A university-ranking model that rates schools in part by their selectivity (the percentage of students who applied that were admitted).

Answer

A

A book-recommendation model that suggests novels its users may like based on their popularity (i.e., the number of times the books have been purchased).

Book recommendations are likely to drive purchases, and these additional sales will be fed back into the model as input, making it more likely to recommend these same books in the future.

A traffic-forecasting model that predicts congestion at highway exits near the beach, using beach crowd size as one of its features.

Some beachgoers are likely to base their plans on the traffic forecast. If there is a large beach crowd and traffic is forecast to be heavy, many people may make alternative plans. This may depress beach turnout, resulting in a lighter traffic forecast, which then may increase attendance, and the cycle repeats.

A university-ranking model that rates schools in part by their selectivity (the percentage of students who applied that were admitted).
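
The book-recommendation case can be sketched as a tiny simulation. This is a toy under stated assumptions: the starting purchase counts, the `boost` of 10 extra purchases per recommendation, and the function name `simulate_feedback_loop` are all made up for illustration.

```python
def simulate_feedback_loop(purchases, rounds, boost=10):
    """Recommend the most-purchased book each round.

    The recommendation itself adds `boost` purchases, and those purchases
    are the input to the next round's recommendation.
    """
    counts = dict(purchases)
    history = []
    for _ in range(rounds):
        recommended = max(counts, key=counts.get)  # popularity-based choice
        counts[recommended] += boost               # model output becomes model input
        history.append(recommended)
    return counts, history


counts, history = simulate_feedback_loop({"A": 105, "B": 100, "C": 95}, rounds=5)
print(history)  # ['A', 'A', 'A', 'A', 'A']
print(counts)   # {'A': 155, 'B': 100, 'C': 95}
```

Book A starts only slightly ahead, yet it is the only title ever recommended, and its lead grows every round.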

Question 11

Q

What is the shift in the actual relationship between the model inputs and the output called?

Prediction drift

Data drift

Label drift

Concept drift

Answer

A

Concept drift
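
A minimal sketch of what this looks like in numbers, assuming a made-up relationship that changes from y = 2x to y = 3x while the inputs stay exactly the same. The `model` function is a stand-in for a model fitted before the change.

```python
def mean_absolute_error(predict, xs, ys):
    """Average absolute gap between predictions and observed outputs."""
    return sum(abs(predict(x) - y) for x, y in zip(xs, ys)) / len(xs)


def model(x):
    # Stand-in for a model fitted while the true relationship was y = 2x.
    return 2 * x


xs = [1, 2, 3, 4]                 # identical inputs in both periods (no data drift)
ys_before = [2 * x for x in xs]   # relationship at training time
ys_after = [3 * x for x in xs]    # relationship after the concept has drifted

error_before = mean_absolute_error(model, xs, ys_before)
error_after = mean_absolute_error(model, xs, ys_after)
print(error_before, error_after)  # 0.0 2.5
```

The input distribution never moves, so a check on the inputs alone would not notice anything; only the error against fresh labels reveals the change.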

Question 12

Q

Gradual drift describes which of the following?

An old concept that incrementally changes to a new concept over a period of time

A new concept that occurs within a short time

An old concept that may reoccur after some time

A new concept that rapidly replaces an old one over a short period of time

Answer

A

An old concept that incrementally changes to a new concept over a period of time

Question 13

Q

Which component identifies anomalies in training and serving data and can automatically create a schema by examining the data?

Data validation

Data ingestion

Data identifier

Data transform

Answer

A

Data validation
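
The two behaviors named in the question, inferring a schema from data and flagging anomalies against it, can be sketched in plain Python. This is a toy illustration only: `infer_schema`, `find_anomalies`, and the example rows are hypothetical, and a real data validation component checks far more than this.

```python
def infer_schema(rows):
    """Build a simple schema (type and observed values) from example rows."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            entry = schema.setdefault(name, {"type": type(value), "values": set()})
            entry["values"].add(value)
    return schema


def find_anomalies(rows, schema):
    """Report missing features, type mismatches, and unseen string values."""
    anomalies = []
    for i, row in enumerate(rows):
        for name, spec in schema.items():
            if name not in row:
                anomalies.append((i, name, "missing"))
            elif not isinstance(row[name], spec["type"]):
                anomalies.append((i, name, "wrong type"))
            elif spec["type"] is str and row[name] not in spec["values"]:
                anomalies.append((i, name, "unseen value"))
    return anomalies


training = [{"age": 34, "plan": "basic"}, {"age": 51, "plan": "premium"}]
serving = [{"age": 29, "plan": "basic"}, {"age": "40", "plan": "trial"}, {"plan": "premium"}]

schema = infer_schema(training)
anomalies = find_anomalies(serving, schema)
print(anomalies)
# [(1, 'age', 'wrong type'), (1, 'plan', 'unseen value'), (2, 'age', 'missing')]
```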

Question 14

Q

What is training skew caused by?

The Cloud Storage you load your data from in the training environment is physically closer than the Cloud Storage you load your data from in the production environment.

Your development and production environments are different, or different code is used in the training environment than in the production environment.

Starting and stopping of the processing when training the model.

The prediction environment is slower than the training environment.

Answer

A

Your development and production environments are different, or different code is used in the training environment than in the production environment.
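
One common way different code creeps in is a feature transformation that is implemented twice, once for training and once for serving. The sketch below is a hypothetical example of that: the two preprocessing functions and the price feature are invented, and `skewed_features` is only a simple way to surface the disagreement.

```python
def preprocess_training(price_cents):
    # Training code scales cents to dollars.
    return price_cents / 100.0


def preprocess_serving_divergent(price_cents):
    # A separate re-implementation for serving that forgot the scaling.
    return float(price_cents)


def skewed_features(raw_values, train_fn, serve_fn, tolerance=1e-9):
    """Return the raw values whose training and serving features disagree."""
    return [v for v in raw_values if abs(train_fn(v) - serve_fn(v)) > tolerance]


raw = [0, 250, 1999]
print(skewed_features(raw, preprocess_training, preprocess_serving_divergent))  # [250, 1999]
print(skewed_features(raw, preprocess_training, preprocess_training))           # []
```

Sharing one preprocessing function between both paths, as in the second call, removes this particular source of skew by construction.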

Question 15

Q

Which of the following tools help software users manage dependency issues?

Monolithic programs

Modular programs

Polylithic programs

Maven, Gradle, and Pip

Answer

A

Maven, Gradle, and Pip

Question 16

Q

If each of your examples is large in terms of size and requires parsing, and your model is relatively simple and shallow, your model is likely to be:

I/O bound, so you should look for ways to store data more efficiently and ways to parallelize the reads.

CPU-bound, so you should use GPUs or TPUs.

Latency-bound, so you should use faster hardware.

Answer

A

I/O bound, so you should look for ways to store data more efficiently and ways to parallelize the reads.
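
In practice you would confirm this by measuring how a training step splits its time between reading or parsing input and computing. A minimal sketch, assuming you can time the two phases separately; the helper names and the 2x threshold are arbitrary choices for illustration.

```python
import time


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def classify_bottleneck(input_seconds, compute_seconds, ratio=2.0):
    """Label a training step from how its time splits between input and compute.

    The 2x ratio is an arbitrary illustrative threshold, not a standard value.
    """
    if input_seconds >= ratio * compute_seconds:
        return "I/O bound"
    if compute_seconds >= ratio * input_seconds:
        return "CPU bound"
    return "balanced"


# Large examples that need parsing, paired with a shallow model.
print(classify_bottleneck(input_seconds=0.9, compute_seconds=0.1))  # I/O bound
# Simple input, paired with expensive computation.
print(classify_bottleneck(input_seconds=0.1, compute_seconds=0.9))  # CPU bound
```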

Question 17

Q

What does high-performance machine learning determine?

Time taken to train a model

Reliability of a model

Deploying a model

Training a model

Answer

A

Time taken to train a model

Question 18

Q

For the fastest I/O performance in TensorFlow… (check all that apply)

Read in parallel threads.

Optimize TensorFlow performance using the Profiler.

Read TFRecords into your model.

Prefetch the data.

Answer

A

All of them:

Read in parallel threads.

Optimize TensorFlow performance using the Profiler.

Read TFRecords into your model.

Prefetch the data.
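
In TensorFlow itself these ideas map onto the `tf.data` API (for example `tf.data.TFRecordDataset`, `Dataset.interleave` with `num_parallel_calls`, and `Dataset.prefetch`). The sketch below is not that API. It is a plain-Python analogy of two of the techniques, parallel reads and prefetching, with `read_shard` as a hypothetical stand-in for reading and parsing one file.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def read_shard(shard_id):
    # Hypothetical stand-in for reading and parsing one file of records.
    return [f"shard{shard_id}-record{i}" for i in range(3)]


def parallel_read(shard_ids, workers=4):
    """Read several shards concurrently; results keep the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(read_shard, shard_ids))


def prefetch(iterable, buffer_size=2):
    """Yield items while a background thread fills a bounded buffer ahead of the consumer."""
    buffer = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the input

    def producer():
        for item in iterable:
            buffer.put(item)
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is done:
            return
        yield item


batches = list(prefetch(parallel_read(range(4))))
print(len(batches), batches[0][0])  # 4 shard0-record0
```

The bounded queue is the key design choice: the producer can run a few items ahead of the consumer, so the two overlap, without loading the whole dataset into memory.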

Question 19

Q

Which of the following indicates that ML training is CPU bound?

If I/O is complex, but the model involves lots of complex/expensive computations.

If you are running a model on powered hardware.

If I/O is simple, but the model involves lots of complex/expensive computations.

If you are running a model on accelerated hardware.

Answer

A

If I/O is simple, but the model involves lots of complex/expensive computations.

Question 20

Q

Which of the following is a correct property of TensorFlow Lite?

Quantization

Higher precision arithmetic

Increased code footprint

Lower precision arithmetic

Answer

A

Quantization
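
To make the term concrete, here is a minimal sketch of symmetric 8-bit quantization: floats are mapped to small integers plus a single scale factor, then mapped back with a small error. This is a simplified scheme chosen for illustration; TensorFlow Lite's actual quantization differs in detail, and the function names and example weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric 8-bit quantization: floats -> integers in [-127, 127] plus one scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs else 1.0  # avoid dividing by zero for all-zero input
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integers and the scale."""
    return [q * scale for q in quantized]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized)
print(max_error <= scale / 2 + 1e-9)  # True
```

Each weight now fits in one byte instead of four, at the cost of a rounding error of at most half a quantization step.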

Question 21

Q

To copy the input data into TensorFlow, which of the following syntaxes is correct?

inferenceInterface.feed(inputName, floatValues, 1, inputSize; inputSize, 3);

inferenceInterface.feed(inputName, floatValues, 1, inputSize, inputSize, 3);

inferenceInterface.feed(floatValues, 1, inputSize, inputSize, 3);

inferenceInterface.feed(inputName, floatValues, 1, inputSize, 3);

Answer

A

inferenceInterface.feed(inputName, floatValues, 1, inputSize, inputSize, 3);
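
The trailing arguments in the correct call read as the tensor's dimensions: a batch of 1 image, inputSize by inputSize pixels, and 3 color channels. That interpretation is inferred from the call, not stated in the card. Under that assumption, the flat float buffer must hold exactly 1 x inputSize x inputSize x 3 values. The sketch below checks that arithmetic in Python with a made-up 2 x 2 image; it does not call the Java API.

```python
import math


def flatten_hwc(image):
    """Flatten a height x width x channels nested list into one flat list of floats."""
    return [float(c) for row in image for pixel in row for c in pixel]


input_size = 2
# Made-up 2 x 2 RGB image: black, red, green, blue.
image = [
    [[0, 0, 0], [255, 0, 0]],
    [[0, 255, 0], [0, 0, 255]],
]

float_values = flatten_hwc(image)
dims = (1, input_size, input_size, 3)  # batch, height, width, channels

print(len(float_values), math.prod(dims))  # 12 12
```

The incorrect options either drop a dimension, which leaves a shape that no longer matches the buffer, or drop the input name.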

Question 22

Q

Which of these are reasons that you may not be able to perform machine learning solely on Google Cloud? Check all that apply.

You need to run inference on the edge.

TensorFlow is not supported on Google Cloud.

You are tied to on-premises or multi-cloud infrastructure due to business reasons.

Answer

A

You need to run inference on the edge.

You are tied to on-premises or multi-cloud infrastructure due to business reasons.

Question 23

Q

A key principle behind Kubeflow is portability so that you can:

Migrate your model from TensorFlow to PyTorch.

Convert your model from CUDA to XLA.

Move your model from on-premises to Google Cloud.

Answer

A

Move your model from on-premises to Google Cloud.