Production Machine Learning Systems Flashcards

1
Q

Which type of training do you use if your data set doesn’t change over time?

Online training

Dynamic training

Real-time training

Static training

A

Static training

2
Q

In the featurestore, the timestamps are an attribute of the feature values, not a separate resource type.

False

True

A

True

3
Q

When you use the data to train a model, Vertex AI examines the source data type and feature values and infers how it will use that feature in model training. This is called the ________________ for that feature.

Transformation

Transmutation

Translation

Duplication

A

Transformation

4
Q

Match the three types of data ingest with an appropriate source of training data.

Streaming (BigQuery), structured batch (Pub/Sub), unstructured batch (Cloud Storage)

Streaming batch (Dataflow), structured batch (BigQuery), stochastic (App Engine)

Streaming (Pub/Sub), structured batch (BigQuery), unstructured batch (Cloud Storage)

A

Streaming (Pub/Sub), structured batch (BigQuery), unstructured batch (Cloud Storage)

5
Q

What is the responsibility of model evaluation and validation components?

To ensure that the models are not good after moving them into a staging environment.

To ensure that the models are not good before moving them into a staging environment.

To ensure that the models are good after moving them into a production/staging environment.

To ensure that the models are good before moving them into a production/staging environment.

A

To ensure that the models are good before moving them into a production/staging environment.
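A minimal sketch of such a validation gate, assuming the candidate model and the currently deployed (baseline) model have both been scored on the same held-out evaluation set, and that a higher metric is better. The `should_promote` helper and its threshold values are hypothetical illustrations, not a Vertex AI or TFX API.

```python
from typing import Optional


def should_promote(candidate_metric: float,
                   baseline_metric: Optional[float],
                   min_metric: float = 0.8,
                   max_regression: float = 0.0) -> bool:
    """Hypothetical gate run before a model moves to staging/production.

    The candidate must clear an absolute floor (min_metric) and must not be
    worse than the deployed baseline by more than max_regression.
    Assumes higher metric values are better (e.g. accuracy or AUC).
    """
    if candidate_metric < min_metric:
        return False
    if baseline_metric is None:  # nothing deployed yet, so only the floor applies
        return True
    return candidate_metric >= baseline_metric - max_regression


print(should_promote(0.91, 0.89))  # True: clears the floor and beats the baseline
print(should_promote(0.85, 0.89))  # False: regresses against the baseline
print(should_promote(0.75, None))  # False: below the absolute floor
```

Checking against an absolute floor as well as the baseline keeps a weak first model from being "blessed" just because there is nothing to compare it with.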

6
Q

What percentage of a production ML system's code does the ML model code itself account for?

90%

25%

50%

5%

A

5%

7
Q

Vertex AI has a unified data preparation tool that supports image, tabular, text, and video content. Where are uploaded datasets stored in Vertex AI?

A Google Cloud Storage bucket that acts as an output for AutoML, custom training, and serialized training jobs.

A Google Cloud database that acts as an input for both AutoML and custom training jobs.

A Google Cloud Storage bucket that acts as an input for both AutoML and custom training jobs.

A Google Cloud database that acts as an output for both AutoML and custom training jobs.

A

A Google Cloud Storage bucket that acts as an input for both AutoML and custom training jobs.

8
Q

Which type of logging, when enabled for online prediction, sends the stderr and stdout streams from your prediction nodes to Cloud Logging and can be useful for debugging?

Cloud logging

Container logging

Access logging

Request-response logging

A

Container logging

9
Q

Suppose you are building an ML-based system to predict the likelihood that a customer will leave a positive review. The user interface that customers leave reviews on changed a few months ago, but you don’t know about this. Which of these is a potential consequence of mismanaging this data dependency?

Change in ability of model to be part of a streaming ingest

Losses in prediction quality

Change in model serving signature

A

Losses in prediction quality

10
Q

Which of the following models are susceptible to a feedback loop? Check all that apply.

A book-recommendation model that suggests novels its users may like based on their popularity (i.e., the number of times the books have been purchased).

A traffic-forecasting model that predicts congestion at highway exits near the beach, using beach crowd size as one of its features.

A housing-value model that predicts house prices, using size (area in square meters), number of bedrooms, and geographic location as features.

A face-attributes model that detects whether a person is smiling in a photo, which is regularly trained on a database of stock photography that is automatically updated monthly.

An election-results model that forecasts the winner of a mayoral race by surveying 2% of voters after the polls have closed.

A university-ranking model that rates schools in part by their selectivity (the percentage of students who applied that were admitted).

A
  • A book-recommendation model that suggests novels its users may like based on their popularity (i.e., the number of times the books have been purchased).

Book recommendations are likely to drive purchases, and these additional sales will be fed back into the model as input, making it more likely to recommend these same books in the future.

  • A traffic-forecasting model that predicts congestion at highway exits near the beach, using beach crowd size as one of its features.

Some beachgoers are likely to base their plans on the traffic forecast. If there is a large beach crowd and traffic is forecast to be heavy, many people may make alternative plans. This may depress beach turnout, resulting in a lighter traffic forecast, which then may increase attendance, and the cycle repeats.

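  • A university-ranking model that rates schools in part by their selectivity (the percentage of students who applied that were admitted).

A high ranking is likely to attract more applicants. If the number of admitted students stays roughly the same, the admission rate falls, the school appears more selective, and the model ranks it even higher, which attracts still more applicants.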
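The book-recommendation loop can be simulated in a few lines. This is a toy sketch under stated assumptions: the "model" simply recommends the top-k books by purchase count, and each recommended book is assumed to gain exactly one extra sale per round while the others gain none. The catalog and starting counts are made up.

```python
from typing import Dict


def simulate_popularity_loop(purchases: Dict[str, int], k: int, rounds: int) -> Dict[str, int]:
    """Toy popularity recommender whose outputs feed back into its inputs."""
    counts = dict(purchases)  # copy so the caller's data is untouched
    for _ in range(rounds):
        # "Model": recommend the k most-purchased books (ties broken by title).
        recommended = sorted(counts, key=lambda b: (-counts[b], b))[:k]
        # Assumed feedback: each recommendation produces one extra sale,
        # and those sales become the model's input in the next round.
        for book in recommended:
            counts[book] += 1
    return counts


start = {"A": 10, "B": 9, "C": 8}
after = simulate_popularity_loop(start, k=1, rounds=5)
print(after)  # {'A': 15, 'B': 9, 'C': 8}
```

Book A's lead over book B grows from 1 to 6 purchases even though nothing about the books themselves changed; the model is increasingly learning from its own recommendations.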

11
Q

What is the shift in the actual relationship between the model inputs and the output called?

Prediction drift

Data drift

Label drift

Concept drift

A

Concept drift
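A toy illustration of concept drift, assuming a single numeric input and two hypothetical labeling rules. The input distribution is exactly the same before and after (so there is no data drift); only the mapping from input to label changes, and a model that learned the old mapping perfectly loses accuracy.

```python
from typing import Callable, List


def old_concept(x: float) -> int:
    # Hypothetical relationship before drift: label is 1 when x > 5.
    return int(x > 5)


def new_concept(x: float) -> int:
    # Hypothetical relationship after drift: same inputs, new labeling rule.
    return int(x > 7)


def accuracy(model: Callable[[float], int], inputs: List[float],
             truth: Callable[[float], int]) -> float:
    # Fraction of inputs where the model's label matches the true concept.
    return sum(model(x) == truth(x) for x in inputs) / len(inputs)


inputs = [float(x) for x in range(11)]  # identical input distribution before and after
model = old_concept                      # a model that learned the old relationship perfectly
print(accuracy(model, inputs, old_concept))  # 1.0
print(accuracy(model, inputs, new_concept))  # 9/11, about 0.82: x = 6 and 7 are now mislabeled
```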

12
Q

Which of the following describes gradual drift?

An old concept that incrementally changes to a new concept over a period of time

A new concept that occurs within a short time

An old concept that may reoccur after some time

A new concept that rapidly replaces an old one over a short period of time

A

An old concept that incrementally changes to a new concept over a period of time
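To contrast the drift patterns in the options above, the sketch below models a "concept" as a hypothetical decision threshold that moves from 5.0 to 7.0 over time steps. The thresholds, time steps, and function names are illustrative assumptions; the labels follow this card's terminology, in which gradual drift means incremental change.

```python
def sudden(t: int, change_at: int = 5) -> float:
    # A new concept replaces the old one at a single point in time.
    return 5.0 if t < change_at else 7.0


def gradual(t: int, start: int = 2, end: int = 8) -> float:
    # The old concept incrementally changes into the new one between start and end.
    if t <= start:
        return 5.0
    if t >= end:
        return 7.0
    return 5.0 + 2.0 * (t - start) / (end - start)


def recurring(t: int, period: int = 4) -> float:
    # The old concept reoccurs after some time (e.g. a seasonal pattern).
    return 5.0 if (t // period) % 2 == 0 else 7.0


for t in range(10):
    print(t, sudden(t), round(gradual(t), 2), recurring(t))
```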

13
Q

Which component identifies anomalies in training and serving data and can automatically create a schema by examining the data?

Data validation

Data ingestion

Data identifier

Data transform

A

Data validation
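In TFX this role is filled by TensorFlow Data Validation (TFDV). The plain-Python toy below does not reproduce TFDV's API; it only illustrates the idea: infer a simple schema (type, numeric range, categorical domain) from training rows, then flag serving rows that violate it. It assumes every training row has the same keys, and all feature names and values are made up.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def infer_schema(rows: List[Row]) -> Dict[str, Dict[str, Any]]:
    """Infer a toy schema from training rows (assumes all rows share keys)."""
    schema: Dict[str, Dict[str, Any]] = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
            schema[name] = {"type": "numeric", "min": min(values), "max": max(values)}
        else:
            schema[name] = {"type": "categorical", "domain": set(map(str, values))}
    return schema


def find_anomalies(schema: Dict[str, Dict[str, Any]], rows: List[Row]) -> List[str]:
    """Report missing features, type mismatches, out-of-range and unseen values."""
    anomalies: List[str] = []
    for i, row in enumerate(rows):
        for name, spec in schema.items():
            if name not in row:
                anomalies.append(f"row {i}: missing feature '{name}'")
                continue
            value = row[name]
            if spec["type"] == "numeric":
                if not isinstance(value, (int, float)) or isinstance(value, bool):
                    anomalies.append(f"row {i}: '{name}' expected numeric, got {value!r}")
                elif not spec["min"] <= value <= spec["max"]:
                    anomalies.append(f"row {i}: '{name}'={value} outside [{spec['min']}, {spec['max']}]")
            elif str(value) not in spec["domain"]:
                anomalies.append(f"row {i}: '{name}'={value!r} not in training domain")
    return anomalies


train = [{"age": 25, "country": "US"}, {"age": 40, "country": "CA"}]
serving = [{"age": 33, "country": "US"}, {"age": 130, "country": "MX"}, {"country": "US"}]
for problem in find_anomalies(infer_schema(train), serving):
    print(problem)
```

Using the exact training min and max as the allowed range is deliberately strict for a toy; a real schema would usually be reviewed and relaxed by a person after inference.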

14
Q

What is training-serving skew caused by?

The Cloud Storage you load your data from in the training environment is physically closer than the Cloud Storage you load your data from in the production environment.

Your development and production environments are different, or different code is used in the training environment than in the production environment.

Starting and stopping of the processing when training the model.

The prediction environment is slower than the training environment.

A

Your development and production environments are different, or different code is used in the training environment than in the production environment.
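One common mitigation is to route training and serving through the same preprocessing function instead of reimplementing it for serving. The sketch below assumes a hypothetical `review` text field and two hypothetical features; `serve_skewed` shows how a small reimplementation difference (a missing `.strip()`) produces different features for the same raw input.

```python
from typing import Dict, List


def preprocess(raw: Dict[str, str]) -> Dict[str, float]:
    """Single source of truth for feature engineering (hypothetical features)."""
    text = raw["review"].strip().lower()
    return {"length": float(len(text)), "exclamations": float(text.count("!"))}


def build_training_set(raw_rows: List[Dict[str, str]]) -> List[Dict[str, float]]:
    return [preprocess(r) for r in raw_rows]


def serve(raw_request: Dict[str, str]) -> Dict[str, float]:
    return preprocess(raw_request)  # same code path as training


def serve_skewed(raw_request: Dict[str, str]) -> Dict[str, float]:
    # A separate serving implementation that forgets .strip(): a source of skew.
    text = raw_request["review"].lower()
    return {"length": float(len(text)), "exclamations": float(text.count("!"))}


request = {"review": "  Great book!  "}
print(serve(request) == build_training_set([request])[0])         # True
print(serve_skewed(request) == build_training_set([request])[0])  # False: length 15 vs 11
```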

15
Q

Which of the following tools help software users manage dependency issues?

Monolithic programs

Modular programs

Polylithic programs

Maven, Gradle, and Pip

A

Maven, Gradle, and Pip

16
Q

If each of your examples is large in terms of size and requires parsing, and your model is relatively simple and shallow, your model is likely to be:

I/O-bound, so you should look for ways to store data more efficiently and ways to parallelize the reads.

CPU-bound, so you should use GPUs or TPUs.

Latency-bound, so you should use faster hardware.

A

I/O-bound, so you should look for ways to store data more efficiently and ways to parallelize the reads.

17
Q

What does high-performance machine learning determine?

Time taken to train a model

Reliability of a model

Deploying a model

Training a model

A

Time taken to train a model

18
Q

For the fastest I/O performance in TensorFlow… (check all that apply)

Read in parallel threads.

Optimize TensorFlow performance using the Profiler.

Read TFRecords into your model.

Prefetch the data.

A

All of them:

Read in parallel threads.

Optimize TensorFlow performance using the Profiler.

Read TFRecords into your model.

Prefetch the data.
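In TensorFlow itself, parallel reads and prefetching are tf.data options (for example `num_parallel_reads` on `tf.data.TFRecordDataset` and `Dataset.prefetch`). The standard-library sketch below only illustrates the two ideas, reading several shards concurrently and overlapping I/O with the consumer through a bounded buffer; it does not cover TFRecords or the Profiler. `read_shard` and the shard names are hypothetical stand-ins, with a sleep mimicking storage latency.

```python
import queue
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Iterator, List


def read_shard(path: str) -> List[int]:
    # Hypothetical I/O-bound reader: the sleep stands in for storage latency.
    time.sleep(0.05)
    return [len(path)] * 3


def parallel_read(paths: List[str], num_threads: int = 4) -> Iterator[List[int]]:
    # Read several shards concurrently; map() yields results in input order.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        yield from pool.map(read_shard, paths)


def prefetch(source: Iterator, buffer_size: int = 2) -> Iterator:
    # A background thread fills a bounded buffer so the consumer (the training
    # step) rarely waits on I/O. Errors are not propagated in this sketch.
    buf: queue.Queue = queue.Queue(maxsize=buffer_size)
    sentinel = object()

    def producer() -> None:
        try:
            for item in source:
                buf.put(item)
        finally:
            buf.put(sentinel)  # always signal the end so the consumer never hangs

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buf.get()
        if item is sentinel:
            return
        yield item


paths = [f"shard-{i}" for i in range(8)]  # hypothetical shard names
for batch in prefetch(parallel_read(paths)):
    print(batch)  # a real training step would consume the batch here
```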

19
Q

Which of the following indicates that ML training is CPU bound?

If I/O is complex, but the model involves lots of complex/expensive computations.

If you are running a model on powered hardware.

If I/O is simple, but the model involves lots of complex/expensive computations.

If you are running a model on accelerated hardware.

A

If I/O is simple, but the model involves lots of complex/expensive computations.

20
Q

Which of the following is a key property of TensorFlow Lite?

Quantization

Higher precision arithmetic

Increased code footprint

Lower precision arithmetic

A

Quantization
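Quantization stores values such as weights in 8-bit integers instead of 32-bit floats. The sketch below shows the underlying affine mapping (a scale and a zero point) on a made-up list of weights. It is a simplified per-tensor scheme for illustration, not TensorFlow Lite's exact implementation.

```python
from typing import List, Tuple


def quantize_uint8(values: List[float]) -> Tuple[List[int], float, int]:
    """Affine-quantize floats to integers in [0, 255] (simplified, per-tensor)."""
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)  # range must include 0
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for an all-zero input
    zero_point = round(-lo / scale)  # integer that represents the float 0.0
    q = [max(0, min(255, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    return [(x - zero_point) * scale for x in q]


weights = [-1.0, -0.5, 0.0, 0.25, 1.0]
q, scale, zp = quantize_uint8(weights)
restored = dequantize(q, scale, zp)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q, round(scale, 6), zp)
print(max_err)  # at most about half a quantization step (scale / 2)
```

Each value now needs one byte instead of four, at the cost of a bounded rounding error; this is the precision trade-off behind lower-precision arithmetic.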

21
Q

To copy the input data into TensorFlow, which of the following syntaxes is correct?

inferenceInterface.feed(inputName, floatValues, 1, inputSize; inputSize, 3);

inferenceInterface.feed(inputName, floatValues, 1, inputSize, inputSize, 3);

inferenceInterface.feed(floatValues, 1, inputSize, inputSize, 3);

inferenceInterface.feed(inputName, floatValues, 1, inputSize, 3);

A

inferenceInterface.feed(inputName, floatValues, 1, inputSize, inputSize, 3);
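The dimensions passed after `floatValues` describe how that flat array is laid out: a batch of 1, an `inputSize` by `inputSize` image, and 3 color channels (NHWC order). The sketch below shows how a row-major array of packed 0xAARRGGBB pixels could be flattened into such a buffer. The helper names and the normalization constants (mean 128, std 128) are hypothetical; the preprocessing a given model expects may differ.

```python
from typing import List


def pixels_to_nhwc(pixels: List[int], input_size: int,
                   image_mean: float = 128.0, image_std: float = 128.0) -> List[float]:
    """Flatten packed 0xAARRGGBB pixels (row-major) into an NHWC float buffer."""
    if len(pixels) != input_size * input_size:
        raise ValueError("expected input_size * input_size pixels")
    float_values = [0.0] * (1 * input_size * input_size * 3)
    for i, val in enumerate(pixels):
        r, g, b = (val >> 16) & 0xFF, (val >> 8) & 0xFF, val & 0xFF
        float_values[i * 3 + 0] = (r - image_mean) / image_std
        float_values[i * 3 + 1] = (g - image_mean) / image_std
        float_values[i * 3 + 2] = (b - image_mean) / image_std
    return float_values


def nhwc_index(y: int, x: int, c: int, input_size: int) -> int:
    # Offset of element [0, y, x, c] in a buffer with dims (1, input_size, input_size, 3).
    return (y * input_size + x) * 3 + c


# 2 x 2 image: red, green, blue, white.
buf = pixels_to_nhwc([0xFFFF0000, 0xFF00FF00, 0xFF0000FF, 0xFFFFFFFF], input_size=2)
print(len(buf))                        # 12 = 1 * 2 * 2 * 3
print(buf[nhwc_index(1, 1, 2, 2)])     # blue channel of the white pixel: 0.9921875
```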

22
Q

Which of these are reasons that you may not be able to perform machine learning solely on Google Cloud? Check all that apply.

You need to run inference on the edge.

TensorFlow is not supported on Google Cloud.

You are tied to on-premises or multi-cloud infrastructure due to business reasons.

A

You need to run inference on the edge.

You are tied to on-premises or multi-cloud infrastructure due to business reasons.

23
Q

A key principle behind Kubeflow is portability so that you can:

Migrate your model from TensorFlow to PyTorch.

Convert your model from CUDA to XLA.

Move your model from on-premises to Google Cloud.

A

Move your model from on-premises to Google Cloud.