Production Machine Learning Systems Flashcards
Which type of training do you use if your data set doesn’t change over time?
Online training
Dynamic training
Real-time training
Static training
Static training
In a Vertex AI featurestore, the timestamps are an attribute of the feature values, not a separate resource type.
False
True
True
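A minimal sketch of what it means for a timestamp to be an attribute of a feature value: each stored value carries its own timestamp, which is what makes point-in-time lookups possible. The `Feature` and `FeatureValue` classes and the `value_at` method are hypothetical plain-Python stand-ins and do not mirror the Vertex AI SDK.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class FeatureValue:
    # The timestamp is stored on the value itself, not in a separate resource.
    value: float
    timestamp: datetime


@dataclass
class Feature:
    # Hypothetical in-memory feature holding a history of timestamped values.
    name: str
    values: List[FeatureValue] = field(default_factory=list)

    def ingest(self, value: float, timestamp: datetime) -> None:
        self.values.append(FeatureValue(value, timestamp))

    def value_at(self, as_of: datetime) -> Optional[float]:
        # Point-in-time lookup: the latest value recorded at or before `as_of`.
        eligible = [v for v in self.values if v.timestamp <= as_of]
        if not eligible:
            return None
        return max(eligible, key=lambda v: v.timestamp).value


avg_spend = Feature("avg_spend")
avg_spend.ingest(10.0, datetime(2023, 1, 1))
avg_spend.ingest(12.5, datetime(2023, 2, 1))
print(avg_spend.value_at(datetime(2023, 1, 15)))  # 10.0
```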
When you use the data to train a model, Vertex AI examines the source data type and feature values and infers how it will use that feature in model training. This is called the ________________ for that feature.
Transformation
Transmutation
Translation
Duplication
Transformation
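The inference step can be pictured as a function that inspects a column's raw values and picks a transformation. This is a hypothetical heuristic: the category names, the parse order, and the 50% distinct-value cutoff for free text are all assumptions, not Vertex AI's actual rule set.

```python
from datetime import datetime


def _parses(parser, value):
    # True if `parser(value)` succeeds without raising ValueError.
    try:
        parser(value)
        return True
    except ValueError:
        return False


def infer_transformation(values):
    """Guess a transformation for one column of raw string values (illustrative only)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "categorical"
    if all(_parses(float, v) for v in non_empty):
        return "numeric"
    if all(_parses(datetime.fromisoformat, v) for v in non_empty):
        return "timestamp"
    # Mostly distinct, multi-word strings look like free text; anything else is categorical.
    distinct = set(non_empty)
    if len(distinct) > 0.5 * len(non_empty) and any(" " in v for v in distinct):
        return "text"
    return "categorical"


print(infer_transformation(["1.5", "2", "3"]))  # numeric
```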
Match the three types of data ingest with an appropriate source of training data.
Streaming (BigQuery), structured batch (Pub/Sub), unstructured batch (Cloud Storage)
Streaming batch (Dataflow), structured batch (BigQuery), stochastic (App Engine)
Streaming (Pub/Sub), structured batch (BigQuery), unstructured batch (Cloud Storage)
Streaming (Pub/Sub), structured batch (BigQuery), unstructured batch (Cloud Storage)
What is the responsibility of model evaluation and validation components?
To ensure that the models are not good after moving them into a staging environment.
To ensure that the models are not good before moving them into a staging environment.
To ensure that the models are good after moving them into a production/staging environment.
To ensure that the models are good before moving them into a production/staging environment.
To ensure that the models are good before moving them into a production/staging environment.
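In a pipeline, this component typically acts as a gate between training and deployment. The sketch below assumes a single `accuracy` metric, a minimum bar of 0.8, and a tolerance of one percentage point relative to the model already serving; `should_promote` and all thresholds are hypothetical.

```python
def should_promote(candidate_metrics, baseline_metrics, min_accuracy=0.8, max_regression=0.01):
    """Return True if a candidate model may move to staging/production (illustrative gate)."""
    accuracy = candidate_metrics["accuracy"]
    # Absolute bar: the candidate must be good enough on its own.
    if accuracy < min_accuracy:
        return False
    # Relative bar: block promotion if it is noticeably worse than the serving model.
    if accuracy < baseline_metrics["accuracy"] - max_regression:
        return False
    return True


print(should_promote({"accuracy": 0.90}, {"accuracy": 0.88}))  # True
```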
What percent of system code does the ML model account for?
90%
25%
50%
5%
5%
Vertex AI has a unified data preparation tool that supports image, tabular, text, and video content. Where are uploaded datasets stored in Vertex AI?
A Google Cloud Storage bucket that acts as an output for AutoML, custom, and serialized training jobs.
A Google Cloud database that acts as an input for both AutoML and custom training jobs.
A Google Cloud Storage bucket that acts as an input for both AutoML and custom training jobs.
A Google Cloud database that acts as an output for both AutoML and custom training jobs.
A Google Cloud Storage bucket that acts as an input for both AutoML and custom training jobs.
Which type of logging, when enabled for online prediction, sends the stderr and stdout streams from your prediction nodes to Cloud Logging and can be useful for debugging?
Cloud logging
Container logging
Access logging
Request-response logging
Container logging
Suppose you are building an ML-based system to predict the likelihood that a customer will leave a positive review. The user interface that customers leave reviews on changed a few months ago, but you don’t know about this. Which of these is a potential consequence of mismanaging this data dependency?
Change in ability of model to be part of a streaming ingest
Losses in prediction quality
Change in model serving signature
Losses in prediction quality
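One way to catch an unannounced upstream change like this is to compare the distribution of an input feature at serving time with its distribution in the training data. The sketch below uses total variation distance for a categorical feature; the function name, the example values, and the idea that the new interface emits thumbs-up/down labels instead of star ratings are assumptions made for illustration.

```python
from collections import Counter


def category_shift(train_values, serving_values):
    """Total variation distance between two categorical samples (0 = identical, 1 = disjoint)."""
    p, q = Counter(train_values), Counter(serving_values)
    n_p, n_q = len(train_values), len(serving_values)
    categories = set(p) | set(q)
    # Counter returns 0 for categories absent from one sample.
    return 0.5 * sum(abs(p[c] / n_p - q[c] / n_q) for c in categories)


# Hypothetical: the old UI produced star ratings, the new UI mostly produces thumbs.
train_ratings = ["5", "4", "5", "3"]
serving_ratings = ["thumbs_up", "thumbs_up", "5", "thumbs_down"]
print(category_shift(train_ratings, serving_ratings))  # 0.75
```

A large value is a signal to investigate the upstream source before prediction quality quietly degrades.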
Which of the following models are susceptible to a feedback loop? Check all that apply.
A book-recommendation model that suggests novels its users may like based on their popularity (i.e., the number of times the books have been purchased).
A traffic-forecasting model that predicts congestion at highway exits near the beach, using beach crowd size as one of its features.
A housing-value model that predicts house prices, using size (area in square meters), number of bedrooms, and geographic location as features.
A face-attributes model that detects whether a person is smiling in a photo, which is regularly trained on a database of stock photography that is automatically updated monthly.
An election-results model that forecasts the winner of a mayoral race by surveying 2% of voters after the polls have closed.
A university-ranking model that rates schools in part by their selectivity (the percentage of students who applied that were admitted).
- A book-recommendation model that suggests novels its users may like based on their popularity (i.e., the number of times the books have been purchased).
Book recommendations are likely to drive purchases, and these additional sales will be fed back into the model as input, making it more likely to recommend these same books in the future.
- A traffic-forecasting model that predicts congestion at highway exits near the beach, using beach crowd size as one of its features.
Some beachgoers are likely to base their plans on the traffic forecast. If there is a large beach crowd and traffic is forecast to be heavy, many people may make alternative plans. This may depress beach turnout, resulting in a lighter traffic forecast, which then may increase attendance, and the cycle repeats.
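- A university-ranking model that rates schools in part by their selectivity (the percentage of students who applied that were admitted).
The model's rankings may draw more applicants to top-rated schools. If those schools admit the same number of students, their admission rate falls, which makes them look more selective, raises their ranking further, and attracts still more applicants.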
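The book-recommendation loop can be simulated in a few lines. Every book gets a fixed number of organic sales per round, and the currently most-purchased book gets extra sales because it is recommended; those extra sales then feed the next round's popularity ranking. The `organic` and `boost` rates and the book names are made-up assumptions.

```python
def simulate_popularity_loop(purchases, rounds, top_k=1, boost=10, organic=1):
    """Simulate a popularity-based recommender whose output feeds back into its input."""
    counts = dict(purchases)  # copy so the caller's data is not mutated
    for _ in range(rounds):
        # The model recommends the currently most-purchased books...
        recommended = sorted(counts, key=counts.get, reverse=True)[:top_k]
        for book in counts:
            counts[book] += organic
        # ...and the recommendation drives extra sales, which become next round's input.
        for book in recommended:
            counts[book] += boost
    return counts


start = {"novel_a": 12, "novel_b": 10, "novel_c": 9}
after = simulate_popularity_loop(start, rounds=5)
print(after)  # {'novel_a': 67, 'novel_b': 15, 'novel_c': 14}
```

A two-sale head start grows into a gap of more than 50 sales after five rounds, even though nothing about the books themselves changed.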
What is the shift in the actual relationship between the model inputs and the output called?
Prediction drift
Data drift
Label drift
Concept drift
Concept drift
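Concept drift means the mapping from inputs to labels changes even when the inputs themselves look the same. In this hypothetical example, a transaction-risk label moves from "over 100" to "over 50"; the input distribution is identical before and after, yet a model that learned the old rule perfectly loses accuracy.

```python
def old_concept(amount):
    # Before the shift: transactions over 100 are labeled risky.
    return amount > 100


def new_concept(amount):
    # After the shift: the relationship itself changes, and over 50 is now risky.
    return amount > 50


def accuracy(model, labeler, inputs):
    # Fraction of inputs where the model's prediction matches the true label.
    return sum(model(x) == labeler(x) for x in inputs) / len(inputs)


amounts = list(range(0, 200, 10))  # same inputs before and after the shift
model = old_concept                # a model that learned the old relationship exactly
print(accuracy(model, old_concept, amounts))  # 1.0
print(accuracy(model, new_concept, amounts))  # 0.75
```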
Gradual drift describes which of the following?
An old concept that incrementally changes to a new concept over a period of time
A new concept that occurs within a short time
An old concept that may reoccur after some time
A new concept that rapidly replaces an old one over a short period of time
An old concept that incrementally changes to a new concept over a period of time
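Gradual drift can be simulated by letting the probability that the new concept applies ramp up over time, so that old and new concepts are interleaved for a while. The linear ramp, the seed, and the labels "A" (old concept) and "B" (new concept) are assumptions; a sudden drift would instead switch from all "A" to all "B" at a single step.

```python
import random


def new_concept_weight(t, start, end):
    """Probability that the new concept is active at time t (linear ramp from start to end)."""
    if t <= start:
        return 0.0
    if t >= end:
        return 1.0
    return (t - start) / (end - start)


def gradual_drift_stream(n_steps, start, end, seed=0):
    rng = random.Random(seed)
    # Each step is drawn from the old concept "A" or the new concept "B";
    # the chance of "B" grows incrementally instead of switching at once.
    return ["B" if rng.random() < new_concept_weight(t, start, end) else "A"
            for t in range(n_steps)]


stream = gradual_drift_stream(100, start=20, end=80)
print("".join(stream))
```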
Which component identifies anomalies in training and serving data and can automatically create a schema by examining the data?
Data validation
Data ingestion
Data identifier
Data transform
Data validation
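A plain-Python sketch of the two jobs named in the answer: inferring a schema from training data, then checking serving rows against it. This is not the API of any particular library; the schema format (observed min/max for numeric columns, observed value set for everything else) is a deliberately simple assumption, and flagging any value outside the training range is stricter than most real validators.

```python
def infer_schema(rows):
    """Infer a simple schema from training rows (a list of dicts with the same keys)."""
    schema = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        if all(isinstance(v, (int, float)) for v in values):
            schema[name] = {"type": "numeric", "min": min(values), "max": max(values)}
        else:
            schema[name] = {"type": "categorical", "domain": set(values)}
    return schema


def find_anomalies(row, schema):
    """Return human-readable anomalies for one serving row."""
    anomalies = []
    for name, spec in schema.items():
        if name not in row:
            anomalies.append(f"{name}: missing")
            continue
        value = row[name]
        if spec["type"] == "numeric":
            if not isinstance(value, (int, float)):
                anomalies.append(f"{name}: expected numeric, got {value!r}")
            elif not spec["min"] <= value <= spec["max"]:
                anomalies.append(f"{name}: {value} outside [{spec['min']}, {spec['max']}]")
        elif value not in spec["domain"]:
            anomalies.append(f"{name}: unexpected value {value!r}")
    return anomalies


training_rows = [{"age": 25, "country": "US"}, {"age": 40, "country": "CA"}]
schema = infer_schema(training_rows)
print(find_anomalies({"age": 130, "country": "MX"}, schema))
```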
What is training skew caused by?
The Cloud Storage you load your data from in the training environment is physically closer than the Cloud Storage you load your data from in the production environment.
Your development and production environments are different, or different code is used in the training environment than in the production environment.
Starting and stopping of the processing when training the model.
The prediction environment is slower than the training environment.
Your development and production environments are different, or different code is used in the training environment than in the production environment.
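A common concrete form of this skew is a feature transform that is re-implemented separately for serving. In this hypothetical example, the serving request reports session length in seconds while training used minutes; the constants and function names are assumptions. Reusing one shared transform removes the mismatch.

```python
TRAIN_MEAN, TRAIN_STD = 50.0, 10.0  # hypothetical statistics computed on the training set


def standardize_minutes(minutes):
    """Single source of truth for the feature transform, used by both pipelines."""
    return (minutes - TRAIN_MEAN) / TRAIN_STD


def serving_feature_skewed(seconds):
    # Skewed: serving code re-implements the transform and forgets that the
    # production request reports seconds, not minutes.
    return (seconds - TRAIN_MEAN) / TRAIN_STD


def serving_feature_consistent(seconds):
    # Consistent: convert units at the boundary, then reuse the training transform.
    return standardize_minutes(seconds / 60)


session_minutes = 60
print(standardize_minutes(session_minutes))              # 1.0 (what the model saw in training)
print(serving_feature_skewed(session_minutes * 60))      # 355.0
print(serving_feature_consistent(session_minutes * 60))  # 1.0
```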
Which of the following tools help software users manage dependency issues?
Monolithic programs
Modular programs
Polylithic programs
Maven, Gradle, and Pip
Maven, Gradle, and Pip