GCP MLE Flashcards

1
Q

You are building an ML model to detect anomalies in real-time sensor data. You will use Pub/Sub to handle incoming requests. You want to store the results for analytics and visualization. How should you configure the pipeline?

A. 1 = Dataflow, 2 = AI Platform, 3 = BigQuery
B. 1 = DataProc, 2 = AutoML, 3 = Cloud Bigtable
C. 1 = BigQuery, 2 = AutoML, 3 = Cloud Functions
D. 1 = BigQuery, 2 = AI Platform, 3 = Cloud Storage

A

A is correct. Data from the sensors is ingested into a Pub/Sub topic, pre-processed with a Dataflow streaming job, scored by the model served on AI Platform, and the results are stored in BigQuery for analysis and visualization with Data Studio or AI Platform Notebooks.

B is incorrect. Dataflow, built on the Apache Beam SDK, integrates natively with Pub/Sub streaming and is recommended over Dataproc here, and BigQuery is a better choice than Bigtable for analytics and visualization.

C is incorrect as Cloud Functions can’t be used for result analysis or visualization.

D is incorrect as Cloud Storage can’t be used for result analysis or visualization.

Note:

You can read JSON-formatted messages from a Pub/Sub topic and write them to a BigQuery table, but the prediction results also need to be stored in BigQuery for analysis, and BigQuery itself cannot make the API calls for model predictions; that step has to be done in a Dataflow job. Hence C and D cannot be correct.

Links:

Similar problem statement: https://cloud.google.com/architecture/detecting-anomalies-in-financial-transactions
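For intuition, a minimal Apache Beam (Dataflow) sketch of this flow: read sensor messages from the Pub/Sub topic, score each record, and write the results to BigQuery. The topic, table, and threshold below are hypothetical placeholders, and the model call is stubbed out.

```python
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Hypothetical resource names for illustration only.
TOPIC = "projects/my-project/topics/sensor-readings"
TABLE = "my-project:analytics.anomaly_results"

def score_record(record):
    # Placeholder for the call to the model served on AI Platform;
    # here a fixed threshold stands in for the real prediction.
    record["is_anomaly"] = record.get("value", 0) > 100
    return record

options = PipelineOptions(streaming=True)
with beam.Pipeline(options=options) as p:
    (p
     | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic=TOPIC)
     | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
     | "ScoreRecord" >> beam.Map(score_record)
     | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
           TABLE,
           # Assumes the results table already exists with a matching schema.
           create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
           write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND))
```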

2
Q

Your team is building an application for a global bank that will be used by millions of customers. You built a forecasting model that predicts customers’ account balances 3 days in the future. Your team will use the results in a new feature that will notify users when their account balance is likely to drop below $25. How should you serve your predictions?

A
1. Create a Pub/Sub topic for each user.
2. Deploy a Cloud Function that sends a notification when your model predicts that a user’s account balance will drop below the $25 threshold.

B
1. Create a Pub/Sub topic for each user.
2. Deploy an application on the App Engine standard environment that sends a notification when your model predicts that a user’s account balance will drop below the $25 threshold.

C
1. Build a notification system on Firebase.
2. Register each user with a user ID on the Firebase Cloud Messaging server, which sends a notification when the average of all account balance predictions drops below the $25 threshold.

D
1. Build a notification system on Firebase.
2. Register each user with a user ID on the Firebase Cloud Messaging server, which sends a notification when your model predicts that a user’s account balance will drop below the $25 threshold.

A

A is incorrect. This is a viable solution, but sending the messages would incur Cloud Functions usage charges.
B is incorrect. App Engine is costlier than Cloud Functions, so this is not a cost-effective solution.
C is incorrect, as it notifies on the average of all account balance predictions rather than each individual user's prediction.
D is correct. You can register each user with a user ID on the Firebase Cloud Messaging (FCM) server, which sends a notification when your model predicts that the user's account balance will drop below the $25 threshold. With FCM, messages can be sent at no cost.
Note:
Firebase Cloud Messaging (FCM) is a cross-platform messaging solution that lets you reliably send messages at no cost. Hence Firebase is the appropriate choice for this application.
Links:
Firebase Cloud Messaging (FCM): https://firebase.google.com/docs/cloud-messaging
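A minimal sketch of the FCM step, assuming the Firebase Admin SDK for Python; the registration token and message text are hypothetical placeholders.

```python
import firebase_admin
from firebase_admin import messaging

# Initializes with application default credentials; the registration
# token passed to notify_low_balance() is a hypothetical placeholder.
firebase_admin.initialize_app()

def notify_low_balance(registration_token, predicted_balance):
    # Only notify when the forecasting model predicts a drop below $25.
    if predicted_balance < 25:
        message = messaging.Message(
            notification=messaging.Notification(
                title="Low balance warning",
                body="Your account balance is predicted to drop below $25.",
            ),
            token=registration_token,
        )
        messaging.send(message)  # FCM delivers the push notification
```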

3
Q

You work for an advertising company and want to understand the effectiveness of your company’s latest advertising campaign. You have streamed 500 MB of campaign data into BigQuery. You want to query the table, and then manipulate the results of that query with a pandas dataframe in an AI Platform notebook. What should you do?

A
Use AI Platform Notebooks’ BigQuery cell magic to query the data, and ingest the results as a pandas dataframe.

B
Export your table as a CSV file from BigQuery to Google Drive, and use the Google Drive API to ingest the file into your notebook instance.

C
Download your table from BigQuery as a local CSV file, and upload it to your AI Platform notebook instance. Use pandas.read_csv to ingest the file as a pandas dataframe.

D
From a bash cell in your AI Platform notebook, use the bq extract command to export the table as a CSV file to Cloud Storage, and then use gsutil cp to copy the data into the notebook. Use pandas.read_csv to ingest the file as a pandas dataframe.

A

A is correct. The BigQuery cell magic can be used to query the data and ingest the results directly into a pandas dataframe (refer link).
B is incorrect. This method has redundant steps, and you only want to manipulate the results of the query, not the entire table.
C is incorrect. You only want to manipulate the results of the query, not the entire table, so this solution is not suitable.
D is incorrect; again, this is a redundant multi-step method.
Note:
Using the Python BigQuery client library to load the query results into a dataframe would also be a preferable solution, but it is not among the options.
Links:
BigQuery cell magic to query the data:
https://cloud.google.com/bigquery/docs/visualize-jupyter
Pandas dataframe using bigquery python client library:
https://cloud.google.com/bigquery/docs/visualize-jupyter#querying-and-visualizing-bigquery-data-using-pandas-dataframes
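As an illustration of option A, a notebook cell using the BigQuery cell magic; the table name is a hypothetical placeholder, and the query result is materialized directly as the pandas dataframe campaign_df.

```python
%%bigquery campaign_df
-- Runs in an AI Platform notebook cell; the result of this query is
-- returned as the pandas dataframe `campaign_df`.
SELECT campaign_id, SUM(clicks) AS total_clicks
FROM `my-project.marketing.campaign_events`
GROUP BY campaign_id
```

The client-library equivalent mentioned in the note is bigquery.Client().query(sql).to_dataframe().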

4
Q

You are an ML engineer at a global car manufacturer. You need to build an ML model to predict car sales in different cities around the world. Which features or feature crosses should you use to train city-specific relationships between car type and the number of sales?

A
Three individual features: binned latitude, binned longitude, and one-hot encoded car type.

B
One feature obtained as an element-wise product between latitude, longitude, and car type.

C
One feature obtained as an element-wise product between binned latitude, binned longitude, and one-hot encoded car type.

D
Two feature crosses as an element-wise product: the first between binned latitude and one-hot encoded car type, and the second between binned longitude and one-hot encoded car type.

A

Note:
Kindly watch the Feature cross video before going through the solution.

A is incorrect. With three independent features we would not create any feature that represents city-specific car types.
B is incorrect. Binned latitude and binned longitude should be used to capture city-specific information, not the raw values.
C is correct. Crossing binned latitude, binned longitude, and one-hot encoded car type is necessary to capture region-specific car-type information.
D is incorrect. Two separate crosses of latitude and longitude with car type cannot capture information specific to a particular city (a latitude-longitude pair).
Links:
Feature cross: https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture
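A minimal sketch of option C using the legacy tf.feature_column API; the bucket boundaries and the car-type vocabulary are hypothetical examples.

```python
import tensorflow as tf

# Numeric inputs are bucketized ("binned") before crossing; the
# boundaries below are hypothetical examples.
latitude = tf.feature_column.numeric_column("latitude")
longitude = tf.feature_column.numeric_column("longitude")
binned_lat = tf.feature_column.bucketized_column(
    latitude, boundaries=[-60, -30, 0, 30, 60])
binned_lon = tf.feature_column.bucketized_column(
    longitude, boundaries=[-120, -60, 0, 60, 120])

# Categorical car type (one-hot when wrapped in an indicator column).
car_type = tf.feature_column.categorical_column_with_vocabulary_list(
    "car_type", ["sedan", "suv", "hatchback", "truck"])

# A single cross of all three features captures city-specific car-type demand.
lat_lon_car_cross = tf.feature_column.crossed_column(
    [binned_lat, binned_lon, car_type], hash_bucket_size=10000)
cross_feature = tf.feature_column.indicator_column(lat_lon_car_cross)
```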

5
Q

You work for a large technology company that wants to modernize their contact center. You have been asked to develop a solution to classify incoming calls by product so that requests can be more quickly routed to the correct support team. You have already transcribed the calls using the Speech-to-Text API. You want to minimize data preprocessing and development time. How should you build the model?

A
Use the AI Platform Training built-in algorithms to create a custom model.

B
Use AutoML Natural Language to extract custom entities for classification.

C
Use the Cloud Natural Language API to extract custom entities for classification.

D
Build a custom model to identify the product keywords from the transcribed calls, and then run the keywords through a classification algorithm.

A

A is incorrect. AI Platform Training does not yet offer built-in algorithms for NLP (only BERT is available). It would also be a viable path, and would have been the correct answer if the question had required a specific framework such as TensorFlow.
B is correct. An AutoML Natural Language model can be trained to extract custom entities, and incoming calls can then be classified by mapping those entities to products with a predefined lookup. (Real automated IVRs are more sophisticated, but such details are not mentioned in the question, and this version is a usable baseline solution.)
C is incorrect, as the Natural Language API returns generic entities that are not specific to the contact center's products.
D is incorrect, as this would require a lot of development time.
Links:
AutoML Natural Language AI: https://cloud.google.com/natural-language/automl/docs
AI Platform built-in algorithms (no NLP algorithms available yet):
https://cloud.google.com/ai-platform/training/docs/algorithms
Automated IVRs (kindly research more on these lines if interested):
http://www.smartcustomerservice.com/Columns/Vendor-Views/Building-a-More-Intelligent-IVR-Through-Machine-Learning-130467.aspx

6
Q

You are training a TensorFlow model on a structured dataset with 100 billion records stored in several CSV files. You need to improve the input/output execution performance. What should you do?

A
Load the data into BigQuery, and read the data from BigQuery.

B
Load the data into Cloud Bigtable, and read the data from Bigtable.

C
Convert the CSV files into shards of TFRecords, and store the data in Cloud Storage.

D
Convert the CSV files into shards of TFRecords, and store the data in the Hadoop Distributed File System (HDFS).

A

A is correct. BigQuery is recommended by Google for storing and manipulating structured data at this scale (over 100 billion rows), and the model can be trained in TensorFlow using the BigQuery TensorFlow reader. Hence this is the best solution (refer links).
B is incorrect. Bigtable is a NoSQL database that stores data as key-value pairs and is not the right fit for this structured, analytical workload.
C is incorrect. Converting the data to TFRecords would optimize the input pipeline, but BigQuery is the better option for querying a structured dataset of this size rather than loading files with the tf.data API.
D is incorrect. HDFS is unnecessary when a Cloud Storage bucket is available, and the TFRecord reasoning is the same as in option C.
Links:
Anatomy of a BigQuery Query:
https://cloud.google.com/blog/products/bigquery/anatomy-of-a-bigquery-query
End to end example for BigQuery TensorFlow reader:
https://www.tensorflow.org/io/tutorials/bigquery
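A heavily abridged sketch in the spirit of the linked BigQuery TensorFlow reader tutorial; the project, dataset, table, and field names are hypothetical, and the exact read_session signature is documented in the tutorial.

```python
import tensorflow as tf
from tensorflow_io.bigquery import BigQueryClient

# Hypothetical identifiers for illustration only.
PROJECT_ID, DATASET_ID, TABLE_ID = "my-project", "sensor_data", "records"

client = BigQueryClient()
read_session = client.read_session(
    "projects/" + PROJECT_ID,
    PROJECT_ID, TABLE_ID, DATASET_ID,
    selected_fields=["feature_a", "feature_b", "label"],
    output_types=[tf.float64, tf.float64, tf.int64],
    requested_streams=2)

# Streams rows in parallel straight into a tf.data training pipeline.
dataset = (read_session.parallel_read_rows()
           .map(lambda row: (tf.stack([row["feature_a"], row["feature_b"]]),
                             row["label"]))
           .batch(1024)
           .prefetch(tf.data.AUTOTUNE))
```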

7
Q

As the lead ML Engineer for your company, you are responsible for building ML models to digitize scanned customer forms. You have developed a TensorFlow model that converts the scanned images into text and stores them in Cloud Storage. You need to use your ML model on the aggregated data collected at the end of each day with minimal manual intervention. What should you do?

A
Use the batch prediction functionality of AI Platform.

B
Create a serving pipeline in Compute Engine for prediction.

C
Use Cloud Functions for prediction each time a new data point is ingested.

D
Deploy the model on AI Platform and create a version of it for online inference.

A

A is correct. With AI Platform batch prediction you specify an output directory, and all results are stored there. No real-time requirement is mentioned in the question, so batch prediction can run at the end of each day, scoring the aggregated data in a scalable way and collecting the results at the provided output path.

B is incorrect. Building a serving pipeline on Compute Engine is the least efficient approach and is not scalable.

C is incorrect. No real-time requirement is mentioned in the question, so Cloud Functions are not needed here.

D is incorrect. With online prediction you would have to write code to aggregate the results at a chosen location yourself, which batch prediction does automatically.

Note:

When you submit a batch prediction job, you provide the path to the Cloud Storage location where you want the prediction service to save your results.

Links:
AI Platform Batch Predictions (refer output path): https://cloud.google.com/ai-platform/prediction/docs/batch-predict

8
Q

You recently joined an enterprise-scale company that has thousands of datasets. You know that there are accurate descriptions for each table in BigQuery, and you are searching for the proper BigQuery table to use for a model you are building on AI Platform. How should you find the data that you need?

A
Use Data Catalog to search the BigQuery datasets by using keywords in the table description.

B
Tag each of your model and version resources on AI Platform with the name of the BigQuery table that was used for training.

C
Maintain a lookup table in BigQuery that maps the table descriptions to the table ID. Query the lookup table to find the correct table ID for the data that you need.

D
Execute a query in BigQuery to retrieve all the existing table names in your project using the INFORMATION_SCHEMA metadata tables that are native to BigQuery. Use the result to find the table that you need.

A

A is correct. Data Catalog offers powerful, structured search capabilities and predicate-based filtering over both the technical and business metadata from BigQuery, and it provides client libraries in several languages for searching datasets (refer link).
B is incorrect. To tag the AI Platform model and version resources with a BigQuery table name, you would first need to find the table; the search step is missing from this option.
C is incorrect. Again, the tables would have to be found first to build the lookup table, and this option does not describe how the lookup table would be created and maintained.
D is incorrect. This method returns metadata for all tables, but it does not explain how to use the results for thousands of tables to find the one you need (refer link for details about INFORMATION_SCHEMA).
Links:
Data Catalog overview: https://cloud.google.com/data-catalog/docs/concepts/overview
How to search using Data Catalog: https://cloud.google.com/data-catalog/docs/how-to/search
Getting table metadata using INFORMATION_SCHEMA:
https://cloud.google.com/bigquery/docs/information-schema-tables
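A minimal sketch of option A with the Data Catalog client library; the project ID and the search keyword are hypothetical.

```python
from google.cloud import datacatalog_v1

client = datacatalog_v1.DataCatalogClient()

# Restrict the search scope to a hypothetical project.
scope = datacatalog_v1.SearchCatalogRequest.Scope()
scope.include_project_ids.append("my-project")

# Search BigQuery tables whose description mentions "churn".
results = client.search_catalog(
    scope=scope, query="type=table description:churn")

for result in results:
    print(result.relative_resource_name)
```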

9
Q

You started working on a classification problem with time-series data and achieved an area under the receiver operating characteristic curve (AUC ROC) value of 99% for training data after just a few experiments. You haven’t explored using any sophisticated algorithms or spent any time on hyperparameter tuning. What should your next step be to identify and fix the problem?

A
Address the model overfitting by using a less complex algorithm.

B
Address data leakage by applying nested cross-validation during model training.

C
Address data leakage by removing features highly correlated with the target value.

D
Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.

A

This is not a data leakage problem: the question does not say how the data was split, and while splitting time-series data randomly instead of by time would cause leakage, we assume here that the split was done correctly. Since the 99% AUC ROC is reported on the training data rather than on validation data, this is an overfitting issue.
A is correct. Because this is an overfitting issue, the first step is to reduce model complexity, either by using a less complex model or by adding L1 regularization/dropout.
B is incorrect; as mentioned in the note, this is not a data leakage problem.
C is incorrect; as mentioned in the note, this is not a data leakage problem.
D is incorrect. Hyperparameter tuning is viable, but the 99% AUC ROC was reached after only a few experiments, so the first thing to do is reduce model complexity.
Links:
Overfitting explained: https://elitedatascience.com/overfitting-in-machine-learning
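To illustrate "less complex model plus L1 regularization/dropout" in Keras; the layer sizes and rates below are arbitrary examples, not a prescription.

```python
import tensorflow as tf

# A deliberately small network with L1 regularization and dropout to
# reduce overfitting; sizes and rates here are illustrative only.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(
        32, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l1(1e-4)),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(curve="ROC")])
# Evaluate on a chronologically held-out validation split so the
# time-series ordering is respected.
```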

10
Q

You work for an online travel agency that also sells advertising placements on its website to other companies. You have been asked to predict the most relevant web banner that a user should see next. Security is important to your company. The model latency requirements are 300ms, the inventory is thousands of web banners, and your exploratory analysis has shown that navigation context is a good predictor. You want to implement the simplest solution. How should you configure the prediction pipeline?

A.
Embed the client on the website, and then deploy the model on AI Platform Prediction.

B
Embed the client on the website, deploy the gateway on App Engine, and then deploy the model on AI Platform Prediction.

C
Embed the client on the website, deploy the gateway on App Engine, deploy the database on Cloud Bigtable for writing and for reading the user’s navigation context, and then deploy the model on AI Platform Prediction.

D
Embed the client on the website, deploy the gateway on App Engine, deploy the database on Memorystore for writing and for reading the user’s navigation context, and then deploy the model on Google Kubernetes Engine.

A

The question asks for the simplest solution. The client embedded in the website can call the model deployed on AI Platform Prediction directly, because online prediction exposes a REST API, and fewer network hops keep latency low. Among the options, A is the simplest viable solution with the lowest latency: there is only one network call, and the current navigation context is already available on the client.
A is correct. As explained above, this is the simplest viable low-latency solution; the only latency component is the response time of online prediction.
B is incorrect. The extra gateway adds steps; option A is a much simpler solution.
C is incorrect. It involves more components than needed; the database steps are unnecessary because only the next banner has to be shown, so option A is a much simpler solution.
D is incorrect. It is workable (Memorystore could cache the user's navigation context if the client cannot), but serving the model on GKE adds complexity, so option A remains the simpler choice.
Links:
AI Platform online prediction (refer REST API): https://cloud.google.com/ai-platform/prediction/docs/online-predict

11
Q

Your team is building a convolutional neural network (CNN)-based architecture from scratch. The preliminary experiments running on your on-premises CPU-only infrastructure were encouraging, but have slow convergence. You have been asked to speed up model training to reduce time-to-market. You want to experiment with virtual machines (VMs) on Google Cloud to leverage more powerful hardware. Your code does not include any manual device placement and has not been wrapped in Estimator model-level abstraction. Which environment should you train your model on?

A
A VM on Compute Engine and 1 TPU with all dependencies installed manually.

B
A VM on Compute Engine and 8 GPUs with all dependencies installed manually.

C
A Deep Learning VM with an n1-standard-2 machine and 1 GPU with all libraries pre-installed.

D
A Deep Learning VM with more powerful CPU e2-highcpu-16 machines with all libraries pre-installed.

A

Since manual device placement is not written in the code, and the code is not wrapped in an Estimator or Keras model-level abstraction, we have to go with faster CPUs; GPU- and TPU-based solutions would not be usable as-is.
A is incorrect, as it is a TPU-based solution.
B is incorrect, as it is a GPU-based solution.
C is incorrect, as it is a GPU-based solution.
D is correct, as it is a CPU-based solution and the Deep Learning VM comes with all the required ML libraries pre-installed.
Links:
GPU manual device placement:
https://www.tensorflow.org/guide/gpu#manual_device_placement
TPU manual device placement:
https://www.tensorflow.org/guide/tpu#manual_device_placement

12
Q

You work on a growing team of more than 50 data scientists who all use AI Platform. You are designing a strategy to organize your jobs, models, and versions in a clean and scalable way. Which strategy should you choose?

A
Set up restrictive IAM permissions on the AI Platform notebooks so that only a single user or group can access a given instance.

B
Separate each data scientist’s work into a different project to ensure that the jobs, models, and versions created by each data scientist are accessible only to that user.

C
Use labels to organize resources into descriptive categories.
Apply a label to each created resource so that users can filter the results by label when viewing or monitoring the resources.

D
Set up a BigQuery sink for Cloud Logging logs that is appropriately filtered to capture information about AI Platform resource usage. In BigQuery, create a SQL view that maps users to the resources they are using

A

A is incorrect. The question asks for a strategy to organize jobs, models, and versions in a clean and scalable way; restricting access to notebooks does not address that.
B is incorrect. If data scientists need to collaborate on a model, splitting them across separate projects would hinder development.
C is correct. AI Platform provides labels for organizing and filtering resources: you can label jobs by team/user and by development phase (prod or test), then filter jobs by team and phase.
D is incorrect. It is viable: you can export all AI Platform logs to BigQuery and write custom queries that map users to the resources they are using. But option C is better because labels provide built-in filtering, with no custom views needed.
Links:
Labelling resources on AI Platform: https://cloud.google.com/ai-platform/training/docs/resource-labels
BigQuery logging sink: https://cloud.google.com/logging/docs/export/bigquery

13
Q

You are training a deep learning model for semantic image segmentation with reduced training time. While using a Deep Learning VM Image, you receive the following error:
The resource ‘projects/deeplearning-platforn/zones/europe-west4-c/acceleratorTypes/nvidia-tesla-k80’ was not found
What should you do?

A
Ensure that you have GPU quota in the selected region.

B
Ensure that the required GPU is available in the selected region.

C
Ensure that you have preemptible GPU quota in the selected region.

D
Ensure that the selected GPU has enough GPU memory for the workload.

A

Go through the troubleshooting link first.
A is incorrect, as this is a resource-not-found issue, not a quota-exceeded issue.
B is correct, as this is a resource-not-found issue: the requested GPU type is not available in the selected region, so you should determine which region offers it.
C is incorrect; this is not a preemptible GPU quota issue.
D is incorrect; that would address a ResourceExhaustedError (which generally occurs when the batch size is too large for the given machine).
Links:
Troubleshooting Deep Learning VMs: https://cloud.google.com/deep-learning-vm/docs/troubleshooting

14
Q

Your team is working on an NLP research project to predict the political affiliation of authors based on articles they have written. You have a large training dataset that is structured like this:

AuthorA:Political Party A
TextA1: [SentenceA11, SentenceA12, …]
TextA2: [SentenceA21, SentenceA22, …]

A

You want to predict the political affiliation of authors based on the articles they have written, i.e., classify an author's texts to a political party, so splitting the dataset by author makes the most sense (option B). If instead we split by text (option A), TextA1 might go to training and TextA2 to validation, and the model could then correctly predict TextA2 as affiliated with Political Party A from the author's writing patterns rather than from the actual content of the text. The same issue arises when splitting medical datasets, which is why medical data is always split by patient. This issue is ignored in the framing of the question, where the split ratio is given more importance, so the accepted answer may differ from this solution.
A is incorrect, as per the note.
B is correct, as per the note.
C is incorrect, as it would also introduce author-specific bias. It would, however, be the correct answer if the split ratio were considered more important than the author-specific bias that causes data leakage during evaluation.
D is incorrect; the question does not give paragraphs of text any special significance.
Links:
Splitting medical dataset: https://www.coursera.org/lecture/ai-for-medical-diagnosis/splitting-data-by-patient-cQr8S
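A minimal sketch of an author-level split with scikit-learn's GroupShuffleSplit, so that no author appears in both training and validation; the dataframe columns are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical dataframe: one row per text, with its author and party label.
df = pd.DataFrame({
    "text":   ["SentenceA11 ...", "SentenceA21 ...", "SentenceB11 ..."],
    "author": ["AuthorA", "AuthorA", "AuthorB"],
    "party":  ["Party A", "Party A", "Party B"],
})

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=42)
train_idx, val_idx = next(splitter.split(df, groups=df["author"]))

train_df, val_df = df.iloc[train_idx], df.iloc[val_idx]
# All texts of a given author land entirely in train or entirely in
# validation, preventing author-specific leakage.
```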

15
Q

Your team has been tasked with creating an ML solution in Google Cloud to classify support requests for one of your platforms. You analyzed the requirements and decided to use TensorFlow to build the classifier so that you have full control of the model’s code, serving, and deployment. You will use Kubeflow pipelines for the ML platform. To save time, you want to build on existing resources and use managed services instead of building a completely new model. How should you build the classifier?

A
Use the Natural Language API to classify support requests.

B
Use AutoML Natural Language to build the support requests classifier.

C
Use an established text classification model on AI Platform to perform transfer learning.

D
Use an established text classification model on AI Platform as-is to classify support requests.

A

A is incorrect. You need control of the model's code, serving, and deployment, which is not possible with the Natural Language API, a pre-built API that returns generic entities.
B is incorrect. TensorFlow is specified in the question, and AutoML does not give you control over the model's code, serving, and deployment.
C is incorrect. Transfer learning is not called for anywhere in the question; it would have been the right choice if, for example, only limited data were available.
D is correct. This is the most viable option when you want to use TensorFlow and need control over the model's code, serving, and deployment while building on an established model. (Note that AI Platform does not yet support many NLP models; see BERT in the links.)
Links:
AI Platform BERT: https://cloud.google.com/ai-platform/training/docs/algorithms/bert

16
Q

You recently joined a machine learning team that will soon release a new project. As a lead on the project, you are asked to determine the production readiness of the ML components. The team has already tested features and data, model development, and infrastructure. Which additional readiness check should you recommend to the team?

A
Ensure that training is reproducible.

B
Ensure that all hyperparameters are tuned.

C
Ensure that model performance is monitored.

D
Ensure that feature expectations are captured in the schema.

A

A is incorrect. Reproducible training matters because the model will eventually need retraining to cope with data drift, but this check is already covered by the infrastructure tests.
B is incorrect. This is also important, but it is covered by the model development tests.
C is correct. An ML system that works correctly at launch must keep working correctly over time, so monitoring is the key remaining production-readiness check alongside the tests for features and data, model development, and infrastructure.
D is incorrect, as the features have already been tested by the ML team.
Links:
A Rubric for ML Production Readiness and Technical Debt Reduction (paper; refer section V: Monitoring): https://storage.googleapis.com/pub-tools-public-publication-data/pdf/aad9f93b86b7addfea4c419b9100c6cdd26cacea.pdf

17
Q

You work for a credit card company and have been asked to create a custom fraud detection model based on historical data using AutoML Tables. You need to prioritize the detection of fraudulent transactions while minimizing false positives. Which optimization objective should you use when training the model?

A
An optimization objective that minimizes Log loss

B
An optimization objective that maximizes the Precision at a Recall value of 0.50

C
An optimization objective that maximizes the area under the precision-recall curve (AUC PR) value

D
An optimization objective that maximizes the area under the receiver operating characteristic curve (AUC ROC) value

A

The question asks to maximize detection of fraudulent transactions (true positives) while minimizing false positives; nothing is said about false negatives. Hence we need to increase precision, TP / (TP + FP).
A is incorrect. Minimizing log loss (i.e., binary cross-entropy) is natural for a binary classification problem, but it does not specifically drive precision up.
B is correct. Because of the precision-recall trade-off, holding recall at 0.50 and maximizing precision at that operating point improves precision, and false negatives are not a concern here.
C is incorrect. False negatives are not prioritized in the question, so optimizing the entire precision-recall curve is not required (it would be the ideal objective, but it is harder to achieve).
D is incorrect. The ROC curve plots the true positive rate (recall) against the false positive rate; maximizing AUC ROC is more appropriate when minimizing false negatives is also a priority.
Links:
Precision-recall: https://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html
ROC-AUC: https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html
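A small sketch of inspecting precision at a recall of at least 0.50 with scikit-learn; the labels and scores are dummy values for illustration.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Dummy labels (1 = fraud) and model scores for illustration.
y_true = np.array([0, 0, 1, 0, 1, 0, 0, 1, 0, 1])
y_score = np.array([0.1, 0.3, 0.9, 0.2, 0.8, 0.4, 0.15, 0.35, 0.05, 0.7])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# Best precision achievable among operating points that still keep recall >= 0.5.
mask = recall >= 0.5
print("precision at recall >= 0.5:", precision[mask].max())
```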

18
Q

Your company manages a video sharing website where users can watch and upload videos. You need to create an ML model to predict which newly uploaded videos will be the most popular so that those videos can be prioritized on your company’s website. Which result should you use to determine whether the model is successful?

A
The model predicts videos as popular if the user who uploads them has over 10,000 likes.

B
The model predicts 97.5% of the most popular clickbait videos measured by number of clicks.

C
The model predicts 95% of the most popular videos measured by watch time within 30 days of being uploaded.

D
The Pearson correlation coefficient between the log-transformed number of views after 7 days and 30 days after publication is equal to 0.

A

A is incorrect. Likes are not received uniformly across videos, so the uploader's like count is not a good measure of a new video's popularity.
B is incorrect. Actual watch time is a more important signal than a (clickbait) video merely being clicked.
C is correct. The videos with the most watch time within 30 days of upload are a reasonable definition of the most popular videos.
D is incorrect. A Pearson correlation coefficient of 0 between 7-day and 30-day views says nothing about whether the model identifies popular videos correctly.
Note:
YouTube uses a metric called view velocity, which measures how many subscribers watch a video right after it is published; the higher the view velocity, the higher the video ranks. That option is not available here, however.

19
Q

You are working on a Neural Network-based project. The dataset provided to you has columns with different ranges. While preparing the data for model training, you discover that gradient optimization is having difficulty moving weights to a good solution. What should you do?

A
Use feature construction to combine the strongest features.

B
Use the representation transformation (normalization) technique.

C
Improve the data cleaning step by removing features with missing values.

D
Change the partitioning step to reduce the dimension of the test set and have a larger training set.

A

Since the dataset has columns with very different ranges, the features must be normalized to a common range; this is known as feature scaling.
A is incorrect. Nothing indicates that convergence is failing because of an excessive number of features; that situation usually shows up as overfitting, where feature selection and dropout would help.
B is correct. Normalization scales all features to a comparable range, which helps gradient optimization converge on your data.
C is incorrect, as nothing is mentioned about missing values in the question.
D is incorrect; again, nothing is mentioned about data splitting.
Links:
Scaling for neural networks: https://machinelearningmastery.com/how-to-improve-neural-network-stability-and-modeling-performance-with-data-scaling/
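A minimal sketch of representation transformation with a Keras Normalization layer adapted on the training data; the feature matrix below is a made-up example with columns on very different scales.

```python
import numpy as np
import tensorflow as tf

# Hypothetical training features with wildly different ranges per column.
X_train = np.array([[1.0, 2000.0, 0.001],
                    [2.0, 1500.0, 0.004],
                    [3.0, 1800.0, 0.002]], dtype="float32")

# The layer learns per-feature mean/variance and standardizes inputs,
# which helps gradient descent move weights toward a good solution.
normalizer = tf.keras.layers.Normalization()
normalizer.adapt(X_train)

model = tf.keras.Sequential([
    normalizer,
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
```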

20
Q

Your data science team needs to rapidly experiment with various features, model architectures, and hyperparameters. They need to track the accuracy metrics for various experiments and use an API to query the metrics over time. What should they use to track and report their experiments while minimizing manual effort?

A
Use Kubeflow Pipelines to execute the experiments. Export the
metrics file, and query the results using the Kubeflow Pipelines API.

B
Use AI Platform Training to execute the experiments. Write the accuracy metrics to BigQuery, and query the results using the BigQuery API.

C
Use AI Platform Training to execute the experiments. Write the accuracy metrics to Cloud Monitoring, and query the results using the Monitoring API.

D
Use AI Platform Notebooks to execute the experiments. Collect the results in a shared Google Sheets file, and query the results using the Google Sheets API.

A

A is correct. Metrics exported from pipeline components are shown as visualizations on the Runs page of an experiment in the Kubeflow Pipelines UI, and they can be queried through the Kubeflow Pipelines API.
B is incorrect. Writing accuracy metrics to BigQuery and comparing runs with different hyperparameters through the BigQuery API is viable, but it is ruled out here because the Kubeflow option is available and requires less effort.
C is incorrect. Cloud Monitoring is meant for monitoring infrastructure and model performance over time, not for comparing experiment results.
D is incorrect. It could only work if the results were written to the Google Sheet directly through the API rather than collected manually, and it still involves more manual effort.
Links:
Kubeflow metrics:
https://www.kubeflow.org/docs/components/pipelines/sdk/pipelines-metrics/
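A rough sketch of exporting a metric from a pipeline step in the style of the linked doc, where a component writes mlpipeline-metrics.json so the value appears on the experiment's Runs page and can be queried through the Pipelines API; the metric name and value are illustrative, and the exact output wiring depends on the KFP SDK version.

```python
import json

def export_accuracy_metric(accuracy: float) -> None:
    # Kubeflow Pipelines picks up metrics a component writes to this
    # well-known path and surfaces them on the experiment's Runs page.
    metrics = {
        "metrics": [
            {
                "name": "accuracy-score",   # illustrative metric name
                "numberValue": accuracy,
                "format": "PERCENTAGE",
            }
        ]
    }
    with open("/mlpipeline-metrics.json", "w") as f:
        json.dump(metrics, f)

export_accuracy_metric(0.927)
```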

21
Q

You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are identified as fraudulent. Which data transformation strategy would likely improve the performance of your classifier?

A. Write your data in TFRecords.

B. Z-normalize all the numeric features.

C. Oversample the fraudulent transaction 10 times.

D. Use one-hot encoding on all categorical features.

A

Note:
You have a dataset in which only 1% of transactions are fraudulent, so this is a class imbalance problem. Common remedies include undersampling, oversampling, ensemble modeling, augmentation, and probabilistically dropping majority examples during minibatch training.
A is incorrect. TFRecords optimize the TensorFlow input pipeline; they do nothing for class imbalance.
B is incorrect. Z-normalization addresses feature scaling, not class imbalance.
C is correct. Oversampling the fraudulent class reduces the imbalance.
D is incorrect. One-hot encoding is an encoding technique and has nothing to do with class imbalance.
Links:
Credit card fraud detection with Data imbalance:
https://towardsdatascience.com/how-to-build-a-machine-learning-model-to-identify-credit-card-fraud-in-5-stepsa-hands-on-modeling-5140b3bd19f1
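A minimal sketch of option C with pandas, repeating the fraudulent rows 10 times; the column name is hypothetical.

```python
import pandas as pd

def oversample_fraud(df: pd.DataFrame, factor: int = 10) -> pd.DataFrame:
    """Repeat fraudulent rows `factor` times to reduce class imbalance."""
    fraud = df[df["is_fraud"] == 1]
    legit = df[df["is_fraud"] == 0]
    oversampled = pd.concat([legit] + [fraud] * factor, ignore_index=True)
    # Shuffle so the duplicated minority rows are mixed into the dataset.
    return oversampled.sample(frac=1.0, random_state=42).reset_index(drop=True)

# Example usage: balanced = oversample_fraud(pd.read_csv("transactions.csv"))
```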

22
Q

Your team is using a TensorFlow Inception-v3 CNN model pretrained on ImageNet for an image classification prediction challenge on 10,000 images. You will use the AI Platform to perform the model training. What TensorFlow distribution strategy and AI Platform training job configuration should you use to train the model and optimize for wall-clock time?

A. Default Strategy; Custom tier with a single master node and four v100 GPUs.

B. One Device Strategy; Custom tier with a single master node and four v100 GPUs.

C. One Device Strategy; Custom tier with a single master node and eight v100 GPUs.

D. MirroredStrategy; Custom tier with a single master node and four v100 GPUs.

A

A is not correct because Default Strategy does not distribute training across multiple devices.
B is not correct because the One Device Strategy does not distribute training across multiple devices.
C is not correct because the One Device Strategy does not distribute training across multiple devices.
D is correct because MirroredStrategy is the only listed strategy that can distribute training across the four GPUs, keeping synchronized (mirrored) copies of the model variables on each replica.
https://www.tensorflow.org/guide/distributed_training
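A minimal MirroredStrategy sketch for a single machine with multiple GPUs; the optimizer, loss, and batch size are illustrative.

```python
import tensorflow as tf

# Synchronous data parallelism across all GPUs on this one machine.
strategy = tf.distribute.MirroredStrategy()
print("replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created here are mirrored across the attached V100 GPUs.
    model = tf.keras.applications.InceptionV3(weights="imagenet")
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])

# Scale the global batch with the replica count to keep each GPU busy.
global_batch_size = 64 * strategy.num_replicas_in_sync
# model.fit(train_dataset.batch(global_batch_size), epochs=...)
```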

23
Q

You work for a manufacturing company that owns a high-value machine that has several machine settings and multiple sensors. A history of the machine’s hourly sensor readings and known failure event data is stored in BigQuery. You need to predict if the machine will fail within the next 3 days in order to schedule maintenance before the machine fails. Which data preparation and model training steps should you take?

A. Data preparation: Daily max value feature engineering; Model training: AutoML classification with BQML

B. Data preparation: Daily min value feature engineering; Model training: Logistic regression with BQML and AUTO_CLASS_WEIGHTS set to True

C. Data preparation: Rolling average feature engineering; Model training: Logistic regression with BQML and AUTO_CLASS_WEIGHTS set to False

D. Data preparation: Rolling average feature engineering; Model training: Logistic regression with BQML and AUTO_CLASS_WEIGHTS set to True

A

A is not correct because a rolling average is a better feature engineering technique, as it will smooth out the noise and fluctuation in the data to demonstrate whether there is a trend. Using the max value could be an artifact of some noise and may not capture the trend accurately.

B is not correct because a rolling average is a better feature engineering technique, as it will smooth out the noise and fluctuation in the data to demonstrate whether there is a trend. Using the min value could be an artifact of some noise and may not capture the trend accurately.

C is not correct because the model training does not balance class labels for an imbalanced dataset.

D is correct because it uses the rolling average of the sensor data and balances the weights using the BQML auto class weight balance parameter.

https://cloud.google.com/dataprep/docs/html/ROLLINGAVERAGE-Function_57344753
https://cloud.google.com/dataprep/docs/html/AVERAGE-Function_57344661
https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create
https://en.wikipedia.org/wiki/Precision_and_recall
https://en.wikipedia.org/wiki/Sensitivity_and_specificity
https://en.wikipedia.org/wiki/Moving_average
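A rough sketch of option D submitted from Python with the BigQuery client library; the dataset, table, and column names are hypothetical.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Rolling 24-hour average of a sensor reading as a feature, then a
# class-weighted logistic regression trained with BigQuery ML.
sql = """
CREATE OR REPLACE MODEL `my_dataset.machine_failure_model`
OPTIONS (
  model_type = 'LOGISTIC_REG',
  auto_class_weights = TRUE,
  input_label_cols = ['failed_within_3_days']
) AS
SELECT
  AVG(sensor_reading) OVER (
    PARTITION BY machine_id
    ORDER BY reading_hour
    ROWS BETWEEN 23 PRECEDING AND CURRENT ROW) AS rolling_avg_reading,
  failed_within_3_days
FROM `my_dataset.sensor_history`
"""
client.query(sql).result()  # waits for the training job to finish
```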

24
Q

You need to build an object detection model for a small startup company to identify if and where the company’s logo appears in an image. You were given a large repository of images, some with logos and some without. These images are not yet labeled. You need to label these pictures, and then train and deploy the model. What should you do?

A. Use Google Cloud’s Data Labelling Service to label your data. Use AutoML Object Detection to train and deploy the model.

B. Use Vision API to detect and identify logos in pictures and use it as a label. Use AI Platform to build and train a convolutional neural network.

C. Create two folders: one where the logo appears and one where it doesn’t. Manually place images in each folder. Use AI Platform to build and train a convolutional neural network.

D. Create two folders: one where the logo appears and one where it doesn’t. Manually place images in each folder. Use AI Platform to build and train a real-time object detection model.

A

A is correct as this will allow you to easily create a request for a labeling task and deploy a high-performance model.

B is not correct because Vision API is not guaranteed to work with any company logos, and in the statement, it explicitly mentions a small startup, which will further decrease the chance of success.

C is not correct because the task of manually labelling the data is time-consuming and should be avoided if possible.

D is not correct because the task of labeling object detection data is very tedious, and real-time object detection is designed for detecting objects in videos rather than in images.
https://cloud.google.com/ai-platform/data-labeling/docs

25
Q

You are developing an application on Google Cloud that will automatically generate subject labels for users’ blog posts. You are under competitive pressure to add this feature quickly, and you have no additional developer resources. No one on your team has experience with machine learning. What should you do?

A. Call the Cloud Natural Language API from your application. Process the generated Entity Analysis as labels.

B. Call the Cloud Natural Language API from your application. Process the generated Sentiment Analysis as labels.

C. Build and train a text classification model using TensorFlow. Deploy the model using AI Platform Prediction. Call the model from your application and process the results as labels.

D. Build and train a text classification model using TensorFlow. Deploy the model using a Kubernetes Engine cluster. Call the model from your application and process the results as labels.

A

A is correct because it provides a managed service and a fully trained model, and the user is pulling the entities, which are the right labels.

B is not correct because sentiment is the incorrect label for this use case.

C is not correct because this requires experience with machine learning.

D is not correct because this requires experience with machine learning.
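A minimal sketch of option A with the Natural Language client library; the blog post text is a placeholder.

```python
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

blog_post = "Placeholder blog post text about cloud computing and Kubernetes."
document = language_v1.Document(
    content=blog_post, type_=language_v1.Document.Type.PLAIN_TEXT)

# Entity analysis returns the entities mentioned in the text; their
# names (and salience scores) can be processed into subject labels.
response = client.analyze_entities(document=document)
labels = [entity.name for entity in response.entities]
print(labels)
```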

26
Q

You are developing an application on Google Cloud that will label famous landmarks in users’ photos. You are under competitive pressure to develop a predictive model quickly. You need to keep service costs low. What should you do?

A. Build an application that calls the Cloud Vision API. Inspect the generated MID values to supply the image labels.

B. Build an application that calls the Cloud Vision API. Pass landmark location as base64-encoded strings.

C. Build and train a classification model with TensorFlow. Deploy the model using AI Platform Prediction. Pass client image locations as base64-encoded strings.

D. Build and train a classification model with TensorFlow. Deploy the model using AI Platform Prediction. Inspect the generated MID values to supply the image labels.

A

B is correct because of the requirement to quickly develop a model that generates landmark labels from photos.
This is supported in Cloud Vision API; see the link below.

A is not correct because you should not inspect the generated MID values; instead, you should simply pass the image locations to the API and use the labels, which are output.

C, D are not correct because you should not build a custom classification TF model for this scenario.

https://cloud.google.com/vision/docs/labels
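A minimal sketch of option B with the Vision client library, reading the returned landmark descriptions as labels; the file path is hypothetical.

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

# Hypothetical local photo; the image bytes are sent in the request.
with open("user_photo.jpg", "rb") as f:
    image = vision.Image(content=f.read())

response = client.landmark_detection(image=image)
# Use the human-readable descriptions as labels instead of the MID values.
labels = [landmark.description for landmark in response.landmark_annotations]
print(labels)
```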

27
Q

Your organization’s marketing team wants to send biweekly scheduled emails to customers that are expected to spend above a variable threshold. This is the first ML use case for the marketing team, and you have been tasked with the implementation. After setting up a new Google Cloud project, you use Vertex AI Workbench to develop model training and batch inference with an XGBoost model on the transactional data stored in Cloud Storage. You want to automate the end-to-end pipeline that will securely provide the predictions to the marketing team, while minimizing cost and code maintenance. What should you do?

A. Create a scheduled pipeline on Vertex AI Pipelines that accesses the data from Cloud Storage, uses Vertex AI to perform training and batch prediction, and outputs a file in a Cloud Storage bucket that contains a list of all customer emails and expected spending.

B. Create a scheduled pipeline on Cloud Composer that accesses the data from Cloud Storage, copies the data to BigQuery, uses BigQuery ML to perform training and batch prediction, and outputs a table in BigQuery with customer emails and expected spending.

C. Create a scheduled notebook on Vertex AI Workbench that accesses the data from Cloud Storage, performs training and batch prediction on the managed notebook instance, and outputs a file in a Cloud Storage bucket that contains a list of all customer emails and expected spending.

D. Create a scheduled pipeline on Cloud Composer that accesses the data from Cloud Storage, uses Vertex AI to perform training and batch prediction, and sends an email to the marketing team’s Gmail group email with an attachment that contains an encrypted list of all customer emails and expected spending.

A

A is correct because Vertex AI Pipelines and Cloud Storage are cost-effective and secure solutions. The solution requires the least number of code interactions because the marketing team can update the pipeline and schedule parameters from the Google Cloud console.

B is not correct. Cloud Composer is not a cost-efficient solution for one pipeline because its environment is always active. In addition, using BigQuery is not the most cost-effective solution.

C is not correct because the marketing team would have to enter the Vertex AI Workbench instance to update a pipeline parameter, which does not minimize code interactions.

D is not correct. Cloud Composer is not a cost-efficient solution for one pipeline because its environment is always active. Also, using email to send personally identifiable information (PII) is not a recommended approach.

https://cloud.google.com/storage/docs/encryption

https://cloud.google.com/vertex-ai/docs/pipelines/run-pipeline

https://cloud.google.com/vertex-ai/docs/workbench/managed/schedule-managed-notebooks-run-quickstart

https://cloud.google.com/arc

28
Q

You have developed a very large network in TensorFlow Keras that is expected to train for multiple days. The model uses only built-in TensorFlow operations to perform training with high-precision arithmetic. You want to update the code to run distributed training using tf.distribute.Strategy and configure a corresponding machine instance in Compute Engine to minimize training time. What should you do?

A. Select an instance with an attached GPU, and gradually scale up the machine type until the optimal execution time is reached. Add MirroredStrategy to the code, and create the model in the strategy’s scope with batch size dependent on the number of replicas.

B. Create an instance group with one instance with attached GPU, and gradually scale up the machine type until the optimal execution time is reached. Add TF_CONFIG and MultiWorkerMirroredStrategy to the code, create the model in the strategy’s scope, and set up data autosharding.

C. Create a TPU virtual machine, and gradually scale up the machine type until the optimal execution time is reached. Add TPU initialization at the start of the program, define a distributed TPUStrategy, and create the model in the strategy’s scope with batch size and training steps dependent on the number of TPUs.

D. Create a TPU node, and gradually scale up the machine type until the optimal execution time is reached. Add TPU initialization at the start of the program, define a distributed TPUStrategy, and create the model in the strategy’s scope with batch size and training steps dependent on the number of TPUs.

A

A is not correct because it is suboptimal in minimizing execution time for model training. MirroredStrategy only supports multiple GPUs on one instance, which may not be as performant as running on multiple instances.

B is correct because GPUs are the correct hardware for deep learning training with high-precision training, and distributing training with multiple instances will allow maximum flexibility in fine-tuning the accelerator selection to minimize execution time. Note that one worker could still be the best setting if the overhead of synchronizing the gradients across machines is too high, in which case this approach will be equivalent to MirroredStrategy.

C is not correct because TPUs are not recommended for workloads that require high-precision arithmetic, and are recommended for models that train for weeks or months.

D is not correct because TPUs are not recommended for workloads that require high-precision arithmetic, and are recommended for models that train for weeks or months. Also, TPU nodes are not recommended unless required by the application.

https://cloud.google.com/tpu/docs/intro-to-tpu#when_to_use_tpus

https://www.tensorflow.org/guide/distributed_training

https://www.tensorflow.org/tutorials/distribute/multi_worker_with_ctl
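A condensed sketch of option B's code changes, roughly following the linked multi-worker tutorial: TF_CONFIG describes the cluster (the addresses and task index are placeholders, normally set by the infrastructure), the model is created inside the strategy's scope, and the input data is auto-sharded across workers.

```python
import json
import os
import tensorflow as tf

# TF_CONFIG is set per worker; the hosts and task index below are
# placeholders for worker 0 of a two-worker instance group.
os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["10.0.0.1:12345", "10.0.0.2:12345"]},
    "task": {"type": "worker", "index": 0},
})

strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="adam", loss="mse")

# Explicitly request data auto-sharding across the workers.
options = tf.data.Options()
options.experimental_distribute.auto_shard_policy = (
    tf.data.experimental.AutoShardPolicy.DATA)
# dataset = dataset.with_options(options); model.fit(dataset, epochs=...)
```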

29
Q

You developed a tree model based on an extensive feature set of user behavioral data. The model has been in production for 6 months. New regulations were just introduced that require anonymizing personally identifiable information (PII), which you have identified in your feature set using the Cloud Data Loss Prevention API. You want to update your model pipeline to adhere to the new regulations while minimizing a reduction in model performance. What should you do?

A. Redact the features containing PII data, and train the model from scratch.

B. Mask the features containing PII data, and tune the model from the last checkpoint.

C. Use key-based hashes to tokenize the features containing PII data, and train the model from scratch.

D. Use deterministic encryption to tokenize the features containing PII data, and tune the model from the last checkpoint.

A

A is not correct because removing features from the model does not keep referential integrity by maintaining the original relationship between records, and is likely to cause a drop in performance.

B is not correct because masking does not enforce referential integrity, and a drop in model performance may happen. Also, tuning the existing model is not recommended because the model training on the original dataset may have memorized sensitive information.

C is correct because hashing is an irreversible transformation that ensures anonymization and does not lead to an expected drop in model performance because you keep the same feature set while enforcing referential integrity.

D is not correct because deterministic encryption is reversible, and anonymization requires irreversibility. Also, tuning the existing model is not recommended because the model training on the original dataset may have memorized sensitive information.

https://cloud.google.com/dlp/docs/transformations-reference#transformation_methods

https://cloud.google.com/dlp/docs/deidentify-sensitive-data

https://cloud.google.com/blog/products/identity-security/next-onair20-security-week-session-guide

https://cloud.google.com/dlp/docs/creating-job-triggers
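A minimal illustration of key-based (keyed) hashing for tokenizing a PII feature with Python's standard hmac module; the secret key is a placeholder, and a production pipeline would use the DLP API's crypto-hash transformation instead.

```python
import hashlib
import hmac

SECRET_KEY = b"placeholder-key-from-a-secret-manager"  # hypothetical

def tokenize(value: str) -> str:
    """Deterministic, irreversible keyed hash of a PII value.

    The same input always maps to the same token, so joins and
    groupings on the feature keep their referential integrity.
    """
    return hmac.new(SECRET_KEY, value.encode("utf-8"),
                    hashlib.sha256).hexdigest()

print(tokenize("user@example.com"))
print(tokenize("user@example.com"))  # identical token both times
```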

30
Q

You need to train an object detection model to identify bounding boxes around Post-it Notes® in an image. Post-it Notes can have a variety of background colors and shapes. You have a dataset with 1000 images with a maximum size of 1.4MB and a CSV file containing annotations stored in Cloud Storage. You want to select a training method that reliably detects Post-it Notes of any relative size in the image and that minimizes the time to train a model. What should you do?

A. Use the Cloud Vision API in Vertex AI with OBJECT_LOCALIZATION type, and filter the detected objects that match the Post-it Note category only.

B. Upload your dataset into Vertex AI. Use Vertex AI AutoML Vision Object Detection with accuracy as the optimization metric, early stopping enabled, and no training budget specified.

C. Write a Python training application that trains a custom vision model on the training set. Autopackage the application, and configure a custom training job in Vertex AI.

D. Write a Python training application that performs transfer learning on a pre-trained neural network. Autopackage the application, and configure a custom training job in Vertex AI.

A

A is not correct because the object detection capability of the Cloud Vision API confidently detects large objects within the image and is not the best option to reliably detect sticky notes of any relative size in the image.

B is correct because AutoML is a codeless solution that minimizes time to train and develop the model, and it is capable of detecting bounding boxes up to one percent the length of a side of an image.

C is not correct because creating a custom training job requires more development time than using AutoML does. The extra flexibility of custom training is not required because AutoML achieves state-of-the-art performance even on tiny objects (8-32 pixels). Additionally, training a model from scratch is not expected to be as performant as transfer learning.

D is not correct because creating a custom training job requires more development time than using AutoML does. The extra flexibility of custom training is not required because AutoML achieves state-of-the-art performance even on tiny objects (8-32 pixels).

https://cloud.google.com/vertex-ai/docs/start/training-methods

https://cloud.google.com/vision/automl/docs/beginners-guide#is_the_vision_api_or_automl_the_right_tool_for_me

https://cloud.google.com/vertex-ai/docs/datasets/prepare-image

https://cloud.google.com/vision-ai/docs

31
Q

You used Vertex AI Workbench notebooks to build a model in TensorFlow. The notebook i) loads data from Cloud Storage, ii) uses TensorFlow Transform to pre-process data, iii) uses built-in TensorFlow operators to define a sequential Keras model, iv) trains and evaluates the model with model.fit() on the notebook instance, and v) saves the trained model to Cloud Storage for serving. You want to orchestrate the model retraining pipeline to run on a weekly schedule while minimizing cost and implementation effort. What should you do?

A. Add relevant parameters to the notebook cells and set a recurring run in Vertex AI Workbench.

B. Use TensorFlow Extended (TFX) with Google Cloud executors to define your pipeline, and automate the pipeline to run on Cloud Composer.

C. Use Kubeflow Pipelines SDK with Google Cloud executors to define your pipeline, and use Vertex AI pipelines to automate the pipeline to run.

D. Separate each cell in the notebook into a containerised application and use Cloud Workflows to launch each application.

A

A is not correct because Vertex AI Workbench does not provide alerts, and you would have to log in every week to check the pipeline run status. This does not minimize monitoring steps.

B is not correct because Cloud Composer does not provide ML-specific monitoring capabilities. Also, unless many pipelines are hosted in Cloud Composer, this solution is not the most cost-efficient.

C is correct because using the Kubeflow Pipelines SDK is the best practice to orchestrate AI pipelines with modular steps.

D is not correct because this approach requires more effort and does not follow best practices given that Vertex AI pipelines is the more suitable product to run modular containerised AI pipeline steps.
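For orientation, a minimal sketch of the recommended approach (option C), assuming hypothetical bucket paths and a component body left as a placeholder; the KFP v2 SDK and the google-cloud-aiplatform package are assumed to be installed:

from kfp import dsl, compiler
from google.cloud import aiplatform

@dsl.component(base_image="python:3.10")
def train_and_export(data_uri: str, model_uri: str):
    # Placeholder for the notebook logic: load data from Cloud Storage,
    # preprocess with TensorFlow Transform, train with model.fit(),
    # and save the SavedModel back to Cloud Storage.
    ...

@dsl.pipeline(name="weekly-retraining")
def retraining_pipeline(
    data_uri: str = "gs://my-bucket/data",      # hypothetical paths
    model_uri: str = "gs://my-bucket/model",
):
    train_and_export(data_uri=data_uri, model_uri=model_uri)

# Compile once, then run on Vertex AI Pipelines; a Cloud Scheduler job or the
# built-in pipeline scheduler can submit this on a weekly cadence.
compiler.Compiler().compile(retraining_pipeline, package_path="pipeline.json")
job = aiplatform.PipelineJob(
    display_name="weekly-retraining", template_path="pipeline.json"
)
job.submit()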

https://cloud.google.com/architecture/ml-on-gcp-best-practices#machine-learning-workflow-orchestration

https://cloud.google.com/vertex-ai/docs/workbench/managed/schedule-managed-notebooks-run-quickstart

https://cloud.google.com/vertex-ai/docs/pipelines/schedule-cloud-scheduler

32
Q

You need to develop an online model prediction service that accesses pre-computed near-real-time features and returns a customer churn probability value. The features are saved in BigQuery and updated hourly using a scheduled query. You want this service to be low latency and scalable and require minimal maintenance. What should you do?

A. 1. Configure Vertex AI Feature Store to automatically import features from BigQuery, and serve them to the model. 2. Deploy the prediction model as a custom Vertex AI endpoint, and enable automatic scaling.

B. 1. Configure a Cloud Function that exports features from BigQuery to Memorystore. 2. Use a custom container on Google Kubernetes Engine to deploy a service that performs feature lookup from Memorystore and performs inference with an in-memory model.

C. 1. Configure a Cloud Function that exports features from BigQuery to Vertex AI Feature Store. 2. Use the online service API from Vertex AI Feature Store to perform feature lookup. Deploy the model as a custom prediction endpoint in Vertex AI, and enable automatic scaling.

D. 1. Configure a Cloud Function that exports features from BigQuery to Vertex AI Feature Store. 2. Use a custom container on Google Kubernetes Engine to deploy a service that performs feature lookup from Vertex AI Feature Store’s online serving API and performs inference with an in-memory model.

A

A is correct because using Vertex AI Feature Store with BigQuery prioritizes low latency, scalability, requires minimal maintenance, and facilitates integration with other Vertex AI services as a fully managed solution.

B is not correct because the feature lookup and model inference could be handled without a self-managed cluster, and maintaining both the export Cloud Function and a Google Kubernetes Engine deployment increases maintenance.

C is not correct because the scheduled Cloud Function export adds maintenance compared with Vertex AI Feature Store's built-in import from BigQuery used in option A.

D is not correct because the export Cloud Function and the Google Kubernetes Engine deployment both add maintenance compared with the fully managed option.
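As a rough sketch of option A's serving side (hypothetical project, model ID, and feature values; the google-cloud-aiplatform SDK is assumed, and the Feature Store-to-BigQuery integration is configured separately):

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")   # hypothetical

# Deploy the churn model to an endpoint with automatic scaling enabled.
model = aiplatform.Model("1234567890")            # hypothetical model resource ID
endpoint = model.deploy(
    machine_type="n1-standard-4",
    min_replica_count=1,
    max_replica_count=5,                           # scales out under load
)

# At request time, the pre-computed features (kept fresh by the hourly
# BigQuery query and served through Feature Store) are sent to the endpoint.
features = [[0.2, 17, 3.5]]                        # placeholder feature vector
print(endpoint.predict(instances=features).predictions)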

https://cloud.google.com/architecture/ml-on-gcp-best-practices#model-deployment-and-serving

https://cloud.google.com/vertex-ai/docs/featurestore/overview#benefits

https://cloud.google.com/memorystore/docs/redis/redis-overview

https://cloud.google.com/vertex-ai/docs/featurestore/latest/overview#data_source_prep

33
Q

You are logged into the Vertex AI Pipeline UI and noticed that an automated production TensorFlow training pipeline finished three hours earlier than a typical run. You do not have access to production data for security reasons, but you have verified that no alert was logged in any of the ML system’s monitoring systems and that the pipeline code has not been updated recently. You want to assure the quality of the pipeline results as quickly as possible so you can determine whether to deploy the trained model. What should you do?

A. Use Vertex AI TensorBoard to check whether the training metrics converge to typical values. Verify pipeline input configuration and steps have the expected values.

B. Upgrade to the latest version of the Vertex SDK and re-run the pipeline.

C. Determine the trained model’s location from the pipeline’s metadata in Vertex ML Metadata, and compare the trained model’s size to the previous model.

D. Request access to production systems. Get the training data’s location from the pipeline’s metadata in Vertex ML Metadata, and compare data volumes of the current run to the previous run.

A

A is correct because TensorBoard provides a compact and complete overview of training metrics such as loss and accuracy over time. If the training converges with the model’s expected accuracy, the model can be deployed.

B is not correct because upgrading the SDK and re-running the pipeline consumes additional time and resources without explaining or validating the run that already finished; reviewing the existing run's metrics and configuration is faster.

C is not correct because model size is a good indicator of health but does not provide a complete overview to make sure that the model can be safely deployed. Note that the pipeline’s metadata can also be accessed directly from Vertex AI Pipelines.

D is not correct because data is the most probable cause of this behavior, but it is not the only possible cause. Also, access requests could take a long time and are not the most secure option. Note that the pipeline’s metadata can also be accessed directly from Vertex AI Pipelines.

https://cloud.google.com/vertex-ai/docs/experiments/tensorboard-overview

https://cloud.google.com/vertex-ai/docs/ml-metadata/introduction

https://cloud.google.com/vertex-ai/docs/pipelines/visualize-pipeline

34
Q

You recently developed a custom ML model that was trained in Vertex AI on a post-processed training dataset stored in BigQuery. You used a Cloud Run container to deploy the prediction service. The service performs feature lookup and pre-processing and sends a prediction request to a model endpoint in Vertex AI. You want to configure a comprehensive monitoring solution for training-serving skew that requires minimal maintenance. What should you do?

A. Create a Model Monitoring job for the Vertex AI endpoint that uses the training data in BigQuery to perform training-serving skew detection and uses email to send alerts. When an alert is received, use the console to diagnose the issue.

B. Update the model hosted in Vertex AI to enable request-response logging. Create a Data Studio dashboard that compares training data and logged data for potential training-serving skew and uses email to send a daily scheduled report.

C. Create a Model Monitoring job for the Vertex AI endpoint that uses the training data in BigQuery to perform training-serving skew detection and uses Cloud Logging to send alerts. Set up a Cloud Function to initiate model retraining that is triggered when an alert is logged.

D. Update the model hosted in Vertex AI to enable request-response logging. Schedule a daily DataFlow Flex job that uses Tensorflow Data Validation to detect training-serving skew and uses Cloud Logging to send alerts. Set up a Cloud Function to initiate model retraining that is triggered when an alert is logged.

A

A is correct because Vertex AI Model Monitoring is a fully managed solution for monitoring training-serving skew that, by definition, requires minimal maintenance. Using the console for diagnostics is recommended for a comprehensive monitoring solution because there could be multiple causes for the skew that require manual review.

B is not correct because this solution does not minimize maintenance. It involves multiple custom components that require additional updates for any schema change.

C is not correct because a model retrain does not necessarily fix skew. For example, differences in pre-processing logic between training and prediction can also cause skew.

D is not correct because this solution does not minimize maintenance. It involves multiple components that require additional updates for any schema change. Also, a model retrain does not necessarily fix skew. For example, differences in pre-processing logic between training and prediction can also cause skew.

https://cloud.google.com/architecture/ml-modeling-monitoring-automating-server-data-skew-detection-in-ai-platform-prediction

https://cloud.google.com/vertex-ai/docs/model-monitoring/overview

35
Q

You recently developed a classification model that predicts which customers will be repeat customers. Before deploying the model, you perform post-training analysis on multiple data slices and discover that the model is under-predicting for users who are more than 60 years old. You want to remove age bias while maintaining similar offline performance. What should you do?

A. Perform correlation analysis on the training feature set against the age column, and remove features that are highly correlated with age from the training and evaluation sets.

B. Review the data distribution for each feature against the bucketized age column for the training and evaluation sets, and introduce preprocessing to even irregular feature distributions.

C. Configure the model to support explainability, and modify the input-baselines to include min and max age ranges.

D. Apply a calibration layer at post-processing that matches the prediction distributions of users below and above 60 years old.

A

A is not correct because this approach could lead to large drops in offline performance.

B is correct because this approach compensates for bias directly in the data by enhancing the data distribution of users above 60 years old. Some useful preprocessing steps could be filling null values, bucketizing, clipping outliers, sampling, or even collecting new data.

C is not correct because modifying input baselines will only adjust explainability of model features and not offline model performance.

D is not correct because this approach could add unconscious or implicit bias to the label. This approach is not recommended because it is a brittle solution that fixes the symptom rather than the cause.
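A quick way to do the per-slice review in option B, as a minimal pandas sketch; the DataFrame, feature names, and bucket edges are synthetic placeholders:

import numpy as np
import pandas as pd

# Synthetic stand-in for the training set (hypothetical feature names).
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "age": rng.integers(18, 90, size=1000),
    "purchases": rng.poisson(3, size=1000),
    "avg_spend": rng.normal(50, 15, size=1000),
})

df["age_bucket"] = pd.cut(df["age"], bins=[0, 30, 45, 60, 120],
                          labels=["<30", "30-45", "45-60", "60+"])

# Per-bucket summaries expose slices (such as 60+) with sparse or skewed coverage
# that preprocessing or extra data collection should address.
print(df.groupby("age_bucket")[["purchases", "avg_spend"]].describe())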

https://ai.google/responsibilities/responsible-ai-practices/

https://cloud.google.com/inclusive-ml

https://developers.google.com/machine-learning/crash-course/fairness/types-of-bias

https://developers.google.com/machine-learning/crash-course/classification/prediction-bias

36
Q

You downloaded a TensorFlow language model pre-trained on a proprietary dataset by another company, and you tuned the model with Vertex AI Training by replacing the last layer with a custom dense layer. The model achieves the expected offline accuracy; however, it exceeds the required online prediction latency by 20ms. You want to reduce latency while minimizing the offline performance drop and modifications to the model before deploying the model to production. What should you do?

A. Apply post-training quantization on the tuned model, and serve the quantized model.

B. Apply knowledge distillation to train a new, smaller “student” model that mimics the behavior of the larger, fine-tuned model.

C. Use pruning to tune the pre-trained model on your dataset, and serve the pruned model after stripping it of training variables.

D. Use clustering to tune the pre-trained model on your dataset, and serve the clustered model after stripping it of training variables.

A

A is correct because post-training quantization is the recommended option for reducing model latency when re-training is not possible. Post-training quantization can minimally decrease model performance.

B is not correct because knowledge distillation requires designing, training, and validating an additional student model, which means significant extra modification and training work before deployment.

C is not correct because tuning the whole model on the custom dataset only will cause a drop in offline performance. Also, pruning helps in compressing model size, but it is expected to provide less latency improvements than quantization.

D is not correct because tuning the whole model on the custom dataset only will cause a drop in offline performance. Also, clustering helps in compressing model size, but it does not reduce latency.
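A minimal sketch of option A using TensorFlow Lite's post-training (dynamic-range) quantization, assuming a hypothetical SavedModel directory; the exported format must of course match how the model is actually served:

import tensorflow as tf

# Convert the tuned SavedModel with default post-training quantization,
# which quantizes weights to 8 bits and can noticeably reduce latency.
converter = tf.lite.TFLiteConverter.from_saved_model("tuned_model/")   # hypothetical path
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("model_quantized.tflite", "wb") as f:
    f.write(tflite_model)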

https://cloud.google.com/architecture/best-practices-for-ml-performance-cost

https://www.tensorflow.org/lite/performance/model_optimization

https://www.tensorflow.org/tutorials/images/transfer_learning

https://cloud.google.com/vertex-ai/generative-ai/docs/models/distill-text-models

37
Q

You have a dataset that is split into training, validation, and test sets. All the sets have similar distributions. You have sub-selected the most relevant features and trained a neural network. TensorBoard plots show the training loss oscillating around 0.9, with the validation loss higher than the training loss by 0.3. You want to update the training regime to maximize the convergence of both losses and reduce overfitting. What should you do?

A. Decrease the learning rate to fix the validation loss, and increase the number of training epochs to improve the convergence of both losses.

B. Decrease the learning rate to fix the validation loss, and increase the number and dimension of the layers in the network to improve the convergence of both losses.

C. Introduce L1 regularization to fix the validation loss, and increase the learning rate and the number of training epochs to improve the convergence of both losses.

D. Introduce L2 regularization to fix the validation loss, and increase the number and dimension of the layers in the network to improve the convergence of both losses.

A

A is not correct because changing the learning rate does not reduce overfitting. Increasing the number of training epochs is not expected to improve the losses significantly.

B is not correct because changing the learning rate does not reduce overfitting.

C is not correct because increasing the number of training epochs is not expected to improve the losses significantly, and increasing the learning rate could also make the model training unstable. L1 regularization could be used to stabilize the learning, but it is not expected to be particularly helpful because only the most relevant features have been used for training.

D is correct because L2 regularization prevents overfitting. Increasing the model’s complexity boosts the predictive ability of the model, which is expected to optimize loss convergence when underfitting.
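For reference, a minimal Keras sketch of adding L2 regularization (and extra capacity) to a network; the layer sizes and regularization strength are illustrative, not taken from the question:

import tensorflow as tf

l2 = tf.keras.regularizers.l2(1e-4)   # illustrative regularization strength

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32,)),
    tf.keras.layers.Dense(128, activation="relu", kernel_regularizer=l2),
    tf.keras.layers.Dense(128, activation="relu", kernel_regularizer=l2),   # added capacity
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")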

https://developers.google.com/machine-learning/testing-debugging/common/overview

https://developers.google.com/machine-learning/crash-course/regularization-for-simplicity/l2-regularization

https://developers.google.com/machine-learning/crash-course/regularization-for-sparsity/l1-regularization

https://cloud.google.com/architecture/guidelines-for-developing-high-quality-ml-solutions#guidelines_for_model_quality

https://www.tensorflow.org/tutorials/keras/overfit_and_underfit

https://www.tensorflow.org/tensorboard/get_started


38
Q

You recently used Vertex AI Prediction to deploy a custom-trained model in production. The automated re-training pipeline made available a new model version that passed all unit and infrastructure tests. You want to define a rollout strategy for the new model version that guarantees an optimal user experience with zero downtime. What should you do?

A. Release the new model version in the same Vertex AI endpoint. Use traffic splitting in Vertex AI Prediction to route a small random subset of requests to the new version and, if the new version is successful, gradually route the remaining traffic to it.

B. Release the new model version in a new Vertex AI endpoint. Update the application to send all requests to both Vertex AI endpoints, and log the predictions from the new endpoint. If the new version is successful, route all traffic to the new application.

C. Deploy the current model version with an Istio resource in Google Kubernetes Engine, and route production traffic to it. Deploy the new model version, and use Istio to route a small random subset of traffic to it. If the new version is successful, gradually route the remaining traffic to it.

D. Install Seldon Core and deploy an Istio resource in Google Kubernetes Engine. Deploy the current model version and the new model version using the multi-armed bandit algorithm in Seldon to dynamically route requests between the two versions before eventually routing all traffic over to the best-performing version.

A

A is not correct because canary deployments may affect user experience, even if on a small subset of users.

B is correct because shadow deployments minimize the risk of affecting user experience while ensuring zero downtime.

C is not correct because canary deployments may affect user experience, even if on a small subset of users. This approach is a less managed alternative to option A and could cause downtime when moving between services.

D is not correct because the multi-armed bandit approach may affect user experience, even if on a small subset of users. This approach could cause downtime when moving between services.
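A rough sketch of the shadow pattern in option B, assuming two hypothetical Vertex AI endpoints and the google-cloud-aiplatform SDK; only the live endpoint's response ever reaches users:

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")   # hypothetical

live = aiplatform.Endpoint("1111111111")     # current production endpoint (hypothetical ID)
shadow = aiplatform.Endpoint("2222222222")   # new model version's endpoint (hypothetical ID)

def predict(instances):
    response = live.predict(instances=instances)      # served to users
    try:
        shadow_preds = shadow.predict(instances=instances).predictions
        print({"shadow_predictions": shadow_preds})   # logged for offline comparison only
    except Exception as err:                          # shadow failures never affect users
        print({"shadow_error": str(err)})
    return response.predictions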

https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning#data_and_model_validation

https://cloud.google.com/architecture/implementing-deployment-and-testing-strategies-on-gke

https://cloud.google.com/architecture/application-deployment-and-testing-strategies#choosing_the_right_strategy

https://cloud.google.com/vertex-ai/docs/general/deployment

https://docs.seldon.io/projects/seldon-core/en/latest/analytics/routers.html

39
Q

You work as an analyst at a large banking firm. You are developing a robust, scalable ML pipeline to train several regression and classification models. Your primary focus for the pipeline is model interpretability. You want to productionize the pipeline as quickly as possible. What should you do?

A. Use Tabular Workflow for Wide & Deep through Vertex AI Pipelines to jointly train wide linear models and deep neural networks.

B. Use Cloud Composer to build the training pipelines for custom deep learning-based models.

C. Use Google Kubernetes Engine to build a custom training pipeline for XGBoost-based models.

D. Use Tabular Workflow for TabNet through Vertex AI Pipelines to train attention-based models.

A

A is not correct because, although Tabular Workflow for Wide & Deep can handle classification and regression pipelines, it is optimized for memorization and generalization, and deep learning-based models are generally not preferred when interpretability is the priority.

B is not correct because Cloud Composer is not the right tool to build an ML pipeline quickly, and in general deep learning-based models are not preferred for interpretability.

C is not correct because building a pipeline on Google Kubernetes Engine would take a long time.

D is correct because TabNet uses sequential attention that promotes model interpretability and Tabular Workflows is a set of integrated, fully managed, and scalable pipelines for end-to-end ML with tabular data for regression and classification.

https://cloud.google.com/vertex-ai/docs/tabular-data/tabular-workflows/overview#cr-tabnet

40
Q

You are developing a custom image classification model in Python. You plan to run your training application on Vertex AI. Your input dataset contains several hundred thousand small images. You need to determine how to store and access the images for training. You want to maximize data throughput and minimize training time while reducing the amount of additional code. What should you do?

A. Store image files in Cloud Storage, and access them directly.

B. Store image files in Cloud Storage, and access them by using serialized records.

C. Store image files in Cloud Filestore, and access them by using serialized records.

D. Store image files in Cloud Filestore, and access them directly by using an NFS mount point.

A

A is not correct because Cloud Storage is not optimized for accessing lots of small files, as there is overhead in establishing the connections to retrieve each file.

B is not correct because although accessing a large archive via serialized records (TFRecords, WebDatasets) is faster than small files, it’s still slower than using Filestore.

C is correct because Filestore is faster than Cloud Storage for accessing files, and serialized records are faster for feeding training pipelines than individual files.

D is not correct because although Filestore is faster than Cloud Storage for accessing files, serialized records are still faster than individual file I/O.

https://github.com/webdataset/webdataset

https://cloud.google.com/blog/products/ai-machine-learning/efficient-pytorch-training-with-vertex-ai

https://cloud.google.com/blog/topics/developers-practitioners/scaling-deep-learning-workloads-pytorch-xla-and-cloud-tpu-vm

https://cloud.google.com/blog/topics/developers-practitioners/reading-and-storing-data-custom-model-training-vertex-ai

41
Q

Your company manages an ecommerce website. You developed an ML model that recommends additional products to users in near real time based on items currently in the user’s cart. The workflow will include the following processes:

  1. The website will send a Pub/Sub message with the relevant data, and then receive a message with the prediction from Pub/Sub.
  2. Predictions will be stored in BigQuery.
  3. The model will be stored in a Cloud Storage bucket and will be updated frequently.

You want to minimize prediction latency and the effort required to update the model. How should you reconfigure the architecture?

A. Write a Cloud Function that loads the model into memory for prediction. Configure the function to be triggered when messages are sent to Pub/Sub.

B. Expose the model as a Vertex AI endpoint. Write a custom DoFn in a Dataflow job that calls the endpoint for prediction.

C. Use the RunInference API with WatchFilePattern in a Dataflow job that wraps around the model and serves predictions.

D. Create a pipeline in Vertex AI Pipelines that performs preprocessing, prediction, and postprocessing. Configure the pipeline to be triggered by a Cloud Function when messages are sent to Pub/Sub.

A

A is not correct because Cloud Functions will run into limitations based on request rate and model size.

B is not correct because exposing the model as an endpoint adds to the total latency.

C is correct because the RunInference API with a locally loaded model minimizes the prediction latency and makes model updates seamless.

D is not correct because provisioning Vertex AI Pipelines adds to the total latency.
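A rough sketch of option C using the Apache Beam RunInference API with a WatchFilePattern side input for automatic model refresh. The subscription, table, model paths, payload format, and model-handler configuration are assumptions, and the exact module paths and parameters should be checked against the Beam release in use:

import json
import apache_beam as beam
import tensorflow as tf
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.tensorflow_inference import TFModelHandlerTensor
from apache_beam.ml.inference.utils import WatchFilePattern

SUBSCRIPTION = "projects/my-project/subscriptions/cart-events"   # hypothetical
OUTPUT_TABLE = "my-project:recommendations.predictions"          # hypothetical
MODEL_PATTERN = "gs://my-bucket/models/*.h5"                     # hypothetical

model_handler = TFModelHandlerTensor(model_uri="gs://my-bucket/models/initial.h5")

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    # Emits the newest model path whenever a file matching the pattern lands in
    # Cloud Storage, so the in-pipeline model is refreshed without redeploying the job.
    model_updates = p | "WatchModel" >> WatchFilePattern(file_pattern=MODEL_PATTERN)

    (p
     | "ReadCartEvents" >> beam.io.ReadFromPubSub(subscription=SUBSCRIPTION)
     | "ToTensor" >> beam.Map(
         lambda msg: tf.convert_to_tensor(json.loads(msg)["features"], dtype=tf.float32))
     | "Predict" >> RunInference(model_handler, model_metadata_pcoll=model_updates)
     | "ToRow" >> beam.Map(
         lambda result: {"prediction": json.dumps(result.inference.numpy().tolist())})
     | "WriteResults" >> beam.io.WriteToBigQuery(OUTPUT_TABLE, schema="prediction:STRING"))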

https://cloud.google.com/dataflow/docs/notebooks/run_custom_inference

https://cloud.google.com/functions/docs/tutorials/pubsub

https://cloud.google.com/vertex-ai/docs/pipelines/trigger-pubsub

https://cloud.google.com/functions/quotas

https://cloud.google.com/architecture/minimizing-predictive-serving-latency-in-machine-learning

42
Q

Your organization wants to make its internal shuttle service route more efficient. The shuttles currently stop at all pick-up points across the city every 30 minutes between 7 am and 10 am. The development team has already built an application on Google Kubernetes Engine that requires users to confirm their presence and shuttle station one day in advance. What approach should you take?

A. 1. Build a tree-based regression model that predicts how many passengers will be picked up at each shuttle station. 2. Dispatch an appropriately sized shuttle and provide the map with the required stops based on the prediction.

B. 1. Build a tree-based classification model that predicts whether the shuttle should pick up passengers at each shuttle station. 2. Dispatch an available shuttle and provide the map with the required stops based on the prediction.

C. 1. Define the optimal route as the shortest route that passes by all shuttle stations with confirmed attendance at the given time under capacity constraints. 2. Dispatch an appropriately sized shuttle and indicate the required stops on the map.

D. 1. Build a reinforcement learning model with tree-based classification models that predict the presence of passengers at shuttle stops as agents and a reward function around a distance-based metric. 2. Dispatch an appropriately sized shuttle and provide the map with the required stops based on the simulated outcome.

A

A is incorrect because confirmed attendance at each station is already available one day in advance from the application, so there is no need for an ML model to predict passenger counts.

B is incorrect for the same reason: the required stops are known from confirmed attendance, so a classification model adds nothing.

C is correct because all stations with confirmed attendance are known one day in advance from the existing application, so the optimal route can be computed directly under the capacity constraints and an appropriately sized shuttle can be dispatched. This is an optimization problem, not a prediction problem.

D is incorrect; a reinforcement learning approach might be worth considering if such an application did not exist, but since it does, option C is the simpler and more direct solution.

43
Q

You were asked to investigate failures of a production line component based on sensor readings. After receiving the dataset, you discover that less than 1% of the readings are positive examples representing failure incidents. You have tried to train several classification models, but none of them converge. How should you resolve the class imbalance problem?

A. Use the class distribution to generate 10% positive examples.

B. Use a convolutional neural network with max-pooling and softmax activation.

C. Downsample the data with upweighting to create a sample with 10% positive examples.

D. Remove negative examples until the numbers of positive and negative examples are equal.

A

A is incorrect; generating synthetic positives (for example with SMOTE) might help, but with fewer than 1% positive examples there is too little data for upsampling to work reliably, and C is the better solution.

B is incorrect; a convolutional network with max pooling and softmax changes the architecture but does not resolve the class imbalance.

C is correct. Downsampling the majority class to reach roughly 10% positives, while upweighting the downsampled examples when computing the loss, keeps the predicted probabilities calibrated and helps the model converge faster. With downsampling alone, the prediction scores for the downsampled class would be skewed and training would take longer to converge (refer link; a short sketch follows below).

D is incorrect as it would cause loss of data.
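A minimal pandas/NumPy sketch of downsampling with upweighting; the DataFrame and column names are synthetic placeholders:

import numpy as np
import pandas as pd

# Synthetic stand-in for the sensor readings: ~1% positive (failure) examples.
rng = np.random.default_rng(0)
df = pd.DataFrame({"reading": rng.normal(size=10_000),
                   "failure": rng.binomial(1, 0.01, size=10_000)})

pos = df[df["failure"] == 1]
neg = df[df["failure"] == 0]

keep_frac = 0.1                                    # keep ~10% of negatives
neg_down = neg.sample(frac=keep_frac, random_state=42)
train = pd.concat([pos, neg_down]).sample(frac=1.0, random_state=42)

# Upweight the downsampled (negative) class so the loss still reflects the
# true class distribution, e.g. model.fit(X, y, sample_weight=weights).
weights = np.where(train["failure"] == 0, 1.0 / keep_frac, 1.0)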

Links:

Handling unbalanced datasets:
https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data
https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data#downsampling-and-upweighting

44
Q

You want to rebuild your ML pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over 12 hours to run. To speed up development and pipeline run time, you want to use a serverless tool and SQL syntax. You have already moved your raw data into Cloud Storage. How should you build the pipeline on Google Cloud while meeting the speed and processing requirements?

A. Use Data Fusion’s GUI to build the transformation pipelines, and then write the data into BigQuery.

B. Convert your PySpark into SparkSQL queries to transform the data and then run your pipeline on Dataproc to write the data into BigQuery.

C. Ingest your data into Cloud SQL, convert your PySpark commands into SQL queries to transform the data, and then use federated queries from BigQuery for machine learning.

D. Ingest your data into BigQuery using BigQuery Load, convert your PySpark commands into BigQuery SQL queries to transform the data, and then write the transformations to a new table.

A

A is incorrect; Data Fusion pipelines are built through a graphical interface rather than with SQL syntax, so this does not meet the stated requirement.

B is incorrect; converting the PySpark jobs to SparkSQL on Dataproc (even Dataproc Serverless) would reduce run time, but it still involves managing Spark jobs and more effort than transforming the data directly in BigQuery.

C is incorrect; running the transformations in Cloud SQL would not scale.

D is correct; BigQuery is serverless, uses SQL syntax, and scales to large datasets, so loading the raw data from Cloud Storage with BigQuery Load and transforming it with BigQuery SQL into a new table meets the speed and processing requirements with the least effort (see the sketch below).
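A minimal sketch of option D using the BigQuery Python client; the bucket, dataset, table names, and transformation query are hypothetical:

from google.cloud import bigquery

client = bigquery.Client()

# 1. Load the raw CSV files from Cloud Storage into a staging table.
load_job = client.load_table_from_uri(
    "gs://my-bucket/raw/*.csv",                        # hypothetical path
    "my_project.my_dataset.raw_events",                # hypothetical table
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
    ),
)
load_job.result()

# 2. Transform with SQL and materialize the result into a new table.
client.query("""
CREATE OR REPLACE TABLE `my_project.my_dataset.features` AS
SELECT user_id, COUNT(*) AS purchases, AVG(amount) AS avg_amount
FROM `my_project.my_dataset.raw_events`
GROUP BY user_id
""").result()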

Links:

GCP Doc: https://cloud.google.com/dataproc/docs/tutorials/bigquery-connector-spark-example

Jupyter Notebook (Github): https://github.com/tfayyaz/cloud-dataproc/blob/master/notebooks/python/1.2.%20BigQuery%20Storage%20%26%20Spark%20SQL%20-%20Python.ipynb

45
Q

You manage a team of data scientists who use a cloud-based backend system to submit training jobs. This system has become very difficult to administer, and you want to use a managed service instead. The data scientists you work with use many different frameworks, including Keras, PyTorch, Theano, Scikit-learn, and custom libraries. What should you do?

A. Use the AI Platform custom containers feature to receive training jobs using any framework.

B. Configure Kubeflow to run on Google Kubernetes Engine and receive training jobs through TF Job.

C. Create a library of VM images on Compute Engine and publish these images on a centralized repository.

D. Set up Slurm workload manager to receive jobs that can be scheduled to run on your cloud infrastructure.

A

A is correct because AI Platform supports training jobs with custom containers, so code written in any framework can be containerized and run on it. AI Platform is a managed service that supports distributed training, hyperparameter tuning, monitoring, logging, and visualization.

B is incorrect because Kubeflow is not a managed service provided by GCP out of the box; it is a platform for managing and orchestrating complex Kubernetes ML workflows.

C is incorrect because Compute Engine VMs are not a managed service, and maintaining a library of images would not simplify administration.

D is incorrect; Slurm is even further from a managed-service solution.

Links:

https://cloud.google.com/ai-platform/prediction/docs/use-custom-container

46
Q

You work for an online retail company that is creating a visual search engine. You have set up an end-to-end ML pipeline on Google Cloud to classify whether an image contains your company's product. Expecting the release of new products in the near future, you configured a retraining functionality in the pipeline so that new data can be fed into your ML models. You also want to use AI Platform's continuous evaluation service to ensure that the models have high accuracy on your test dataset. What should you do?

A. Keep the original test dataset unchanged even if newer products are incorporated into retraining.

B. Extend your test dataset with images of the newer products when they are introduced to retraining.

C. Replace your test dataset with images of the newer products when they are introduced to retraining.

D. Update your test dataset with images of the newer products when your evaluation metrics drop below a pre-decided threshold.

A

A is incorrect, as it would give no information about model performance on the new data.

B is correct. Because the model is retrained on both the original training data and the new data, evaluation must cover both the original test data and new test data to validate performance. As more data arrives it drifts slightly from the original distribution, and retraining compensates for that drift; whatever strategy is used to extend the training data should be mirrored in the test data.

C is incorrect; the model is still trained on the original dataset, so testing only on the new data would not give a correct picture of overall model performance.

D is incorrect; there is no reason to update the test data only after evaluation metrics drop below a threshold.

Note:

Even if the model were retrained only on new data, the test set should not contain only new data; older data can instead be dropped gradually, by a certain percentage, during evaluation.

47
Q

You need to build classification workflows over several structured datasets currently stored in BigQuery. Because you will be performing the classification several times, you want to complete the following steps without writing code: exploratory data analysis, feature selection, model building, training, and hyperparameter tuning and serving. What should you do?

A. Configure AutoML Tables to perform the classification task.

B. Run a BigQuery ML task to perform logistic regression for the classification.

C. Use AI Platform Notebooks to run the classification model with pandas library.

D. Use AI Platform to run the classification model job configured for hyperparameter tuning.

A

A is correct; no coding is required, and exploratory data analysis, feature selection, model building, training, hyperparameter tuning, and serving are all supported by AutoML Tables.

B is incorrect, as running a classification task with BigQuery ML requires writing SQL.

C is incorrect; AI Platform Notebooks are generally used for experimentation (EDA, training, tuning) rather than serving, and extensive coding would be required for these tasks.

D is incorrect, to run a classification job on AI Platform, classification code needs to be written and EDA and feature selection would need to be performed separately before running the training job.

Links:

AutoML Table functionalities: https://cloud.google.com/automl-tables/docs/beginners-guide

48
Q

You work for a public transportation company and need to build a model to estimate delay times for multiple transportation routes. Predictions are served directly to users in an app in real-time. Because different seasons and population increases impact the data relevance, you will retrain the model every month. You want to follow Google-recommended best practices. How should you configure the end-to-end architecture of the predictive model?

A. Configure Kubeflow Pipelines to schedule your multi-step workflow from training to deploying your model.

B. Use a model trained and deployed on BigQuery ML, and trigger retraining with the scheduled query feature in BigQuery.

C. Write a Cloud Functions script that launches training and deploying jobs on AI Platform that is triggered by Cloud Scheduler.

D. Use Cloud Composer to programmatically schedule a Dataflow job that executes the workflow from training to deploying your model.

A

A is correct. Kubeflow Pipelines can orchestrate end-to-end ML pipelines based on Kubernetes containers, and AI Platform resources can be used through the Kubeflow SDK. This is Google's recommended way to run an end-to-end ML pipeline.

B is incorrect. This is also a viable solution, but the modeling capabilities are restricted to what BigQuery ML supports, and the question does not state the model requirements, so this option is discarded.

C is incorrect, as this is a very crude way to implement an end-to-end ML pipeline.

D is incorrect. Cloud Composer is a fully managed workflow orchestration service built on Apache Airflow and is a Google-recommended way to schedule continuous training jobs, but Dataflow is not used to run training jobs; AI Platform handles training and deployment.

Note:

All options are feasible here, but we have to select the best option.

Links:

Running ML pipelines on GCP (also refer internal links for Kubeflow):

https://cloud.google.com/ai-platform/pipelines/docs/run-pipeline

Cloud Composer Continuous Training jobs (for reference): https://www.coursera.org/lecture/ml-pipelines-google-cloud/what-is-cloud-composer-CuXTQ

49
Q

You are developing ML models with an AI Platform for image segmentation on CT scans. You frequently update your model architectures based on the newest available research papers and have to rerun training on the same dataset to benchmark their performance. You want to minimize computation costs and manual intervention while having version control for your code. What should you do?

A. Use Cloud Functions to identify changes to your code in Cloud Storage and trigger a retraining job.

B. Use the gcloud command-line tool to submit training jobs on the AI Platform when you update your code.

C. Use Cloud Build linked with Cloud Source Repositories to trigger retraining when new code is pushed to the repository.

D. Create an automated workflow in Cloud Composer that runs daily and looks for changes in code in Cloud Storage using a sensor.

A

A is incorrect; using a Cloud Function to detect changes to code stored in Cloud Storage would trigger a training job on every change, and it is better to launch training only when required.

B is correct; manual intervention is minimal because submitting a training job takes a single gcloud command, and you submit it only when the model code is ready for retraining. AI Platform charges only for the ML units consumed, which minimizes cost, while the code itself stays under version control in the repository.

(B is arguably wrong because it still requires manual intervention, and the question explicitly asks to minimize it. C is the way to go.)

C is incorrect, as the build pipeline would be triggered for every commit to the source repository, whether or not you want to run the build (here, training). That is great for continuous-deployment pipelines where application availability is the priority, but not for ML model training.

D is incorrect; checking for code changes daily is not a viable approach, as it may trigger unnecessary training jobs.

Note:

Tip: AI-Platform is generally recommended by GCP for custom training workflows.

Links:

AI Platform Submit Training jobs: https://cloud.google.com/ai-platform/training/docs/training-jobs

50
Q

Your team needs to build a model that predicts whether images contain a driver’s license, passport, or credit card. The data engineering team already built the pipeline and generated a dataset composed of 10,000 images with driver’s licenses, 1,000 images with passports, and 1,000 images with credit cards. You now have to train a model with the following label map: [‘drivers_license’, ‘passport’, ‘credit_card’]. Which loss function should you use?

A. Categorical hinge

B. Binary cross-entropy

C. Categorical cross-entropy

D. Sparse categorical cross-entropy

A

A is incorrect; categorical hinge is a margin-based loss that only penalizes predictions beyond a threshold margin (used, for example, in some question-answering setups) and is not the typical choice here (refer to the 'How to use hinge loss' link).

B is incorrect; binary cross-entropy is used either for two-class classification or for multi-label problems (multiple correct outputs per example), in which case the targets are multi-hot encoded and the output activation is sigmoid.

C is incorrect; categorical cross-entropy is used for multi-class classification when the labels are one-hot encoded and there is exactly one correct class, with a softmax output activation.

D is correct; sparse categorical cross-entropy is used for multi-class classification when the labels are integer (label) encoded and there is exactly one correct class, with a softmax output activation. That matches the label map in this question (see the sketch below).
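A minimal Keras sketch of the distinction, with an illustrative classifier head (the backbone and shapes are placeholders):

import tensorflow as tf

# Minimal 3-class head; integer labels: 0=drivers_license, 1=passport, 2=credit_card.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(3, activation="softmax"),
])

# Integer (label-encoded) targets -> sparse categorical cross-entropy.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# If the targets were one-hot encoded ([1,0,0], [0,1,0], ...) instead,
# "categorical_crossentropy" would be the matching loss with the same softmax output.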

Links:

How to use hinge loss:

https://www.machinecurve.com/index.php/2019/10/17/how-to-use-categorical-multiclass-hinge-with-keras/

Choosing loss function: https://stats.stackexchange.com/questions/326065/cross-entropy-vs-sparse-cross-entropy-when-to-use-one-over-the-other

51
Q

You are designing an ML recommendation model for shoppers on your company’s e-commerce website. You will use Recommendations AI to build, test, and deploy your system. How should you develop recommendations that increase revenue while following best practices?

A. Use the ‘Other Products You May Like’ recommendation type to increase the click-through rate.

B. Use the ‘Frequently Bought Together’ recommendation type to increase the shopping cart size for each order.

C. Import your user events and then your product catalog to make sure you have the highest quality event stream.

D. Because it will take time to collect and record product data, use placeholder values for the product catalog to test the viability of the model.

A

A is incorrect; option C yields better results because importing user events and the product catalog follows best practices, whereas picking the 'Other Products You May Like' type only optimizes click-through rate.

B is incorrect for the same reason; choosing only the 'Frequently Bought Together' type optimizes shopping cart size without first ensuring a high-quality event stream.

C is correct, as Google's recommended way to use Recommendations AI is to create the highest quality event stream by importing your user events and product catalog.

D is incorrect; recommendations are driven by users' behaviour against the real catalog, so placeholder product values would undermine the model.

Links:

Recommendation AI(Refer How It works diagram): https://cloud.google.com/recommendations

52
Q

You are designing architecture with a serverless ML system to enrich customer support tickets with informative metadata before they are routed to a support agent. You need a set of models to predict ticket priority, predict ticket resolution time, and perform sentiment analysis to help agents make strategic decisions when they process support requests. Tickets are not expected to have any domain-specific terms or jargon. The proposed architecture has the following flow:

https://miro.medium.com/v2/resize:fit:1046/format:webp/1*vB7VzgQh4w0xSBH7R98wwg.png
https://medium.com/@gcpguru/google-google-cloud-professional-machine-learning-engineer-practice-questions-part-1-3ee4a2b3f0a4

Which endpoints should the Enrichment Cloud Functions call?

A. 1 = AI Platform, 2 = AI Platform, 3 = AutoML Vision

B. 1 = AI Platform, 2 = AI Platform, 3 = AutoML Natural Language

C. 1 = AI Platform, 2 = AI Platform, 3 = Cloud Natural Language API

D. 1 = Cloud Natural Language API, 2 = AI Platform, 3 = Cloud Vision API

A

A is incorrect, as sentiment analysis is not a computer vision problem, it’s an NLP problem. AutoML Vision is used to train computer vision models for Image classification or Object detection on our data.

B is incorrect; AutoML Natural Language is used to train a custom text-classification (here, sentiment) model on your own dataset without writing model code, but because the tickets contain no domain-specific jargon, the pretrained Cloud Natural Language API is sufficient.

C is correct. Custom models deployed on AI Platform can handle resolution-time prediction and ticket-priority prediction, and the Cloud Natural Language API is Google's out-of-the-box NLP API for text analysis. Since the tickets have no domain-specific jargon, the pretrained API can be used for sentiment analysis (see the sketch below).

D is incorrect, as sentiment analysis is an NLP problem, not a computer vision problem; the Cloud Vision API is Google's visual analytics API for image analysis.
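A minimal sketch of calling the Cloud Natural Language API for sentiment, assuming the google-cloud-language client library and a hypothetical ticket string:

from google.cloud import language_v1

client = language_v1.LanguageServiceClient()

ticket_text = "My order arrived late and the box was damaged."   # hypothetical ticket
document = language_v1.Document(
    content=ticket_text, type_=language_v1.Document.Type.PLAIN_TEXT
)

# Score in [-1, 1] indicates negative-to-positive sentiment; magnitude is overall strength.
sentiment = client.analyze_sentiment(request={"document": document}).document_sentiment
print(sentiment.score, sentiment.magnitude)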

Links:

AutoML NLP features: https://cloud.google.com/natural-language/automl/docs/features

Cloud NLP API (refer features): https://cloud.google.com/natural-language/

53
Q

You have trained a deep neural network model on Google Cloud. The model has a low loss on the training data but is performing worse on the validation data. You want the model to be resilient to overfitting. Which strategy should you use when retraining the model?

A. Apply a dropout parameter of 0.2 and decrease the learning rate by a factor of 10.

B. Apply an L2 regularization parameter of 0.4 and decrease the learning rate by a factor of 10.

C. Run a hyperparameter tuning job on the AI Platform to optimize for the L2 regularization and dropout parameters.

D. Run a hyperparameter tuning job on the AI Platform to optimize for the learning rate and increase the number of neurons by a factor of 2.

A

A is incorrect; there is not enough information to justify those specific parameter values without tuning.

B is incorrect for the same reason; the regularization strength and learning rate should be found by tuning rather than fixed up front.

C is correct; L2 regularization and dropout are both used to reduce overfitting in neural networks.

When overfitting seems to come from having too little training data, L2 regularization is the usual remedy; when it comes from model complexity, dropout is used (and with excessive features, L1 is common in traditional algorithms). Since the cause of the overfitting is not stated, running a hyperparameter tuning job on AI Platform to find appropriate values for both is the right approach (see the sketch below).

D is incorrect; increasing the number of neurons raises model complexity and would worsen overfitting. (Dropout, by contrast, reduces effective complexity and makes the weights more robust; this is a crude explanation, see the links below for details.)
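A minimal Keras sketch with the dropout rate and L2 strength exposed as hyperparameters so a tuning job can supply the values; the layer sizes and input shape are illustrative:

import tensorflow as tf

def build_model(dropout_rate: float, l2_strength: float) -> tf.keras.Model:
    # Both regularization knobs are arguments so a tuning job can search over them.
    reg = tf.keras.regularizers.l2(l2_strength)
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(64,)),
        tf.keras.layers.Dense(256, activation="relu", kernel_regularizer=reg),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(128, activation="relu", kernel_regularizer=reg),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    return model

# A tuning job would call this with trial values, e.g.
# build_model(dropout_rate=args.dropout_rate, l2_strength=args.l2_strength)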

Links:

Statquest L2 Regularization: https://www.youtube.com/watch?v=Q81RR3yKn30

Statquest L1 Regularization: https://www.youtube.com/watch?v=NGf0voTMlcs

Dropout: https://www.youtube.com/watch?v=D8PJAL-MZv8

54
Q

You built and manage a production system that is responsible for predicting sales numbers. Model accuracy is crucial because the production model is required to keep up with market changes. Since being deployed to production, the model hasn’t changed; however, the accuracy of the model has steadily deteriorated.
What issue is most likely causing the steady decline in model accuracy?

A. Poor data quality

B. Lack of model retraining

C. Too few layers in the model for capturing information

D. Incorrect data split ratio during model training, evaluation, validation, and test

A

A is incorrect, as poor data quality of the original data isn’t the main reason for accuracy deterioration. Retraining is needed, as the model needs to keep up with market changes.

B is correct; without retraining, the model does not keep up with market changes (retraining on new data is what keeps it aligned with the current market).

C is incorrect; this is not an underfitting problem, since accuracy deteriorates over time and no information about training accuracy is given.

D is incorrect, as this wouldn’t explain the deteriorating nature of accuracy with time.

Links:

Why retraining is important: https://neurospace.io/blog/2019/09/why-is-retraining-so-important/

55
Q

You have been asked to develop an input pipeline for an ML training model that processes images from disparate sources at low latency. You discover that your input data does not fit in memory. How should you create a dataset following Google-recommended best practices?

A. Create a tf.data.Dataset.prefetch transformation.

B. Convert the images to tf.Tensor objects, and then run Dataset.from_tensor_slices().

C. Convert the images to tf.Tensor objects, and then run tf.data.Dataset.from_tensors().

D. Convert the images into TFRecords, store the images in Cloud Storage, and then use the tf.data API to read the images for training.

A

A is incorrect; prefetching alone does not address data that does not fit in memory, and prefetching can also be combined with TFRecords, where it is more efficient.

B is incorrect, since TFRecords are the recommended format and Dataset.from_tensor_slices() requires the tensors to fit in memory.

C is incorrect, since TFRecords are the recommended format and tf.data.Dataset.from_tensors() also requires the data to fit in memory.

D is correct; TFRecords read through tf.data.TFRecordDataset are the recommended approach. The tf.data API is optimized for TFRecords, and prefetching works very efficiently with them, so the next batch of data is already being fetched while the current batch is processed by the model during training (see the sketch below).

Note:

TFRecords with the tf.data API are GCP's recommended way to feed a large dataset when training a TensorFlow model.
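A minimal input-pipeline sketch, assuming images serialized into TFRecords at a hypothetical Cloud Storage path with "image" and "label" features:

import tensorflow as tf

feature_spec = {
    "image": tf.io.FixedLenFeature([], tf.string),
    "label": tf.io.FixedLenFeature([], tf.int64),
}

def parse_example(serialized):
    example = tf.io.parse_single_example(serialized, feature_spec)
    image = tf.io.decode_jpeg(example["image"], channels=3)
    return image, example["label"]

files = tf.data.Dataset.list_files("gs://my-bucket/tfrecords/train-*.tfrecord")   # hypothetical
dataset = (
    tf.data.TFRecordDataset(files, num_parallel_reads=tf.data.AUTOTUNE)
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(1024)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)   # overlap input loading with model training
)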

Links:

Tensorflow official Doc: https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset

Kaggle Notebook for Tfrecords: https://www.kaggle.com/ryanholbrook/tfrecords-basics

56
Q

You are an ML engineer at a large grocery retailer with stores in multiple regions. You have been asked to create an inventory prediction model. Your model’s features include region, location, historical demand, and seasonal popularity. You want the algorithm to learn from new inventory data on a daily basis. Which algorithms should you use to build the model?

A. Classification

B. Reinforcement Learning

C. Recurrent Neural Networks (RNN)

D. Convolutional Neural Networks (CNN)

A

Note:

From the question this can be interpreted as a time-series problem, since terms like historical demand and seasonal popularity are used.

A is incorrect; 'classification' is too generic and does not match a demand-forecasting problem.

B is incorrect; reinforcement learning is suited to domains such as game AI and industrial automation (refer to the links).

C is correct; RNNs are designed for sequential data such as time series and NLP problems (see the sketch below).

D is incorrect; CNNs are primarily used for image data, although 1D CNNs are sometimes combined with RNNs for sequence data.
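For illustration, a minimal Keras LSTM that maps a window of recent demand to next-day demand; the window length, feature count, and layer sizes are placeholders, and new daily data would be used to retrain or fine-tune it:

import tensorflow as tf

# 28 past time steps, 4 features per step (e.g. demand, seasonality flags, region encoding).
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 4)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),          # predicted next-day demand
])
model.compile(optimizer="adam", loss="mse")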

Links:

RL: https://www.kdnuggets.com/2018/03/5-things-reinforcement-learning.html, https://analyticsindiamag.com/top-10-free-resources-to-learn-reinforcement-learning/

57
Q

You are building a real-time prediction engine that streams files that may contain Personally Identifiable Information (PII) to Google Cloud. You want to use the Cloud Data Loss Prevention (DLP) API to scan the files. How should you ensure that the PII is not accessible by unauthorized individuals?

A. Stream all files to Google Cloud, and then write the data to BigQuery. Periodically conduct a bulk scan of the table using the DLP API.

B. Stream all files to Google Cloud and write batches of the data to BigQuery. While the data is being written to BigQuery, conduct a bulk scan of the data using the DLP API.

C. Create two buckets of data: Sensitive and Non-sensitive. Write all data to the Non-sensitive bucket. Periodically conduct a bulk scan of that bucket using the DLP API, and move the sensitive data to the Sensitive bucket.

D. Create three buckets of data: Quarantine, Sensitive, and Non-sensitive. Write all data to the Quarantine bucket. Periodically conduct a bulk scan of that bucket using the DLP API, and move the data to either the Sensitive or Non-Sensitive bucket.

A

A is incorrect, the method is correct but it doesn’t satisfy the real-time requirement of the prediction engine.

B is correct; because the application is real-time, batches of data are scanned with the DLP API while they are being written to BigQuery, so PII is identified without delaying the stream.

C is incorrect, as writing all data to the Non-sensitive bucket first would expose PII to unauthorised users before the scan runs.

D is incorrect, the method is correct, but isn’t real-time.

Links:

DLP with GCS: https://cloud.google.com/architecture/automating-classification-of-data-uploaded-to-cloud-storage

DLP with BQ: https://cloud.google.com/bigquery/docs/scan-with-dlp

58
Q

You work for a large hotel chain and have been asked to assist the marketing team in gathering predictions for a targeted marketing strategy. You need to make predictions about user lifetime value (LTV) over the next 20 days so that marketing can be adjusted accordingly. The customer dataset is in BigQuery, and you are preparing the tabular data for training with AutoML Tables. This data has a time signal that is spread across multiple columns. How should you ensure that AutoML fits the best model to your data?

A. Manually combine all columns that contain a time signal into an array. AIlow AutoML to interpret this array appropriately. Choose an automatic data split across the training, validation, and testing sets.

B. Submit the data for training without performing any manual transformations. AIlow AutoML to handle the appropriate transformations. Choose an automatic data split across the training, validation, and testing sets.

C. Submit the data for training without performing any manual transformations and indicate an appropriate column as the Time column. AIlow AutoML to split your data based on the time signal provided, and reserve the more recent data for the validation and testing sets.

D. Submit the data for training without performing any manual transformations. Use the columns that have a time signal to manually split your data. Ensure that the data in your validation set is from 30 days after the data in your training set and that the data in your testing sets from 30 days after your validation set.

A

A is incorrect; there is no need to manually combine the columns into an array, and an automatic split treats rows independently and splits them randomly, discarding the time signal.

B is incorrect, as with an automatic split AutoML treats all rows independently and splits them randomly into the usual ratios, again ignoring the time signal.

C is incorrect. When a Time column is specified, AutoML Tables by default uses the earliest 80% of rows for training, the next 10% for validation, and the latest 10% for testing, but this split would not satisfy the day ranges required here (for example, the most recent 10% of rows might not even cover 20 days, making it impossible to validate a 20-day prediction).

D is correct, as a manual split satisfies the required day ranges: 30 days is more than the 20-day horizon, so the validation and test sets each contain enough future data to evaluate the 20-day predictions.

Links:

AutoML Preparing data (refer section ‘The Time Column’): https://cloud.google.com/automl-tables/docs/prepare

59
Q

You have written unit tests for a Kubeflow Pipeline that require custom libraries. You want to automate the execution of unit tests with each new push to your development branch in Cloud Source Repositories. What should you do?

A. Write a script that sequentially performs the push to your development branch and executes the unit tests on Cloud Run.

B. Using Cloud Build set an automated trigger to execute the unit tests when changes are pushed to your development branch.

C. Set up a Cloud Logging sink to a Pub/Sub topic that captures interactions with Cloud Source Repositories. Configure a Pub/Sub trigger for Cloud Run, and execute the unit tests on Cloud Run.

D. Set up a Cloud Logging sink to a Pub/Sub topic that captures interactions with Cloud Source Repositories. Execute the unit tests using a Cloud Function that is triggered when messages are sent to the Pub/Sub topic.

A

A is incorrect; pushing code through a script bypasses the rich Git CLI workflow and couples every push to test execution.

B is incorrect; Cloud Build is not recommended here for running the unit tests.

C is correct; Cloud Run can run custom container endpoints serverlessly, Pub/Sub notifications can be generated for updates to the source repository, and that topic can trigger the Cloud Run service to execute the unit tests.

D is incorrect; Cloud Functions can run unit tests with custom libraries packaged as local packages, but only when they are written in the same language as the function.

Links:

Unit testing with pub/sub and cloud functions: https://cloud.google.com/functions/docs/samples/functions-pubsub-unit-test#functions_pubsub_unit_test-python

Cloud Source repository with Pub/Sub: https://cloud.google.com/source-repositories/docs/pubsub-notifications

Cloud Run v/s Cloud Function: https://medium.com/google-cloud/cloud-run-and-cloud-function-what-i-use-and-why-12bb5d3798e1

Packaging custom libraries as local packages in cloud function (same language): https://cloud.google.com/functions/docs/writing/specifying-dependencies-python

60
Q

You are training an LSTM-based model on AI Platform to summarize text using the following job submission script:

gcloud ai-platform jobs submit training $JOB_NAME \
  --package-path $TRAINER_PACKAGE_PATH \
  --module-name $MAIN_TRAINER_MODULE \
  --job-dir $JOB_DIR \
  --region $REGION \
  --scale-tier basic \
  -- \
  --epochs 20 \
  --batch_size=32 \
  --learning_rate=0.001

You want to ensure that training time is minimized without significantly compromising the accuracy of your model. What should you do?

A. Modify the ‘epochs’ parameter.

B. Modify the ‘scale-tier’ parameter.

C. Modify the ‘batch size’ parameter.

D. Modify the ‘learning rate’ parameter.

A

A is incorrect; running fewer epochs would shorten training but compromise model accuracy.

B is correct; cost is not mentioned as a concern, so the scale tier can be upgraded from basic to a larger (or GPU-backed) tier to significantly reduce training time without changing the training routine.

C is incorrect; changing the batch size mainly affects model quality and convergence rather than reliably reducing training time.

D is incorrect; a higher learning rate might converge faster, but it changes the training dynamics and can compromise accuracy or cause exploding gradients.

Links:

Running Training job on AI Platform: https://cloud.google.com/ai-platform/training/docs/training-jobs

Scale Tier AI Platform: https://cloud.google.com/ai-platform/training/docs/machine-types

61
Q

You are going to train a DNN regression model with Keras APIs using this code:

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(
    256,
    use_bias=True,
    activation='relu',
    kernel_initializer=None,
    kernel_regularizer=None,
    input_shape=(500,)))
model.add(tf.keras.layers.Dropout(rate=0.25))
model.add(tf.keras.layers.Dense(
    128,
    use_bias=True,
    activation='relu',
    kernel_initializer='uniform',
    kernel_regularizer='l2'))
model.add(tf.keras.layers.Dropout(rate=0.25))
model.add(tf.keras.layers.Dense(
    2,
    use_bias=False,
    activation='softmax'))
model.compile(loss='mse')

How many trainable weights does your model have? (The arithmetic below is correct.)

A. 501*256 + 257*128 + 2 = 161154
B. 500*256 + 256*128 + 128*2 = 161024
C. 501*256 + 257*128 + 128*2 = 161408
D. 500*256*0.25 + 256*128*0.25 + 128*2 = 40448

A

On B: the Dense layers keep 100% of their weights trainable; a dropout rate of 0.25 only randomly disables 25% of activations during training for regularization, so all weights are still trained.

The correct answer is C. Do not forget the bias terms, which are also trainable parameters: the first and second Dense layers both have use_bias=True.

D is incorrect: the Dropout layer randomly disables neurons during training, but they remain in the model and are not discounted from the parameter count. Both A and C account for the bias terms, but only C counts all the weights correctly, so C is the answer.
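
Note: a minimal sketch (initializers and regularizers omitted, since they do not change the parameter count) that rebuilds the model above and verifies option C:

import tensorflow as tf

model = tf.keras.Sequential([
    # (500 inputs + 1 bias) * 256 = 128,256 parameters
    tf.keras.layers.Dense(256, use_bias=True, activation='relu', input_shape=(500,)),
    tf.keras.layers.Dropout(rate=0.25),            # no trainable weights
    # (256 inputs + 1 bias) * 128 = 32,896 parameters
    tf.keras.layers.Dense(128, use_bias=True, activation='relu'),
    tf.keras.layers.Dropout(rate=0.25),            # no trainable weights
    # 128 * 2 = 256 parameters (no bias)
    tf.keras.layers.Dense(2, use_bias=False, activation='softmax'),
])
model.compile(loss='mse')
print(model.count_params())  # 161408, i.e. option C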

62
Q

You have deployed multiple versions of an image classification model on AI Platform. You want to monitor the performance of the model versions over time. How should you perform this comparison?

A. Compare the loss performance for each model on a held-out dataset.

B. Compare the loss performance for each model on the validation data.

C. Compare the receiver operating characteristic (ROC) curve for each model using the What-If Tool.

D. Compare the mean average precision across the models using the Continuous Evaluation feature.

A

D was Google's intended answer (Continuous Evaluation), but that feature is now deprecated: https://cloud.google.com/ai-platform/prediction/docs/continuous-evaluation

The answer is A. B is weaker because repeatedly comparing models on the validation set risks overfitting to it, whereas a held-out dataset is used rarely, so there is little opportunity to overfit.

63
Q

You trained a text classification model. You have the following SignatureDefs:

signature_def['serving_default']:
  The given SavedModel SignatureDef contains the following input(s):
    inputs['text'] tensor_info:
        dtype: DT_STRING
        shape: (-1, 2)
        name: serving_default_text:0
  The given SavedModel SignatureDef contains the following output(s):
    outputs['Softmax'] tensor_info:
        dtype: DT_FLOAT
        shape: (-1, 2)
        name: StatefulPartitionedCall:0
  Method name is: tensorflow/serving/predict

You started a TensorFlow Serving component server and tried to send an HTTP request to get a prediction using:

headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/text_model:predict', data=data, headers=headers)
What is the correct way to write the predict request?

A. data = json.dumps({"signature_name": "seving_default", "instances": [['ab', 'bc', 'cd']]})

B. data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b', 'c', 'd', 'e', 'f']]})

C. data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b', 'c'], ['d', 'e', 'f']]})

D. data = json.dumps({"signature_name": "serving_default", "instances": [['a', 'b'], ['c', 'd'], ['e', 'f']]})

A

Most likely D. A negative number in the shape enables auto-expansion (https://stackoverflow.com/questions/37956197/what-is-the-negative-index-in-shape-arrays-used-for-tensorflow).

The first number (-1) in the shape (-1, 2) is the number of one-dimensional arrays in the tensor (it can auto-expand), while the second number (2) fixes each inner array at two elements. Hence D.

With shape = (-1, 2), the input can have as many rows as we want, but each row must contain exactly two elements. The only option satisfying this requirement is D.

D: (-1, 2) represents a tensor with any number of rows but exactly two columns.
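
Note: a minimal sketch of what option D looks like end to end (the localhost endpoint and model name come from the question; the instance values are placeholders):

import json
import requests

# Each instance is one row of two strings, matching the input shape (-1, 2).
data = json.dumps({
    'signature_name': 'serving_default',
    'instances': [['a', 'b'], ['c', 'd'], ['e', 'f']],
})
headers = {'content-type': 'application/json'}
json_response = requests.post(
    'http://localhost:8501/v1/models/text_model:predict',
    data=data, headers=headers)
print(json_response.json())  # expected: a 'predictions' list of softmax pairs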

63
Q

You are an ML engineer at a global shoe store. You manage the ML models for the company’s website. You are asked to build a model that will recommend new products to the user based on their purchase behavior and similarity with other users. What should you do?

A. Build a classification model
B. Build a knowledge-based filtering model
C. Build a collaborative-based filtering model
D. Build a regression model using the features as predictors

A

C. Collaborative filtering is based on user similarity and is designed for product recommendations; the other models won't work here.

Classification models (Option A) and regression models (Option D) are generally used for different types of predictive modeling tasks, not specifically for recommendations. A knowledge-based filtering model (Option B), while useful in recommendation systems, relies more on explicit knowledge about users and items, rather than on user interaction patterns and similarities, which seems to be the focus in this scenario.

64
Q

Your organization’s call center has asked you to develop a model that analyzes customer sentiments in each call. The call center receives over one million calls daily, and data is stored in Cloud Storage. The data collected must not leave the region in which the call originated, and no Personally Identifiable Information (PII) can be stored or analyzed. The data science team has a third-party tool for visualization and access which requires a SQL ANSI-2011 compliant interface. You need to select components for data processing and for analytics. How should the data pipeline be designed?

https://www.examtopics.com/assets/media/exam-media/03841/0001400001.png

A. 1= Dataflow, 2= BigQuery
B. 1 = Pub/Sub, 2= Datastore
C. 1 = Dataflow, 2 = Cloud SQL
D. 1 = Cloud Function, 2= Cloud SQL

A

A is correct
Dataflow - Unified stream and batch data processing that’s serverless, fast, and cost-effective
BigQuery - Good for analytics and dashboards

A, because it has BigQuery. You would almost never see an answer that prefers Cloud SQL over BigQuery.

You need to do analytics, so the answer needs to contain BigQuery and only option A does.
Moreover, BigQuery is fine with SQL and Dataflow is the right tool for the processing pipeline.

We need Dataflow to process the data coming from Cloud Storage. The data is unstructured, and if we want to analyze it through an ANSI SQL interface, BigQuery is the only option.
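
Note: a minimal Dataflow (Apache Beam Python SDK) sketch under assumed names; the bucket, the output table, and the redact_and_score helper are hypothetical placeholders for the PII-redaction and sentiment step:

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

def redact_and_score(line):
    # Hypothetical helper: strip PII and attach a sentiment score here.
    return {'transcript': line, 'sentiment': 0.0}

options = PipelineOptions(
    runner='DataflowRunner',
    project='my-project',              # assumed project
    region='us-central1',              # keep processing in the call's region
    temp_location='gs://my-bucket/tmp')

with beam.Pipeline(options=options) as p:
    (p
     | 'Read' >> beam.io.ReadFromText('gs://my-bucket/calls/*.txt')   # assumed path
     | 'Process' >> beam.Map(redact_and_score)
     | 'Write' >> beam.io.WriteToBigQuery(
           'my-project:call_center.sentiments',                       # assumed table
           schema='transcript:STRING,sentiment:FLOAT'))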

65
Q

You are responsible for building a unified analytics environment across a variety of on-premises data marts. Your company is experiencing data quality and security challenges when integrating data across the servers, caused by the use of a wide range of disconnected tools and temporary solutions. You need a fully managed, cloud-native data integration service that will lower the total cost of work and reduce repetitive work. Some members on your team prefer a codeless interface for building Extract, Transform, Load (ETL) process. Which service should you use?

A. Dataflow
B. Dataprep
C. Apache Flink
D. Cloud Data Fusion

A

D is correct.
Reference: https://cloud.google.com/data-fusion

66
Q

You work for a social media company. You need to detect whether posted images contain cars. Each training example is a member of exactly one class. You have trained an object detection neural network and deployed the model version to AI Platform Prediction for evaluation. Before deployment, you created an evaluation job and attached it to the AI Platform Prediction model version. You notice that the precision is lower than your business requirements allow. How should you adjust the model’s final layer softmax threshold to increase precision?

A. Increase the recall.
B. Decrease the recall.
C. Increase the number of false positives.
D. Decrease the number of false negatives.

A

Precision = TruePositives / (TruePositives + FalsePositives)
Recall = TruePositives / (TruePositives + FalseNegatives)
A. Increase recall -> will decrease precision
B. Decrease recall -> will increase precision
C. Increase the false positives -> will decrease precision
D. Decrease the false negatives -> will increase recall, reduce precision

The correct answer is B.
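
Note: a tiny worked example of the formulas above (the counts are made up): raising the softmax threshold trades recall for precision.

def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

# Lower threshold: more detections survive, more false positives.
print(precision(tp=80, fp=40), recall(tp=80, fn=10))  # 0.67, 0.89
# Higher threshold: fewer detections survive, false positives drop.
print(precision(tp=60, fp=10), recall(tp=60, fn=30))  # 0.86, 0.67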

67
Q

You are an ML engineer at a regulated insurance company. You are asked to develop an insurance approval model that accepts or rejects insurance applications from potential customers. What factors should you consider before building the model?

A. Redaction, reproducibility, and explainability
B. Traceability, reproducibility, and explainability
C. Federated learning, reproducibility, and explainability
D. Differential privacy, federated learning, and explainability

A

B. Traceability, reproducibility, and explainability.

Traceability: This involves maintaining records of the data, decisions, and processes used in the model. This is crucial in regulated industries for audit purposes and to ensure compliance with regulatory standards. It helps in understanding how the model was developed and how it makes decisions.

Reproducibility: Ensuring that the results of the model can be reproduced using the same data and methods is vital for validating the model’s reliability and for future development or debugging.

Explainability: Given the significant impact of the model’s decisions on individuals’ lives, it’s crucial that the model’s decisions can be explained in understandable terms. This is not just a best practice in AI ethics; in many jurisdictions, it’s a legal requirement under regulations that mandate transparency in automated decision-making.

68
Q

You are training a Resnet model on AI Platform using TPUs to visually categorize types of defects in automobile engines. You capture the training profile using the Cloud TPU profiler plugin and observe that it is highly input-bound. You want to reduce the bottleneck and speed up your model training process. Which modifications should you make to the tf.data dataset? (Choose two.)

A. Use the interleave option for reading data.
B. Reduce the value of the repeat parameter.
C. Increase the buffer size for the shuffle option.
D. Set the prefetch option equal to the training batch size.
E. Decrease the batch size argument in your transformation.

A

A and D: use interleave for reading data and prefetch to overlap input processing with training. See https://www.tensorflow.org/guide/data_performance
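
Note: a minimal tf.data sketch under assumed names (the GCS pattern and the feature spec in parse_fn are hypothetical) showing both fixes, parallel interleaved reads plus prefetching:

import tensorflow as tf

def parse_fn(serialized):
    # Hypothetical feature spec; the real one depends on your TFRecords.
    features = {'image': tf.io.FixedLenFeature([], tf.string),
                'label': tf.io.FixedLenFeature([], tf.int64)}
    return tf.io.parse_single_example(serialized, features)

files = tf.data.Dataset.list_files('gs://my-bucket/train-*.tfrecord')  # assumed path
dataset = (files
           .interleave(tf.data.TFRecordDataset,
                       cycle_length=4,
                       num_parallel_calls=tf.data.AUTOTUNE)   # read files in parallel
           .map(parse_fn, num_parallel_calls=tf.data.AUTOTUNE)
           .batch(1024, drop_remainder=True)
           .prefetch(tf.data.AUTOTUNE))                       # overlap input with training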

69
Q

You have trained a model on a dataset that required computationally expensive preprocessing operations. You need to execute the same preprocessing at prediction time. You deployed the model on AI Platform for high-throughput online prediction. Which architecture should you use?

A. Validate the accuracy of the model that you trained on preprocessed data. Create a new model that uses the raw data and is available in real time. Deploy the new model onto AI Platform for online prediction.

B. Send incoming prediction requests to a Pub/Sub topic. Transform the incoming data using a Dataflow job. Submit a prediction request to AI Platform using the transformed data. Write the predictions to an outbound Pub/Sub queue.

C. Stream incoming prediction request data into Cloud Spanner. Create a view to abstract your preprocessing logic. Query the view every second for new records. Submit a prediction request to AI Platform using the transformed data. Write the predictions to an outbound Pub/Sub queue.

D. Send incoming prediction requests to a Pub/Sub topic. Set up a Cloud Function that is triggered when messages are published to the Pub/Sub topic. Implement your preprocessing logic in the Cloud Function. Submit a prediction request to AI Platform using the transformed data. Write the predictions to an outbound Pub/Sub queue.

A

I went with B.

A is completely wrong.
C: first, Cloud Spanner is not designed for this high-throughput pattern; second, it is not meant for preprocessing.
D: a Cloud Function may not get enough resources to perform the computationally expensive transformation.

I think it's D, because B is not a good choice: it requires running a Dataflow job in the path of each prediction request, which is inefficient and can lead to latency issues. The issue with B is that Dataflow does not work well for this high-throughput, low-latency serving path.

70
Q

Your team trained and tested a DNN regression model with good results. Six months after deployment, the model is performing poorly due to a change in the distribution of the input data. How should you address the input differences in production?

A. Create alerts to monitor for skew, and retrain the model.
B. Perform feature selection on the model, and retrain the model with fewer features.
C. Retrain the model, and select an L2 regularization parameter with a hyperparameter tuning service.
D. Perform feature selection on the model, and retrain the model on a monthly basis with fewer features.

A

A

Data value skews: these skews are significant changes in the statistical properties of the data, which means that data patterns are changing and you need to trigger a retraining of the model to capture those changes.
https://developers.google.com/machine-learning/guides/rules-of-ml/#rule_37_measure_trainingserving_skew

71
Q

You need to train a computer vision model that predicts the type of government ID present in a given image using a GPU-powered virtual machine on Compute
Engine. You use the following parameters:
✑ Optimizer: SGD
✑ Image shape = 224×224
✑ Batch size = 64
✑ Epochs = 10
✑ Verbose =2
During training you encounter the following error: ResourceExhaustedError: Out Of Memory (OOM) when allocating tensor. What should you do?

A. Change the optimizer.
B. Reduce the batch size.
C. Change the learning rate.
D. Reduce the image shape.

A

B. Reducing the batch size lowers the memory needed per training step, which resolves the OOM error without changing the model architecture or optimizer.

72
Q

You developed an ML model with AI Platform, and you want to move it to production. You serve a few thousand queries per second and are experiencing latency issues. Incoming requests are served by a load balancer that distributes them across multiple Kubeflow CPU-only pods running on Google Kubernetes Engine
(GKE). Your goal is to improve the serving latency without changing the underlying infrastructure. What should you do?

A. Significantly increase the max_batch_size TensorFlow Serving parameter.
B. Switch to the tensorflow-model-server-universal version of TensorFlow Serving.
C. Significantly increase the max_enqueued_batches TensorFlow Serving parameter.
D. Recompile TensorFlow Serving using the source to support CPU-specific optimizations. Instruct GKE to choose an appropriate baseline minimum CPU platform for serving nodes.

A

D is correct. The CPU-only pods are already throttling, so increasing the pressure on them (which is essentially what both A and C do) won't help. B is a bit mysterious, but we know that D, recompiling TensorFlow Serving with CPU-specific optimizations and choosing an appropriate baseline minimum CPU platform, improves serving latency without changing the underlying infrastructure.

73
Q

You have a demand forecasting pipeline in production that uses Dataflow to preprocess raw data prior to model training and prediction. During preprocessing, you employ Z-score normalization on data stored in BigQuery and write it back to BigQuery. New training data is added every week. You want to make the process more efficient by minimizing computation time and manual intervention. What should you do?

A. Normalize the data using Google Kubernetes Engine.
B. Translate the normalization algorithm into SQL for use with BigQuery.
C. Use the normalizer_fn argument in TensorFlow’s Feature Column API.
D. Normalize the data with Apache Spark using the Dataproc connector for BigQuery.

A

B, I think. BigQuery definitely minimizes computation time for the normalization, and I think it also minimizes manual intervention. To normalize the data in Dataflow you would have to pass the mean and standard deviation in as side inputs, which is more work than a simple SQL query.
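
Note: a sketch of pushing the Z-score normalization into BigQuery itself; the dataset, table, and column names are hypothetical:

from google.cloud import bigquery

client = bigquery.Client()
sql = '''
CREATE OR REPLACE TABLE demand.features_normalized AS
SELECT
  *,
  SAFE_DIVIDE(sales - AVG(sales) OVER (), STDDEV(sales) OVER ()) AS sales_z
FROM demand.features_raw
'''
client.query(sql).result()  # runs entirely inside BigQuery; schedule it weekly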

74
Q

You need to design a customized deep neural network in Keras that will predict customer purchases based on their purchase history. You want to explore model performance using multiple model architectures, store training data, and be able to compare the evaluation metrics in the same dashboard. What should you do?

A. Create multiple models using AutoML Tables.
B. Automate multiple training runs using Cloud Composer.
C. Run multiple training jobs on AI Platform with similar job names.
D. Create an experiment in Kubeflow Pipelines to organize multiple runs.

A

D - https://www.kubeflow.org/docs/about/use-cases/

The best approach is to create an experiment in Kubeflow Pipelines to organize multiple runs.

Option A is incorrect because AutoML Tables is a managed machine learning service that automates the process of building machine learning models from tabular data. It does not provide the flexibility to customize the model architecture or explore multiple model architectures.

Option B is incorrect because Cloud Composer is a managed workflow orchestration service that can be used to automate machine learning workflows. However, it does not provide the same level of flexibility or scalability as Kubeflow Pipelines.

Option C is incorrect because running multiple training jobs on AI Platform with similar job names will not allow you to easily organize and compare the results.
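
Note: a minimal sketch (KFP SDK v1 style; the endpoint, pipeline package, and parameter names are assumptions) of grouping several architecture runs under one Kubeflow Pipelines experiment so their metrics appear in one dashboard:

import kfp

client = kfp.Client(host='https://<your-kfp-endpoint>')         # hypothetical endpoint
experiment = client.create_experiment(name='dnn-architecture-search')

for arch in ['wide', 'deep', 'wide_and_deep']:
    client.run_pipeline(
        experiment_id=experiment.id,
        job_name=f'train-{arch}',
        pipeline_package_path='train_pipeline.yaml',             # assumed compiled pipeline
        params={'architecture': arch})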

75
Q

You are developing a Kubeflow pipeline on Google Kubernetes Engine. The first step in the pipeline is to issue a query against BigQuery. You plan to use the results of that query as the input to the next step in your pipeline. You want to achieve this in the easiest way possible. What should you do?

A. Use the BigQuery console to execute your query, and then save the query results into a new BigQuery table.
B. Write a Python script that uses the BigQuery API to execute queries against BigQuery. Execute this script as the first step in your Kubeflow pipeline.
C. Use the Kubeflow Pipelines domain-specific language to create a custom component that uses the Python BigQuery client library to execute queries.
D. Locate the Kubeflow Pipelines repository on GitHub. Find the BigQuery Query Component, copy that component’s URL, and use it to load the component into your pipeline. Use the component to execute queries against BigQuery.

A

D. Kubeflow pipelines have different types of components, ranging from low- to high-level. They have a ComponentStore that allows you to access prebuilt functionality from GitHub.

It's not clear why A would be chosen, since it is a manual step and manual steps cannot be part of an automated pipeline. I would say the answer is D, as it only requires loading the prebuilt component from GitHub. Writing Python that imports the BigQuery client may sound good too, but the question asks for the easiest approach; how "easy" is interpreted varies by individual, but it is definitely not A.

Very confused as to why D is the correct answer. To me it seems a) much simpler to just write a couple of lines of python (https://cloud.google.com/bigquery/docs/reference/libraries#client-libraries-install-python) and b) the documentation for the BigQuery reusable component (https://v0-5.kubeflow.org/docs/pipelines/reusable-components/) states that the data is written to Google Cloud Storage, which means we have to write the fetching logic in the next pipeline step, going against the “as simple as possible” requirement. Would be interested to hear why I am wrong.
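
Note: a sketch of the reusable-component approach in option D. The component URL and its parameter names are illustrative from memory and should be checked against the component spec in the kubeflow/pipelines repository:

import kfp
from kfp import components

bigquery_query_op = components.load_component_from_url(
    'https://raw.githubusercontent.com/kubeflow/pipelines/<version>/components/gcp/bigquery/query/component.yaml')

@kfp.dsl.pipeline(name='bq-first-step')
def pipeline(project_id: str):
    query_task = bigquery_query_op(
        query='SELECT * FROM `my_dataset.my_table`',              # hypothetical query
        project_id=project_id,
        output_gcs_path='gs://my-bucket/query-results/data.csv')  # results land in GCS
    # downstream steps consume query_task.outputs as their input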

76
Q
A