Machine Learning in Enterprise Flashcards by Stefan Hoejmose

Which two activities are involved in ML development?

Version control and training operationalization
Experimentation and version control
Experimentation and training operationalization
Training formalization and training operationalization

Experimentation and training operationalization

What is the correct process that data scientists use to develop the models on an experimentation platform?

Problem definition > Data selection > Data exploration > Feature engineering > Model prototyping > Model validation
Problem definition > Data selection > Data exploration > Model prototyping > Feature engineering > Model validation
Problem definition > Data selection > Data exploration > Model prototyping > Model validation > Feature engineering
Problem definition > Data exploration > Data selection > Feature engineering > Model prototyping > Model validation

Problem definition > Data selection > Data exploration > Feature engineering > Model prototyping > Model validation

Which process covers algorithm selection, model training, hyperparameter tuning, and model evaluation in the Experimentation and Prototyping activity?

Model prototyping
Model validation
Feature engineering
Data exploration

Model prototyping

If the model needs to be repeatedly retrained in the future, an automated training pipeline is also developed. Which task do we use for this?

Training implementation
Training operationalization
Experimentation & prototyping
Training formalization

Training operationalization

Which of the following is correct for Online serving?

Online serving is for high-latency data retrieval of small batches of data for real-time processing.
Online serving is for high throughput and serving large volumes of data for offline processing.
Online serving is for low throughput and serving large volumes of data for offline processing.
Online serving is for low-latency data retrieval of small batches of data for real-time processing.

Online serving is for low-latency data retrieval of small batches of data for real-time processing.

Which data processing option can be used for transforming large unstructured data in Google Cloud?

Dataflow
Beam proc
Apache prep
Hadoop proc

Dataflow

Which of the following is not a part of Google’s enterprise data management and governance tool?

Data Catalog
Dataplex
Analytics Catalog
Feature Store

Analytics Catalog

Which of the following statements is not a feature of Analytics Hub?

You can create and access a curated library of internal and external assets, including unique datasets like Google Trends, backed by the power of BigQuery.
Analytics Hub requires batch data pipelines that extract data from databases, store it in flat files, and transmit them to the consumer where they are ingested into another database.
There are three roles in Analytics Hub - A Data Publisher, Exchange Administrator, and a Data Subscriber.
Analytics Hub efficiently and securely exchanges data analytics assets across organizations to address challenges of data reliability and cost.

Analytics Hub requires batch data pipelines that extract data from databases, store it in flat files, and transmit them to the consumer where they are ingested into another database.

What do the Aggregation Values contain for any feature?

The min, zeros, and std. dev. values for each feature
The min, median, and max values for each feature
The count, median, and max values for each feature
The min, median, and std. dev. values for each feature

The min, median, and max values for each feature
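
As a rough illustration of the answer above, the three aggregation values can be computed for a single numeric feature with the Python standard library. The function name `aggregation_values` and the `trip_minutes` data are hypothetical and used only for this sketch.

```python
from statistics import median


def aggregation_values(values):
    """Return the min, median, and max of one numeric feature column."""
    if not values:
        raise ValueError("feature column is empty")
    return {"min": min(values), "median": median(values), "max": max(values)}


# Hypothetical feature column (illustrative data only).
trip_minutes = [12, 7, 30, 18, 9]
summary = aggregation_values(trip_minutes)
print(summary)
```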

Model complexity often refers to the number of features or terms included in a given predictive model. What happens when the complexity of the model increases?

All of the options are correct.
Model performance on a test set is going to be poor.
Model will not figure out general relationships in the data.
Model is more likely to overfit.

All of the options are correct.

The learning rate is a configurable hyperparameter used in the training of neural networks that has a small positive value, often in the range between _______

< 0.0 and > 1.00.
> 0.0 and < 1.00.
0.0 and 1.0.
1.0 and 3.0.

0.0 and 1.0.

Which of the following is true?

Smaller batch sizes require larger learning rates.
Larger batch sizes require larger learning rates.
Smaller batch sizes require smaller learning rates.
Larger batch sizes require smaller learning rates.

Larger batch sizes require smaller learning rates.

Which of the following can make a huge difference in model quality?

Decreasing the number of epochs.
Increasing the learning rate.
Increasing the training time.
Setting hyperparameters to their optimal values for a given dataset.

Setting hyperparameters to their optimal values for a given dataset.

Which of the following is a black-box optimization service?

Vertex Vizier
AutoML
Manual Search
Early stopping

Vertex Vizier

Bayesian optimization takes into account past evaluations when choosing the hyperparameter set to evaluate next. By choosing its parameter combinations in an informed way, it enables itself to focus on those areas of the parameter space that it believes will bring the most promising validation scores. Therefore it _____________________.

requires fewer iterations to get to the optimal set of hyperparameter values.
limits the number of times a model needs to be trained for validation.
enables itself to focus on those areas of the parameter space that it believes will bring the most promising validation scores.
All of the options are correct.

All of the options are correct.

Black box optimization algorithms find the best operating parameters for any system whose ______________?

execution time is less.
performance can be measured as a function of adjustable parameters.
iterations to get to the optimal set of hyperparameter values are less.
number of iterations is limited to train a model for validation.

performance can be measured as a function of adjustable parameters.
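
To make "performance as a function of adjustable parameters" concrete, here is a minimal sketch of a black-box optimizer. It uses plain random search, which is only one simple strategy and is not claimed to be what any particular service runs. The names `random_search` and `toy_objective`, and the choice of a peak at a learning rate of 0.1, are assumptions made for illustration.

```python
import random


def random_search(objective, bounds, trials, seed=0):
    """Maximize `objective` by sampling parameters uniformly within `bounds`.

    The optimizer never inspects how `objective` works; it only observes
    the score returned for each parameter set (the black-box setting).
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(trials):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Hypothetical stand-in for "train a model and return a validation score".
    return -(params["learning_rate"] - 0.1) ** 2


best_params, best_score = random_search(
    toy_objective, {"learning_rate": (0.0, 1.0)}, trials=200
)
print(best_params, best_score)
```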

Which of the following algorithms is useful if you want to specify a quantity of trials that is greater than the number of points in the feasible space?

Bayesian Optimization
Random Search
Grid Search
Manual Search

Grid Search
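
A small sketch of why this matters: a discrete feasible space has a fixed number of points, and grid search enumerates each one exactly once, whereas independent sampling must repeat points once the trial count exceeds that number. The search space, `requested_trials`, and helper name below are hypothetical.

```python
import random
from itertools import product


def grid_points(search_space):
    """Enumerate every combination in a discrete search space exactly once."""
    names = list(search_space)
    combos = product(*(search_space[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Hypothetical discrete feasible space: 2 x 3 = 6 points.
space = {"learning_rate": [0.01, 0.1], "batch_size": [32, 64, 128]}
points = grid_points(space)

# Asking for more trials than there are points.
requested_trials = 10

# Independent random sampling over the same points necessarily repeats some.
rng = random.Random(0)
random_trials = [rng.choice(points) for _ in range(requested_trials)]
unique_random = {tuple(sorted(trial.items())) for trial in random_trials}

print(len(points), "grid points;", len(unique_random), "distinct random suggestions")
```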

What are the features of Vertex AI model monitoring?

Skew in training vs. serving data
Drift in data quality
All of the options are correct.
Feature Attribution and UI visualizations

All of the options are correct.

Which of the following statements is invalid for a data source file in batch prediction?

If the Cloud Storage bucket is in a different project than where you use Vertex AI, you must provide the Storage Object Creator role to the Vertex AI service account in that project.
The first line of the data source CSV file must contain the name of the columns.
You must use a regional BigQuery dataset.
BigQuery data source tables must be no larger than 100 GB.

You must use a regional BigQuery dataset.

Which statements are correct for serving predictions using Pre-built containers?

Pre-built containers provide HTTP prediction servers that you can use to serve prediction using minimal configurations.
Vertex AI provides Docker container images that you run as pre-built containers for serving predictions.
All of the options are correct.
Pre-built containers are organized by Machine learning framework and framework version.

All of the options are correct.

For which of the following is the baseline the statistical distribution of the feature’s values seen in production in the recent past?

Numerical features
Categorical features
Skew detection
Drift detection

Drift detection
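
The idea of comparing current values against a recent-production baseline can be sketched as follows for a categorical feature. The L-infinity distance used here is one possible distance measure, chosen for simplicity; the card does not say which measure a given monitoring service applies. The data, the `THRESHOLD` value, and the function names are hypothetical.

```python
from collections import Counter


def distribution(values):
    """Turn a non-empty list of categorical values into relative frequencies."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


def linf_distance(p, q):
    """Largest absolute difference in frequency across all categories."""
    categories = set(p) | set(q)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in categories)


# Baseline: values seen in production in the recent past (illustrative data).
baseline = distribution(["card"] * 6 + ["cash"] * 4)
# Current: values seen in production now (illustrative data).
current = distribution(["card"] * 9 + ["cash"] * 1)

THRESHOLD = 0.2  # hypothetical alerting threshold
score = linf_distance(baseline, current)
drift_detected = score > THRESHOLD
print(score, drift_detected)
```

For skew detection the same comparison would apply, but with the training data distribution as the baseline instead of recent production data.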

What should be done if the source table is in a different project?

You should provide the BigQuery Data Viewer role to the Vertex AI service account in that project.
You should provide the BigQuery Data Editor role to the Vertex AI service account in that project.
You should provide the BigQuery Data Viewer role to the Vertex AI service account in your project.
You should provide the BigQuery Data Editor role to the Vertex AI service account in your project.

You should provide the BigQuery Data Editor role to the Vertex AI service account in that project.

Which statement is correct regarding the maximum size for a CSV file during batch prediction?

Each data source file must not be larger than 10 GB. You can include multiple files, up to a maximum amount of 100 GB.
The data source file must be no larger than 100 GB.
Each data source file must include multiple files, up to a maximum amount of 50 GB.
The data source file must be no larger than 50 GB. You can not include multiple files.

Each data source file must not be larger than 10 GB. You can include multiple files, up to a maximum amount of 100 GB.
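
The two limits in the answer above can be expressed as a small pre-flight check. This is a sketch only: the function name and file names are hypothetical, and treating 1 GB as 10^9 bytes is an assumption, since the card does not say whether decimal or binary gigabytes are meant.

```python
GB = 10**9  # assumption: decimal gigabytes

MAX_FILE_BYTES = 10 * GB    # per-file limit stated in the card
MAX_TOTAL_BYTES = 100 * GB  # combined limit stated in the card


def check_batch_csv_sizes(file_sizes):
    """Return a list of limit violations for a {file_name: size_in_bytes} map."""
    problems = []
    for name, size in file_sizes.items():
        if size > MAX_FILE_BYTES:
            problems.append(f"{name} exceeds the 10 GB per-file limit")
    if sum(file_sizes.values()) > MAX_TOTAL_BYTES:
        problems.append("total size exceeds the 100 GB limit")
    return problems


# Hypothetical input files.
print(check_batch_csv_sizes({"part-1.csv": 8 * GB, "part-2.csv": 11 * GB}))
```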

Which package is used to define and interact with pipelines and components?

kfp.dsl package
kfp.components
kfp.compiler
kfp.containers

kfp.dsl package

What can you use to create a pipeline run on Vertex AI Pipelines?

Vertex AI Python client
kfp.v2.compiler.Compiler
Pipeline root path
Service account

Vertex AI Python client

What can you use to compile the pipeline?

compiler.Compiler
kfp.Compiler
kfp.v2.compiler
kfp.v2.compiler.Compiler

kfp.v2.compiler.Compiler

How can you define the pipeline's workflow as a graph?

By using different inputs for each component.
By using the outputs of a component as an input of another component.
By using the previous pipeline's output as an input for the current pipeline.
By using predictive input for each component.

By using the outputs of a component as an input of another component.
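
The way output-to-input wiring implies a graph can be sketched in plain Python. This deliberately does not use the `kfp` SDK; it only shows how a run order falls out of the dependencies. The component names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Each hypothetical component maps to the components whose outputs it consumes.
consumes_output_of = {
    "ingest_data": [],
    "train_model": ["ingest_data"],
    "evaluate_model": ["train_model", "ingest_data"],
}

# The dependencies form a directed acyclic graph; a topological sort
# yields an order in which every component runs after its inputs exist.
run_order = list(TopologicalSorter(consumes_output_of).static_order())
print(run_order)
```

No explicit ordering is declared anywhere in this sketch: the order is derived entirely from which component consumes which output, which is the point the card makes.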

Machine Learning in Enterprise Flashcards

(27 cards)