Machine Learning in Enterprise Flashcards

1
Q

Which two activities are involved in ML development?

  1. Version control and training operationalization
  2. Experimentation and version control
  3. Experimentation and training operationalization
  4. Training formalization and training operationalization
A
  3. Experimentation and training operationalization
2
Q

What is the correct process that data scientists use to develop the models on an experimentation platform?

  1. Problem definition > Data selection > Data exploration > Feature engineering > Model prototyping > Model validation
  2. Problem definition > Data selection > Data exploration > Model prototyping > Feature engineering > Model validation
  3. Problem definition > Data selection > Data exploration > Model prototyping > Model validation > Feature engineering
  4. Problem definition > Data exploration > Data selection > Feature engineering > Model prototyping > Model validation
A
  1. Problem definition > Data selection > Data exploration > Feature engineering > Model prototyping > Model validation
3
Q

Which process covers algorithm selection, model training, hyperparameter tuning, and model evaluation in the Experimentation and Prototyping activity?

  1. Model prototyping
  2. Model validation
  3. Feature engineering
  4. Data exploration
A
  1. Model prototyping
4
Q

If the model needs to be repeatedly retrained in the future, an automated training pipeline is also developed. Which task do we use for this?

  1. Training implementation
  2. Training operationalization
  3. Experimentation & prototyping
  4. Training formalization
A
  2. Training operationalization
5
Q

Which of the following is correct for Online serving?

  1. Online serving is for high-latency data retrieval of small batches of data for real-time processing.
  2. Online serving is for high throughput and serving large volumes of data for offline processing.
  3. Online serving is for low throughput and serving large volumes of data for offline processing.
  4. Online serving is for low-latency data retrieval of small batches of data for real-time processing.
A
  4. Online serving is for low-latency data retrieval of small batches of data for real-time processing.
6
Q

Which Data processing option can be used for transforming large unstructured data in Google Cloud?

  1. Dataflow
  2. Beam proc
  3. Apache prep
  4. Hadoop proc
A
  1. Dataflow
7
Q

Which of the following is not part of Google’s enterprise data management and governance tools?

  1. Data Catalog
  2. Dataplex
  3. Analytics Catalog
  4. Feature Store
A
  3. Analytics Catalog
8
Q

Which of the following statements does not describe a feature of Analytics Hub?

  1. You can create and access a curated library of internal and external assets, including unique datasets like Google Trends, backed by the power of BigQuery.
  2. Analytics Hub requires batch data pipelines that extract data from databases, store it in flat files, and transmit them to the consumer where they are ingested into another database.
  3. There are three roles in Analytics Hub - A Data Publisher, Exchange Administrator, and a Data Subscriber.
  4. Analytics Hub efficiently and securely exchanges data analytics assets across organizations to address challenges of data reliability and cost.
A
  2. Analytics Hub requires batch data pipelines that extract data from databases, store it in flat files, and transmit them to the consumer where they are ingested into another database.
9
Q

What do the Aggregation values contain for any feature?

  1. The min, zeros, and std. dev values for each feature
  2. The min, median, and max values for each feature
  3. The count, median, and max values for each feature
  4. The min, median, and std. dev values for each feature
A
  2. The min, median, and max values for each feature
10
Q

Model complexity often refers to the number of features or terms included in a given predictive model. What happens when the complexity of the model increases?

  1. All of the options are correct.
  2. Model performance on a test set is going to be poor.
  3. Model will not figure out general relationships in the data.
  4. Model is more likely to overfit.
A
  1. All of the options are correct.
11
Q

The learning rate is a configurable hyperparameter used in the training of neural networks that has a small positive value, often in the range between _______

  1. < 0.0 and > 1.00.
  2. > 0.0 and < 1.00.
  3. 0.0 and 1.0.
  4. 1.0 and 3.0.
A
  3. 0.0 and 1.0.
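A minimal sketch of the idea behind this card: the learning rate is a small positive factor (here 0.1, within the typical 0.0 to 1.0 range) that scales how far each update moves a parameter against the gradient. The function and values below are illustrative, not from any particular course lab.

```python
# Toy gradient descent showing the learning rate in action.

def gradient_descent(grad, start, learning_rate=0.1, steps=50):
    """Minimize a 1-D function given its gradient `grad`."""
    x = start
    for _ in range(steps):
        # The learning rate scales each update step.
        x -= learning_rate * grad(x)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
```

With a learning rate near 1.0 the updates overshoot; too close to 0.0 and convergence stalls, which is why small values inside (0.0, 1.0) are typical.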
12
Q

Which of the following is true?

  1. Smaller batch sizes require larger learning rates.
  2. Larger batch sizes require larger learning rates.
  3. Smaller batch sizes require smaller learning rates.
  4. Larger batch sizes require smaller learning rates.
A
  4. Larger batch sizes require smaller learning rates.
13
Q

Which of the following can make a huge difference in model quality?

  1. Decreasing the number of epochs.
  2. Increasing the learning rate.
  3. Increasing the training time.
  4. Setting hyperparameters to their optimal values for a given dataset.
A
  4. Setting hyperparameters to their optimal values for a given dataset.
14
Q

Which of the following is a black-box optimization service?

  1. Vertex Vizier
  2. AutoML
  3. Manual Search
  4. Early stopping
A
  1. Vertex Vizier
15
Q

Bayesian optimization takes into account past evaluations when choosing the hyperparameter set to evaluate next. By choosing its parameter combinations in an informed way, it enables itself to focus on those areas of the parameter space that it believes will bring the most promising validation scores. Therefore it _____________________.

  1. requires fewer iterations to get to the optimal set of hyperparameter values.
  2. limits the number of times a model needs to be trained for validation.
  3. enables itself to focus on those areas of the parameter space that it believes will bring the most promising validation scores.
  4. All of the options are correct.
A
  4. All of the options are correct.
16
Q

Black box optimization algorithms find the best operating parameters for any system whose ______________?

  1. execution time is less.
  2. performance can be measured as a function of adjustable parameters.
  3. iterations to get to the optimal set of hyperparameter values are less.
  4. number of iterations is limited to train a model for validation.
A
  2. performance can be measured as a function of adjustable parameters.
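A sketch of what "black box" means here: the optimizer only sees a measurable score as a function of adjustable parameters, never the system's internals or gradients. The example below uses random search over a hypothetical two-parameter system; the function, bounds, and trial count are all illustrative.

```python
import random

def random_search(measure, bounds, trials=200, seed=0):
    """Black-box search: `measure` is the only access to the system."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(trials):
        # Sample each adjustable parameter uniformly within its bounds.
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        score = measure(params)  # measured performance, nothing else
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical system whose performance peaks at x=2, y=-1.
def system_performance(p):
    return -((p["x"] - 2) ** 2 + (p["y"] + 1) ** 2)

best, score = random_search(system_performance, {"x": (-5, 5), "y": (-5, 5)})
```

Services like Vertex Vizier replace the random sampling with informed strategies (e.g., Bayesian optimization from card 15), but the interface is the same: adjustable parameters in, measured performance out.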
17
Q

Which of the following algorithms is useful if you want to specify a number of trials greater than the number of points in the feasible space?

  1. Bayesian Optimization
  2. Random Search
  3. Grid Search
  4. Manual Search
A
  3. Grid Search
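The intuition behind this card: when the feasible space is a finite grid, requesting at least as many trials as grid points means grid search simply evaluates every combination. A minimal sketch with an illustrative 3 × 3 = 9-point space and a made-up scoring function:

```python
import itertools

def grid_search(score, space):
    """Exhaustively score every combination in `space` (name -> list of values)."""
    names = list(space)
    best_params, best_score = None, float("-inf")
    for combo in itertools.product(*(space[n] for n in names)):
        params = dict(zip(names, combo))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Hypothetical tuning problem; the "true best" is lr=0.1, batch_size=32.
space = {"learning_rate": [0.01, 0.1, 0.5], "batch_size": [16, 32, 64]}
best, _ = grid_search(
    lambda p: -abs(p["learning_rate"] - 0.1) - abs(p["batch_size"] - 32) / 100,
    space,
)
```

Bayesian optimization and random search, by contrast, are used precisely when exhaustively covering the space is infeasible.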
18
Q

What are the features of Vertex AI model monitoring?

  1. Skew in training vs. serving data
  2. Drift in data quality
  3. All of the options are correct.
  4. Feature Attribution and UI visualizations
A
  3. All of the options are correct.
19
Q

Which of the following statements is invalid for a data source file in batch prediction?

  1. If the Cloud Storage bucket is in a different project than where you use Vertex AI, you must provide the Storage Object Creator role to the Vertex AI service account in that project.
  2. The first line of the data source CSV file must contain the name of the columns.
  3. You must use a regional BigQuery dataset.
  4. BigQuery data source tables must be no larger than 100 GB.
A
  3. You must use a regional BigQuery dataset.
20
Q

Which statements are correct for serving predictions using Pre-built containers?

  1. Pre-built containers provide HTTP prediction servers that you can use to serve prediction using minimal configurations.
  2. Vertex AI provides Docker container images that you run as pre-built containers for serving predictions.
  3. All of the options are correct.
  4. Pre-built containers are organized by Machine learning framework and framework version.
A
  3. All of the options are correct.
21
Q

For which of the following is the baseline the statistical distribution of the feature’s values seen in production in the recent past?

  1. Numerical features
  2. Categorical features
  3. Skew detection
  4. Drift detection
A
  4. Drift detection
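A simplified sketch of the drift-detection idea: the baseline is the feature's value distribution seen in production in the recent past, and drift is flagged when the latest serving window strays too far from it. Vertex AI uses statistical distance measures for this; the toy below uses an L-infinity distance between categorical frequency distributions, and the data, feature values, and threshold are all illustrative.

```python
from collections import Counter

def distribution(values):
    """Relative frequency of each categorical value."""
    counts = Counter(values)
    total = len(values)
    return {v: c / total for v, c in counts.items()}

def drift_detected(recent_production, latest_window, threshold=0.3):
    base = distribution(recent_production)  # baseline: recent production data
    new = distribution(latest_window)
    keys = set(base) | set(new)
    # L-infinity distance: largest per-category frequency gap.
    linf = max(abs(base.get(k, 0.0) - new.get(k, 0.0)) for k in keys)
    return linf > threshold

baseline = ["card"] * 70 + ["cash"] * 30  # recent production traffic
window = ["card"] * 20 + ["cash"] * 80    # latest serving window
```

Skew detection works the same way except the baseline is the training data distribution rather than recent production data, which is the distinction this card tests.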
22
Q

What should be done if the source table is in a different project?

  1. You should provide the BigQuery Data Viewer role to the Vertex AI service account in that project.
  2. You should provide the BigQuery Data Editor role to the Vertex AI service account in that project.
  3. You should provide the BigQuery Data Viewer role to the Vertex AI service account in your project.
  4. You should provide the BigQuery Data Editor role to the Vertex AI service account in your project.
A
  2. You should provide the BigQuery Data Editor role to the Vertex AI service account in that project.

23
Q

Which statement is correct regarding the maximum size for a CSV file during batch prediction?

  1. Each data source file must not be larger than 10 GB. You can include multiple files, up to a maximum amount of 100 GB.
  2. The data source file must be no larger than 100 GB.
  3. Each data source file must include multiple files, up to a maximum amount of 50 GB.
  4. The data source file must be no larger than 50 GB. You can not include multiple files.
A
  1. Each data source file must not be larger than 10 GB. You can include multiple files, up to a maximum amount of 100 GB.
24
Q

Which package is used to define and interact with pipelines and components?

  1. kfp.dsl package
  2. kfp.components
  3. kfp.compiler
  4. kfp.containers
A
  1. kfp.dsl package
25
Q

What can you use to create a pipeline run on Vertex AI Pipelines?

  1. Vertex AI python client
  2. kfp.v2.compiler.Compiler
  3. Pipeline root path
  4. Service account
A
  1. Vertex AI python client
26
Q

What can you use to compile the pipeline?

  1. compiler.Compiler
  2. kfp.Compiler
  3. kfp.v2.compiler
  4. kfp.v2.compiler.Compiler
A
  4. kfp.v2.compiler.Compiler
27
Q

How can you define the pipeline’s workflow as a graph?

  1. By using different inputs for each component.
  2. By using the outputs of a component as an input of another component
  3. Use the previous pipeline’s output as an input for the current pipeline.
  4. By using predictive input for each component.
A
  2. By using the outputs of a component as an input of another component.