Machine Learning in Enterprise Flashcards
Which two activities are involved in ML development?
- Version control and training operationalization
- Experimentation and version control
- Experimentation and training operationalization
- Training formalization and training operationalization
- Experimentation and training operationalization
What is the correct process that data scientists use to develop the models on an experimentation platform?
- Problem definition > Data selection > Data exploration > Feature engineering > Model prototyping > Model validation
- Problem definition > Data selection > Data exploration > Model prototyping > Feature engineering > Model validation
- Problem definition > Data selection > Data exploration > Model prototyping > Model validation > Feature engineering
- Problem definition > Data exploration > Data selection > Feature engineering > Model prototyping > Model validation
- Problem definition > Data selection > Data exploration > Feature engineering > Model prototyping > Model validation
Which process covers algorithm selection, model training, hyperparameter tuning, and model evaluation in the Experimentation and Prototyping activity?
- Model prototyping
- Model validation
- Feature engineering
- Data exploration
- Model prototyping
If the model needs to be repeatedly retrained in the future, an automated training pipeline is also developed. Which task do we use for this?
- Training implementation
- Training operationalization
- Experimentation & prototyping
- Training formalization
- Training operationalization
Which of the following is correct for Online serving?
- Online serving is for high-latency data retrieval of small batches of data for real-time processing.
- Online serving is for high throughput and serving large volumes of data for offline processing.
- Online serving is for low throughput and serving large volumes of data for offline processing.
- Online serving is for low-latency data retrieval of small batches of data for real-time processing.
- Online serving is for low-latency data retrieval of small batches of data for real-time processing.
Which data processing option can be used to transform large unstructured data in Google Cloud?
- Dataflow
- Beam proc
- Apache prep
- Hadoop proc
- Dataflow
Which of the following is not one of Google’s enterprise data management and governance tools?
- Data Catalog
- Dataplex
- Analytics Catalog
- Feature Store
- Analytics Catalog
Which of the following statements does not describe a feature of Analytics Hub?
- You can create and access a curated library of internal and external assets, including unique datasets like Google Trends, backed by the power of BigQuery.
- Analytics Hub requires batch data pipelines that extract data from databases, store it in flat files, and transmit them to the consumer where they are ingested into another database.
- There are three roles in Analytics Hub: Data Publisher, Exchange Administrator, and Data Subscriber.
- Analytics Hub efficiently and securely exchanges data analytics assets across organizations to address challenges of data reliability and cost.
- Analytics Hub requires batch data pipelines that extract data from databases, store it in flat files, and transmit them to the consumer where they are ingested into another database.
What do the aggregation values for a feature contain?
- The min, zeros, and standard deviation values for each feature
- The min, median, and max values for each feature
- The count, median, and max values for each feature
- The min, median, and standard deviation values for each feature
- The min, median, and max values for each feature
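Note: the min, median, and max named in this answer can be computed per feature in a few lines. The sketch below is only an illustration in plain Python; the feature names and values are made up, and `aggregate` is a hypothetical helper, not part of any Google Cloud API.

```python
from statistics import median

def aggregate(values):
    """Return the min, median, and max of one feature's values."""
    return {"min": min(values), "median": median(values), "max": max(values)}

# Hypothetical feature columns, invented for this example.
features = {
    "age": [23, 35, 31, 52, 44],
    "income": [40_000, 52_000, 61_000, 75_000],
}

# One summary dictionary per feature.
summary = {name: aggregate(column) for name, column in features.items()}
for name, stats in summary.items():
    print(name, stats)
```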
Model complexity often refers to the number of features or terms included in a given predictive model. What happens when the complexity of the model increases?
- All of the options are correct.
- Model performance on a test set is going to be poor.
- Model will not figure out general relationships in the data.
- Model is more likely to overfit.
- All of the options are correct.
The learning rate is a configurable hyperparameter used in the training of neural networks that has a small positive value, often in the range between _______
- < 0.0 and > 1.00.
- > 0.0 and < 1.00.
- 0.0 and 1.0.
- 1.0 and 3.0.
- 0.0 and 1.0.
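Note: a toy example can show why the learning rate is usually kept small and positive. The sketch below assumes plain gradient descent on f(x) = x**2, which is not taken from the course material; `gradient_descent` and its default arguments are hypothetical names chosen for illustration.

```python
def gradient_descent(learning_rate, steps=20, start=5.0):
    """Minimize f(x) = x**2 with plain gradient descent (gradient is 2 * x)."""
    x = start
    for _ in range(steps):
        x -= learning_rate * 2 * x  # step against the gradient
    return x

# Values inside (0.0, 1.0) shrink x toward the minimum at 0;
# 1.1 lies outside that range and makes the iterates grow instead.
for lr in (0.01, 0.1, 0.5, 1.1):
    print(lr, gradient_descent(lr))
```

On this particular function a very small rate converges slowly, while a rate above 1.0 overshoots the minimum on every step and diverges.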
Which of the following is true?
- Smaller batch sizes require larger learning rates.
- Larger batch sizes require larger learning rates.
- Smaller batch sizes require smaller learning rates.
- Larger batch sizes require smaller learning rates.
- Larger batch sizes require smaller learning rates.
Which of the following can make a huge difference in model quality?
- Decreasing the number of epochs.
- Increasing the learning rate.
- Increasing the training time.
- Setting hyperparameters to their optimal values for a given dataset.
- Setting hyperparameters to their optimal values for a given dataset.
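Note: one simple way to look for good hyperparameter values is an exhaustive grid search scored on held-out data. The sketch below is a minimal illustration under stated assumptions: the toy data follows y = 3x, the model is a single weight `w`, and `fit`, `validation_mse`, and the grid values are all hypothetical. It is not how Vertex AI tunes hyperparameters.

```python
from itertools import product

# Toy data following y = 3x, invented for this example.
train = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]
valid = [(4.0, 12.0), (5.0, 15.0)]

def fit(learning_rate, epochs):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in train) / len(train)
        w -= learning_rate * grad
    return w

def validation_mse(w):
    """Mean squared error of the fitted weight on the held-out pairs."""
    return sum((w * x - y) ** 2 for x, y in valid) / len(valid)

# Try every (learning_rate, epochs) pair and keep the lowest validation error.
grid = product([0.001, 0.01, 0.1], [5, 50])
results = {(lr, ep): validation_mse(fit(lr, ep)) for lr, ep in grid}
best = min(results, key=results.get)
print("best (learning_rate, epochs):", best)
```

The same model and data give very different validation errors across the grid, which is the point the flashcard makes about choosing hyperparameters for a given dataset.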
Which of the following is a black-box optimization service?
- Vertex Vizier
- AutoML
- Manual Search
- Early stopping
- Vertex Vizier
Bayesian optimization takes into account past evaluations when choosing the hyperparameter set to evaluate next. By choosing its parameter combinations in an informed way, it enables itself to focus on those areas of the parameter space that it believes will bring the most promising validation scores. Therefore it _____________________.
- requires fewer iterations to get to the optimal set of hyperparameter values.
- limits the number of times a model needs to be trained for validation.
- enables itself to focus on those areas of the parameter space that it believes will bring the most promising validation scores.
- All of the options are correct.
- All of the options are correct.