Google ML Engineer Flashcards by Shabbir Akolawala

What kinds of problems benefit from ML?

Identification (Classification)
Prediction
Grouping (Clustering)

What problems does ML solve?

Identification (Classification)
Prediction
Grouping (Clustering)

What are technical success metrics?

Accuracy (correct predictions / total predictions)
Precision (true positives / (true positives + false positives))
Recall (true positives / (true positives + false negatives))
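As a minimal sketch, the three metrics can be computed directly from confusion-matrix counts. The function name and the counts below are hypothetical, chosen only to illustrate the formulas.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (accuracy, precision, recall) from confusion-matrix counts."""
    total = tp + fp + tn + fn
    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts: 40 true positives, 10 false positives,
# 45 true negatives, 5 false negatives.
accuracy, precision, recall = classification_metrics(tp=40, fp=10, tn=45, fn=5)
```

Note that accuracy counts both true positives and true negatives as correct, while precision and recall only look at the positive class.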

What are the different types of models?

Decision Tree
Deep Learning
Regression

What are the steps in an MLOps pipeline?

Data extraction
Data validation
Data preparation
Model training
Model evaluation
Model validation

Which of the following comes first in a machine-learning pipeline?
- Model evaluation
- Data extraction
- Data preparation
- Model training

Data extraction

Which of the following is not a kind of data preparation task?
- Addressing missing data
- Removing unwanted data
- Integrating data
- Evaluating model performance

Evaluating model performance

Which of the following is not a kind of feature in a labeled dataset?
- Attributes of instances
- Structured tables
- Images
- Random data

Random data

What are the target values of a labeled dataset?

Class or category
Value to predict

What are the classification algorithms?

Logistic Regression
Decision Trees (the leaves of the tree are the classifications), e.g. random forests
Naive Bayes (uses statistics and probability; classifies based on how often feature values co-occur with each class)
Neural Networks (deep learning networks)
Nearest Neighbor (classifies by how near or far points are in space)
Ensemble methods (combine several different models)

Is logistic regression a classification algorithm or a regression algorithm?

Classification algorithm

Give an example of a decision-tree-based algorithm

Random forest

Which algorithm is based on how near or far points are in space, and what type of algorithm is it?

Nearest Neighbor
Classification algorithm
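
A minimal sketch of nearest-neighbor classification, assuming 2-D points and a majority vote over the k closest training points. The function name `knn_predict` and the sample data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (point, label) pairs; points are coordinate tuples.
    """
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated groups.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
prediction = knn_predict(train, (1.1, 1.0))
```

The query point sits next to the first group, so its three nearest neighbors all carry the label "a".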

What are the three regression algorithms?

Linear Regression (learns the formula of a straight line from a series of values and predicts from that formula)
Decision Tree Regression (uses the structure of decision trees)
Polynomial Regression (also learns a function, but the fit can be a curve instead of a straight line)
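
As a rough illustration of "learning the formula of a straight line", the sketch below fits a slope and intercept by ordinary least squares in plain Python. The function name `fit_line` and the data points are hypothetical.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data that lies exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)

# Predict for an unseen x using the learned formula.
predicted = slope * 5.0 + intercept
```

Polynomial regression follows the same idea but fits coefficients for powers of x, so the learned function can bend.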

You have data on the price of cars sold over the past two years. You have data on the sale price, age of the car, mileage, interior features, gas mileage, and several other features. You want to use this data to predict the sale prices of other cars. What kind of ML problem is this?
- Classification
- Regression
- Reinforcement learning
- Unsupervised learning

This is a regression problem because we are trying to predict a continuous value.

Which of the following is a process risk to successfully deploying a machine learning model? (One choice)
- Insufficiently agreed upon objectives
- Insufficient data
- Biased data
- High F1 score

Insufficiently agreed upon objectives

What are the three kinds of unsupervised learning algorithms?

They do not use labeled datasets.
Clustering
- K-means clustering
Association rules
- Apriori algorithm
Dimensionality reduction
- Principal component analysis
- Autoencoders (compression of data)
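
To make the clustering entry concrete, here is a minimal k-means sketch on 1-D values. It assumes fixed starting centroids (so the result is deterministic) and a fixed number of iterations; the function name `kmeans_1d` and the data are hypothetical.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids.

    Returns (centroids, clusters) after the final iteration.
    """
    centroids = list(centroids)
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabeled data with two obvious groups.
values = [1.0, 1.5, 2.0, 9.0, 9.5, 10.0]
centroids, clusters = kmeans_1d(values, centroids=[0.0, 5.0])
```

No labels are supplied anywhere; the grouping comes only from distances between values.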

Which of the following is not a use case for unsupervised learning? (One choice)
- Grouping and segmentation
- Data compression
- Working in game-like scenarios
- Anomaly detection

Working in game-like scenarios. This is a use case for reinforcement learning, not unsupervised learning.

What are the characteristics of reinforcement learning? (Multiple choice)
- Agent makes a series of choices in an environment
- Environment provides positive or negative feedback
- Trial and error, learn from feedback
- Used in dimensionality reduction

Agent makes a series of choices in an environment
Environment provides positive or negative feedback
Trial and error, learn from feedback

Dimensionality reduction is an unsupervised learning problem, not a characteristic of reinforcement learning.

Which one is a security risk in ML model development? (One choice)
- Insufficient data
- Data quality issues (data exploration, miscategorized data, missing data)
- Biased data
- Data poisoning

Data poisoning (a security risk): an attacker deliberately feeds biased or incomplete data into training to skew the output of the ML algorithm.

What Google Cloud service would you use to store 3 TB of raw data files in Parquet format that will be processed and then used for machine learning training?
- Cloud Storage
- Cloud SQL
- Bigtable
- Cloud Dataproc

Cloud Storage (storing objects such as raw data files.)

What Vertex AI feature supports managed and user managed Jupyter Notebooks?
- Vertex AI Training
- Vertex AI FeatureStore
- Vertex AI Workbench
- Vertex AI Labeling

Vertex AI Workbench (support for Jupyter Notebooks.)

You are building a deep learning network and need to perform large volumes of low-precision calculations. What accelerator would you choose?
- TPU
- GPU
- Edge devices
- Kubernetes Pod

TPU

Which of the following is not an example of sensitive data?
- Faces in an image
- Government issued ID number
- Notes in electronic patient records
- Tracking number for an e-commerce shipment

Tracking number for an e-commerce shipment

What is a statistical object that describes central tendency and spread of values called?
- Mean
- Distribution
- Variance
- Mode

Distribution

What are valid sources of data for use with Vertex AI Datasets?
- CSV files and BigQuery tables/views only
- CSV files and Parquet files only
- BigQuery tables/views only
- CSV files, BigQuery, and Bigtable

CSV files and BigQuery tables/views only

When is deleting rows with missing data not a reasonable option for handling missing data?
- When there are many rows with no missing data
- When a significant portion of the dataset has missing values for some feature
- When the data is stored in Parquet format
- When the data is not in a relational database

When a significant portion of the dataset has missing values for some feature
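
A small sketch of why deletion can be costly and what one alternative looks like: it compares dropping incomplete rows against filling gaps with the column mean. Mean imputation is only one possible strategy, and the rows and the helper name `impute_mean` are hypothetical.

```python
def impute_mean(rows, column):
    """Return copies of `rows` with missing values in `column` set to the column mean."""
    present = [row[column] for row in rows if row[column] is not None]
    mean = sum(present) / len(present)
    return [
        {**row, column: mean if row[column] is None else row[column]}
        for row in rows
    ]


# Hypothetical dataset in which half the rows lack an income value.
rows = [
    {"age": 30, "income": 50.0},
    {"age": 40, "income": None},
    {"age": 50, "income": 70.0},
    {"age": 35, "income": None},
]

# Option 1: delete incomplete rows (loses half of this dataset).
dropped = [row for row in rows if all(v is not None for v in row.values())]

# Option 2: keep every row and fill the gaps with the column mean.
imputed = impute_mean(rows, "income")
```

Here deletion keeps only two of four rows, while imputation keeps all four.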

What is the role of feature attributions in explaining predictions?
- Attributions prevent overfitting
- Attributions are a way of handling missing data
- Attributions are a measure of how much a feature contributes to a prediction
- Attributions are a type of data augmentation

Attributions are a measure of how much a feature contributes to a prediction

Which of the following are types of data augmentation used with images?
- Crop and flip
- Crop and synonym substitution
- Feature attribution
- Imputing data and crop

Crop and flip
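
A minimal sketch of the two augmentations, assuming an image is represented as a nested list of pixel values (rows of columns). The helper names and the 3x3 sample image are hypothetical.

```python
def crop(image, top, left, height, width):
    """Return the height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]


def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]


# Hypothetical 3x3 grayscale image.
image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
cropped = crop(image, top=0, left=1, height=2, width=2)
flipped = flip_horizontal(image)
```

Both operations produce new training examples that keep the original label, which is what makes them useful for augmentation.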

What type of hyperparameter tuning algorithm uses sequential processing and uses the results from prior evaluations to inform evaluations of new hyperparameter values?
- Bayesian Search
- Random Search
- Grid Search
- Data Augmentation

Bayesian search. It updates its priors (the information known at the start of an evaluation) with the results of earlier evaluations.

When using distributed training in Vertex AI, what kind of node is responsible for communicating gradients between nodes?
- Worker Nodes
- Primary Replicas
- Reduction Server
- Backup Nodes

The purpose of the reduction server is to increase throughput by communicating gradients among worker nodes.

You want to run TensorFlow models most efficiently in Google Cloud. What serving option would you choose?
- KubeFlow
- TensorFlow open source
- Optimized TensorFlow Runtime
- XGBoost

Optimized TensorFlow Runtime contains optimizations used internally at Google Cloud.

You want to use Vertex AI monitoring to detect when newly arriving data is significantly different from recent production data. What metric would you use?
- Skew
- Drift
- Precision
- F1 Score

Drift measures difference in distribution with recent production data.
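
One way to picture a drift check is to compare the category distribution of recent production data with that of newly arriving data. The sketch below uses the L-infinity distance (the largest per-category difference in frequency) as a generic illustration; it is not presented as the exact statistic Vertex AI computes, and the threshold, function names, and data are hypothetical.

```python
from collections import Counter


def category_distribution(values):
    """Return the relative frequency of each category in `values`."""
    counts = Counter(values)
    total = len(values)
    return {category: n / total for category, n in counts.items()}


def l_infinity_distance(baseline, current):
    """Largest absolute difference in category frequency between two samples."""
    p = category_distribution(baseline)
    q = category_distribution(current)
    return max(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in set(p) | set(q))


# Hypothetical feature values: recent production data vs. newly arriving data.
recent = ["red"] * 5 + ["blue"] * 5
incoming = ["red"] * 8 + ["blue"] * 2

score = l_infinity_distance(recent, incoming)
drift_detected = score > 0.2  # hypothetical alert threshold
```

Skew would use the same kind of comparison, but against the training data rather than recent production data.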

Google ML Engineer Flashcards

(33 cards)