How Google does Machine Learning Flashcards

1
Q

What types of Vertex AI Workbench exist and what are the differences?

A

Managed instances – JupyterLab instances provisioned by Google. You can choose between frameworks such as TensorFlow or PyTorch, get direct integration with services such as Dataproc, Cloud Storage, and BigQuery, and define the compute resources (CPU, GPU).

User-managed instances – a highly customizable JupyterLab instance. You can use it as a Deep Learning VM; it gives you more control over your environment, and you can set it up within your VPC.

2
Q

If you change one step in a Vertex AI pipeline, or a step fails with an error, do you have to rerun the entire pipeline?

A

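Not necessarily. Vertex AI Pipelines supports caching of pipeline steps: steps whose inputs and definition are unchanged reuse their cached results, so a rerun effectively resumes from the step where the error occurred or from the step that was modified.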
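
The caching idea can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions, not how Vertex AI Pipelines is implemented: the names `cache_key` and `run_pipeline` are hypothetical, and the cache key here is simply a hash of the step name, its version, and its input.

```python
import hashlib
import json


def cache_key(step_name, inputs, version):
    """Build a deterministic key from the step name, its version, and its input."""
    payload = json.dumps(
        {"step": step_name, "inputs": inputs, "version": version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def run_pipeline(steps, cache, executed):
    """Run steps in order; each step receives the previous step's output.

    A step is skipped when an identical execution is already in the cache.
    Names of steps that actually ran are appended to `executed`.
    """
    value = None
    for name, version, fn in steps:
        key = cache_key(name, value, version)
        if key in cache:
            value = cache[key]  # cache hit: reuse the stored output
        else:
            value = fn(value)  # cache miss: execute and store
            cache[key] = value
            executed.append(name)
    return value


cache = {}
steps_v1 = [
    ("load", "v1", lambda _: [1, 2, 3]),
    ("scale", "v1", lambda xs: [x * 2 for x in xs]),
    ("total", "v1", lambda xs: sum(xs)),
]
first_run = []
result_v1 = run_pipeline(steps_v1, cache, first_run)

# Modify only the middle step; the first step is unchanged.
steps_v2 = [
    steps_v1[0],
    ("scale", "v2", lambda xs: [x * 3 for x in xs]),
    steps_v1[2],
]
second_run = []
result_v2 = run_pipeline(steps_v2, cache, second_run)

print(first_run)   # steps executed on the first run
print(second_run)  # steps executed after modifying "scale"
```

In this sketch the second run skips `load`, reruns the modified `scale` step, and also reruns `total` because its input changed.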

3
Q

What are the best practices for storing structured and unstructured data in ML?

A

Structured data should be stored in BigQuery, while unstructured data should be stored in Cloud Storage.

4
Q

What are recommended services to preprocess tabular and unstructured data?

A

Tabular data should be preprocessed in BigQuery if it can be done with SQL transformations. For unstructured data, and for transformations that are not available in SQL, you should use Dataflow. If the client is familiar with Spark, you can use Dataproc for the transformations.

5
Q

Can Vertex AI work without managed datasets?

A

Yes, they are not mandatory.

6
Q

What is the recommended next step after data preprocessing?

A

First do feature engineering, and then create a managed dataset in Vertex AI.

7
Q

What should you use for data preprocessing if you are using the TensorFlow framework?

A

TFX (TensorFlow Extended)

8
Q

Depending on the size of the dataset, when should you train a model within a JupyterLab notebook and when should you use Vertex AI training?

A

For smaller datasets, train within the JupyterLab notebook. For larger datasets, use Vertex AI training.

9
Q

What is one of the tools offered by GCP that supports you in developing ML systems in a responsible way?

A

The What-If Tool, a free, open-source tool that can be used from TensorBoard (the managed Vertex AI TensorBoard service is paid). It helps you diagnose fairness issues in your data, in your labels, etc.

10
Q

How can you make a classification model more inclusive?

A

When you are satisfied with the overall model performance, split the data into subgroups (e.g. by race, gender, etc.) and validate the performance for each subgroup. You can use a confusion matrix to calculate precision, recall, false positive rate, and false negative rate. Which rate to optimize, and to what extent, depends heavily on the problem you are trying to solve, and there is no easy answer.
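
The per-subgroup check can be sketched in plain Python. This is a minimal illustration, not a prescribed method: the function name `rates_by_group`, the group labels `"A"` and `"B"`, and the records are all made up for the example.

```python
from collections import defaultdict


def rates_by_group(records):
    """Compute confusion-matrix rates per subgroup.

    `records` is an iterable of (group, y_true, y_pred) with 0/1 labels.
    A rate is None when its denominator is zero.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})
    for group, y_true, y_pred in records:
        if y_true == 1:
            cell = "tp" if y_pred == 1 else "fn"
        else:
            cell = "fp" if y_pred == 1 else "tn"
        counts[group][cell] += 1

    def safe_div(num, den):
        return num / den if den else None

    result = {}
    for group, c in counts.items():
        result[group] = {
            "precision": safe_div(c["tp"], c["tp"] + c["fp"]),
            "recall": safe_div(c["tp"], c["tp"] + c["fn"]),
            "false_positive_rate": safe_div(c["fp"], c["fp"] + c["tn"]),
            "false_negative_rate": safe_div(c["fn"], c["fn"] + c["tp"]),
        }
    return result


# Hypothetical evaluation records: (group, true label, predicted label).
records = [
    ("A", 1, 1), ("A", 1, 1), ("A", 1, 0), ("A", 0, 0), ("A", 0, 1), ("A", 0, 0),
    ("B", 1, 1), ("B", 1, 0), ("B", 1, 0), ("B", 0, 0), ("B", 0, 0), ("B", 0, 0),
]
rates = rates_by_group(records)
for group, metrics in sorted(rates.items()):
    print(group, metrics)
```

In this made-up data, group B has perfect precision but a much higher false negative rate than group A, which is the kind of gap an aggregate metric would hide.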

11
Q

What are equality of opportunity and the equal opportunity threshold in responsible AI?

A

Equality of opportunity is an approach to consider when designing an ML system: people who qualify for an outcome should have an equal chance of being correctly classified by the system, regardless of the group they belong to. The threshold is a cut-off (e.g. on financial income) above which people are classified differently. In practice, this often means the threshold needs a different value for each subgroup to keep the results the same across groups.
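
One way to picture per-subgroup thresholds is the sketch below, which picks, for each group, the highest score threshold that still reaches a target true positive rate. It is an illustration under assumptions, not a reference implementation: the function names, the groups `"A"` and `"B"`, the scores, and the target of 0.75 are all hypothetical.

```python
def true_positive_rate(scores, labels, threshold):
    """Share of actual positives whose score is at or above the threshold."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    if not positives:
        return None
    return sum(s >= threshold for s in positives) / len(positives)


def threshold_for_tpr(scores, labels, target_tpr):
    """Return the highest observed score that reaches the target TPR, or None."""
    for candidate in sorted(set(scores), reverse=True):
        tpr = true_positive_rate(scores, labels, candidate)
        if tpr is not None and tpr >= target_tpr:
            return candidate
    return None


# Hypothetical model scores and true labels per subgroup.
data = {
    "A": ([0.9, 0.8, 0.6, 0.5, 0.4, 0.2], [1, 1, 1, 1, 0, 0]),
    "B": ([0.7, 0.5, 0.4, 0.3, 0.35, 0.1], [1, 1, 1, 1, 0, 0]),
}

target = 0.75
thresholds = {
    group: threshold_for_tpr(scores, labels, target)
    for group, (scores, labels) in data.items()
}

# Compare against one shared threshold applied to both groups.
shared = 0.6
shared_tpr = {
    group: true_positive_rate(scores, labels, shared)
    for group, (scores, labels) in data.items()
}

print(thresholds)
print(shared_tpr)
```

With these made-up numbers, a single shared threshold gives the two groups very different true positive rates, while the per-group thresholds bring both to the same target.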

12
Q

What are Facets?

A

Facets is an open-source product by Google that helps you discover new insights from data, e.g. detecting data skew, non-uniform feature distributions, differences between the training and test sets, etc.
