ML Training 2 Flashcards
What is Precision?
Of all the test examples that were assigned a label, the fraction that were actually supposed to have that label.
TP/(TP+FP)
What is Recall?
Of all the test examples that should have been assigned the label, the fraction that actually were assigned it.
TP/(TP+FN)
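The two formulas above can be checked on a tiny labeled set. This is a minimal sketch; the function name `precision_recall` and the cat/dog data are made up for illustration.

```python
def precision_recall(y_true, y_pred, label):
    """Return (precision, recall) for one label from two parallel lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == label and t == label)  # true positives
    fp = sum(1 for t, p in pairs if p == label and t != label)  # false positives
    fn = sum(1 for t, p in pairs if p != label and t == label)  # false negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical test set: 4 true cats, 2 true dogs.
y_true = ["cat", "cat", "cat", "cat", "dog", "dog"]
y_pred = ["cat", "cat", "dog", "dog", "cat", "dog"]

# For "cat": TP=2, FP=1, FN=2, so precision is 2/3 and recall is 1/2.
precision, recall = precision_recall(y_true, y_pred, "cat")
```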
What are other ways to evaluate an AutoML model?
Precision, recall, the confusion matrix (correct predictions lie on the diagonal), and the precision-recall curve, which helps you choose the score threshold (a threshold can be set for each label individually).
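To see why the score threshold matters, the sketch below recomputes precision and recall for one label at a few thresholds. The scores, labels, and threshold values are invented, and `pr_at_threshold` is a hypothetical helper, not part of any AutoML API.

```python
def pr_at_threshold(scores, labels, threshold):
    """Precision and recall when every score >= threshold counts as positive."""
    preds = [s >= threshold for s in scores]
    tp = sum(1 for p, l in zip(preds, labels) if p and l)
    fp = sum(1 for p, l in zip(preds, labels) if p and not l)
    fn = sum(1 for p, l in zip(preds, labels) if not p and l)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical model scores for one label, with the true answers.
scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.20]
labels = [True, True, False, True, False, False]

# A few points of the precision-recall curve, keyed by threshold.
curve = {t: pr_at_threshold(scores, labels, t) for t in (0.3, 0.5, 0.75)}
```

Raising the threshold here trades recall for precision, which is the trade-off the precision-recall curve makes visible.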
What is the difference between Colab Enterprise and Vertex AI Workbench?
Colab Enterprise: A collaborative, managed notebook environment with the security and compliance capabilities of Google Cloud. Choose this if your project’s priorities are to collaborate with others and to avoid spending time managing infrastructure.
Vertex AI Workbench: A Jupyter notebook-based environment provided through virtual machine (VM) instances with features that support the entire data science workflow. Choose this if your project’s priorities are control and customizability.
What platforms or features does Vertex AI Workbench support?
Importing conda environments, accessing data from Cloud Storage or BigQuery, automated notebook runs and idle shutdown, custom containers, third-party credentials, instance monitoring, and full control over the infrastructure (VM instance).
How do you overcome imbalanced datasets?
Downsample the majority class, then upweight the downsampled examples by the downsampling factor to reduce prediction bias. Treat the rebalancing ratio like a hyperparameter and experiment with it. The batch size should be several times greater than the imbalance ratio (at least 5 times).
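A minimal sketch of downsampling and upweighting, assuming examples are plain dictionaries with a `label` key. The helper name, the `weight` field, and the factor of 20 are illustrative choices, not a prescribed recipe.

```python
import random


def downsample_and_upweight(examples, majority_label, factor, seed=0):
    """Keep 1/factor of the majority class and weight each kept example by factor."""
    rng = random.Random(seed)
    majority = [e for e in examples if e["label"] == majority_label]
    minority = [e for e in examples if e["label"] != majority_label]
    kept = rng.sample(majority, len(majority) // factor)
    # Each kept majority example stands in for `factor` originals.
    weighted = [dict(e, weight=float(factor)) for e in kept]
    weighted += [dict(e, weight=1.0) for e in minority]
    rng.shuffle(weighted)
    return weighted


# Hypothetical dataset with a 100:1 imbalance.
examples = [{"label": 0} for _ in range(1000)] + [{"label": 1} for _ in range(10)]
balanced = downsample_and_upweight(examples, majority_label=0, factor=20)

majority_weight = sum(e["weight"] for e in balanced if e["label"] == 0)
minority_count = sum(1 for e in balanced if e["label"] == 1)
```

After this step the data holds 50 majority and 10 minority examples (5:1), while the total majority weight still equals the original 1000 examples, so a loss that uses the weights keeps the original class proportions.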
What is prediction bias?
A value indicating how far apart the average of predictions is from the average of labels in the dataset.
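As a small worked example, prediction bias can be computed as the mean prediction minus the mean label. The sign convention and the sample values here are assumptions made for illustration.

```python
def prediction_bias(predictions, labels):
    """Average prediction minus average label (sign convention is an assumption)."""
    return sum(predictions) / len(predictions) - sum(labels) / len(labels)


# Hypothetical predicted probabilities and ground-truth labels.
predictions = [0.9, 0.2, 0.4, 0.1]  # mean 0.40
labels = [1, 0, 0, 0]               # mean 0.25

# Positive here, meaning the model predicts the positive class too often on average.
bias = prediction_bias(predictions, labels)
```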
What is selection bias?
Errors in conclusions drawn from sampled data due to a selection process that generates systematic differences between samples observed in the data and those not observed.
Includes coverage bias, sampling bias, and non-response (participation) bias.
What is coverage bias?
The population represented in the dataset doesn’t match the population that the machine learning model is making predictions about.
What is sampling bias?
Data is not collected randomly from the target group.
What is non-response/participation bias?
Users from certain groups opt-out of surveys at different rates than users from other groups.
What is a collaborative filtering model?
Collaborative filtering is a recommendation technique that filters and predicts items a user might like based on the reactions and preferences of similar users.
The fundamental premise is that people who agreed in their evaluation of certain items are likely to agree again in the future.
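The premise above can be illustrated with a toy user-based predictor: a missing rating is estimated from other users' ratings, weighted by how similar those users are. The ratings, user names, and the choice of cosine similarity are assumptions for this sketch; production recommenders use more refined methods.

```python
import math

# Hypothetical explicit ratings: user -> {item: rating}.
ratings = {
    "ana": {"A": 5, "B": 4, "C": 1},
    "ben": {"A": 5, "B": 5, "C": 1, "D": 5},
    "cleo": {"A": 1, "B": 1, "C": 5, "D": 1},
}


def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    shared = sorted(set(u) & set(v))
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        sim = cosine(ratings[user], theirs)
        num += sim * theirs[item]
        den += abs(sim)
    return num / den if den else None


# ana has not rated D; her tastes match ben's more than cleo's,
# so the estimate is pulled toward ben's rating.
predicted = predict("ana", "D")
```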
What are the three main approaches to building recommendation systems on Google Cloud?
The three approaches are Matrix Factorization in BigQuery Machine Learning (BQML), Recommendations AI, and Two-Tower built-in algorithm.
What is required to train a matrix factorization model on BigQuery?
A table with three input columns: user(s), item(s), and a feedback variable (implicit or explicit, such as ratings).
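To show how a three-column table of this shape feeds a matrix factorization model, here is a toy factorization trained with plain stochastic gradient descent. It only illustrates the idea: the rows, hyperparameters, and function names are invented, and BigQuery ML's own training procedure is not shown here.

```python
import random

# Hypothetical input table: (user, item, rating) rows.
rows = [
    ("u1", "i1", 5.0), ("u1", "i2", 4.0),
    ("u2", "i1", 4.0), ("u2", "i3", 1.0),
    ("u3", "i2", 5.0), ("u3", "i3", 2.0),
]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def rmse(rows, U, V):
    """Root mean squared error of predicted ratings on the given rows."""
    errs = [(r - dot(U[u], V[i])) ** 2 for u, i, r in rows]
    return (sum(errs) / len(errs)) ** 0.5


def factorize(rows, k=2, steps=500, lr=0.02, reg=0.01, seed=0):
    """Learn k latent factors per user and per item with SGD."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in rows})
    items = sorted({i for _, i, _ in rows})
    U = {u: [rng.uniform(0.1, 1.0) for _ in range(k)] for u in users}
    V = {i: [rng.uniform(0.1, 1.0) for _ in range(k)] for i in items}
    for _ in range(steps):
        for u, i, r in rows:
            err = r - dot(U[u], V[i])
            for f in range(k):
                pu, qi = U[u][f], V[i][f]
                U[u][f] += lr * (err * qi - reg * pu)
                V[i][f] += lr * (err * pu - reg * qi)
    return U, V


U0, V0 = factorize(rows, steps=0)  # untrained factors (same seed)
U, V = factorize(rows)             # trained factors
initial_error, final_error = rmse(rows, U0, V0), rmse(rows, U, V)

# An unobserved pair can now be scored from the learned factors.
score_u1_i3 = dot(U["u1"], V["i3"])
```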
What are the main benefits of Matrix Factorization on BigQuery?
The benefits include minimal ML expertise required (it uses SQL), simple data input requirements, and the ability to discover new user interests through collaborative filtering.