14. BigQuery ML Flashcards
Who is the target audience of BigQuery ML?
Data analysts and other users who are familiar with SQL and prefer to build models in SQL rather than with other tools.
What are the three ways to access data in BigQuery?
Write a SQL query in the web console
Use the %%bigquery magic command in a Jupyter Notebook
Use the BigQuery Python client library to run the same query in a Jupyter Notebook
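The third option can be sketched as follows. This is a minimal sketch, assuming the google-cloud-bigquery library is installed and default credentials are configured; the helper name run_query and the sample query against a public dataset are illustrative choices, not part of the flashcards.

```python
# Illustrative query against a BigQuery public dataset (chosen for this sketch).
QUERY = """
SELECT name, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_2013`
GROUP BY name
ORDER BY total DESC
LIMIT 5
"""


def run_query(sql, client=None):
    """Run `sql` in BigQuery and return the result as a pandas DataFrame.

    `run_query` is a hypothetical helper. If no client is passed, one is
    created from the environment's default credentials.
    """
    if client is None:
        # Imported lazily so this module loads without the library installed.
        from google.cloud import bigquery

        client = bigquery.Client()
    return client.query(sql).to_dataframe()
```

In a notebook, the same SQL could instead be placed in a cell that starts with the %%bigquery magic.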
Is BigQuery ML serverless?
Yes. BigQuery ML is fully serverless for both training and prediction.
What are the keywords for creating models in BigQuery ML?
CREATE MODEL, CREATE MODEL IF NOT EXISTS, CREATE OR REPLACE MODEL
What are two options set in the OPTIONS clause after the CREATE MODEL keyword?
model_type, input_label_cols
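How the keywords and options fit together in one statement can be sketched with a small string builder. This is a hedged illustration: create_model_sql, the dataset, table, and column names are all hypothetical, and the quoting is deliberately naive (it does no escaping).

```python
def create_model_sql(model, select_sql, options, mode="CREATE OR REPLACE MODEL"):
    """Build a BigQuery ML CREATE MODEL statement (hypothetical helper)."""
    allowed = {
        "CREATE MODEL",
        "CREATE MODEL IF NOT EXISTS",
        "CREATE OR REPLACE MODEL",
    }
    if mode not in allowed:
        raise ValueError(f"unsupported keyword: {mode}")

    def render(value):
        # Booleans become TRUE/FALSE, lists become SQL arrays, strings are quoted.
        if isinstance(value, bool):
            return "TRUE" if value else "FALSE"
        if isinstance(value, (list, tuple)):
            return "[" + ", ".join(render(v) for v in value) + "]"
        if isinstance(value, str):
            return f"'{value}'"
        return str(value)

    opts = ", ".join(f"{key}={render(val)}" for key, val in options.items())
    return f"{mode} `{model}`\nOPTIONS({opts}) AS\n{select_sql}"


# Hypothetical dataset, table, and label column names.
sql = create_model_sql(
    "mydataset.model1",
    "SELECT * FROM `mydataset.training_data`",
    {"model_type": "LINEAR_REG", "input_label_cols": ["label"]},
)
print(sql)
```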
What model categories does BigQuery ML support?
Regression: LINEAR_REG, BOOSTED_TREE_REGRESSOR, DNN_REGRESSOR, AUTOML_REGRESSOR
Classification: LOGISTIC_REG, BOOSTED_TREE_CLASSIFIER, DNN_CLASSIFIER, DNN_LINEAR_COMBINED_CLASSIFIER, AUTOML_CLASSIFIER
Deep and wide neural network (recommendation systems and personalization): DNN_LINEAR_COMBINED_REGRESSOR, DNN_LINEAR_COMBINED_CLASSIFIER
Clustering: KMEANS
Collaborative filtering: MATRIX_FACTORIZATION
Dimensionality reduction: PCA, AUTOENCODER
Time-series forecasting: ARIMA_PLUS
General: TENSORFLOW (imported TensorFlow models)
Hints: Curious Cat Discovers Really Cool Treasures Near Trees
What is the keyword for model evaluation in BigQuery ML?
ML.EVALUATE
What are the two levels of BigQuery ML explainability?
Model level and individual prediction level
What is the keyword for prediction in BigQuery ML?
ML.PREDICT
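ML.EVALUATE and ML.PREDICT are both called as table functions in a FROM clause. A minimal sketch of the query shapes, assuming hypothetical helper names (evaluate_sql, predict_sql) and hypothetical model and table names:

```python
def evaluate_sql(model, eval_table=None):
    """Build an ML.EVALUATE query (hypothetical helper).

    With no input data, BigQuery ML is expected to report the metrics
    computed on the evaluation split during training.
    """
    if eval_table is None:
        return f"SELECT * FROM ML.EVALUATE(MODEL `{model}`)"
    return (
        f"SELECT * FROM ML.EVALUATE(MODEL `{model}`, "
        f"(SELECT * FROM `{eval_table}`))"
    )


def predict_sql(model, input_table):
    """Build an ML.PREDICT query over the rows of `input_table` (hypothetical helper)."""
    return (
        f"SELECT * FROM ML.PREDICT(MODEL `{model}`, "
        f"(SELECT * FROM `{input_table}`))"
    )


print(evaluate_sql("mydataset.model1"))
print(predict_sql("mydataset.model1", "mydataset.new_rows"))
```

For supervised models, the prediction output adds a column named predicted_<label column> alongside the input columns.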
What is the statement for querying global explanations?
SELECT * FROM ML.GLOBAL_EXPLAIN(MODEL `model1`)
What option enables global explanations during model training?
enable_global_explain=TRUE
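The two explainability levels can be sketched as three SQL strings: training with global explanations enabled, querying model-level explanations, and querying per-prediction explanations. The model, table, and column names are hypothetical. ML.EXPLAIN_PREDICT is not named in these flashcards; it is the BigQuery ML function for prediction-level explanations.

```python
MODEL = "mydataset.model1"  # hypothetical model name

# Train with global explanations enabled.
TRAIN_SQL = f"""
CREATE OR REPLACE MODEL `{MODEL}`
OPTIONS(
  model_type='LINEAR_REG',
  input_label_cols=['label'],
  enable_global_explain=TRUE
) AS
SELECT * FROM `mydataset.training_data`
"""

# Model level: overall feature attributions.
GLOBAL_EXPLAIN_SQL = f"SELECT * FROM ML.GLOBAL_EXPLAIN(MODEL `{MODEL}`)"

# Individual prediction level: top 3 feature attributions per row.
EXPLAIN_PREDICT_SQL = f"""
SELECT * FROM ML.EXPLAIN_PREDICT(
  MODEL `{MODEL}`,
  (SELECT * FROM `mydataset.new_rows`),
  STRUCT(3 AS top_k_features)
)
"""

for statement in (TRAIN_SQL, GLOBAL_EXPLAIN_SQL, EXPLAIN_PREDICT_SQL):
    print(statement.strip())
```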
What are the explainability methods for different model types?
Linear and logistic regression: Shapley values, standard errors, and p-values
Boosted Trees: Tree SHAP, Gini-based feature importance
Deep Neural Network and Wide-and-Deep: Integrated gradients
ARIMA_PLUS: Time-series decomposition
Hints: Railroad Tales Nurturing Towns
Compare BigQuery ML and Vertex AI Tables
BigQuery is a serverless data warehouse
Users are SQL experts.
Use BigQuery scheduled queries for automation
Use Looker for visualization
Vertex AI is for data scientists
Use Jupyter Notebooks and Pandas DataFrames.
Need fine-grained control over the workflow.
What are the six integration points for Vertex AI and BigQuery ML?
Access BigQuery public datasets from Vertex AI.
Import BigQuery data into Vertex AI.
Access BigQuery data from Vertex AI Workbench notebooks (browse your BigQuery datasets directly).
Export Vertex AI batch prediction results to BigQuery for further analysis.
Export BigQuery ML models into Vertex AI (via Cloud Storage or the Vertex AI Model Registry).
Hints: Mysterious Island Whispers: Pirates Never Mined Patiently
What is a hashed feature?
It addresses three problems:
Incomplete vocabulary (values that do not fall into the known categories)
High cardinality (e.g., zip code)
Cold start problem: new categories (e.g., a new staff ID)
Transform the high-cardinality variable into a low-cardinality domain by hashing, e.g., with FarmHash: ABS(MOD(FARM_FINGERPRINT(zipcode), numbuckets))
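The bucketing idea can be tried outside BigQuery with a short sketch. FarmHash is not in the Python standard library, so this example swaps in SHA-256 as a stand-in; the bucket numbers will not match FARM_FINGERPRINT, and hash_bucket and the sample zip codes are hypothetical.

```python
import hashlib


def hash_bucket(value, num_buckets):
    """Map a categorical value to a bucket index in [0, num_buckets).

    Stand-in for ABS(MOD(FARM_FINGERPRINT(value), num_buckets)) using
    SHA-256 instead of FarmHash (an assumption made for this sketch).
    """
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    # Take the first 8 bytes as an unsigned 64-bit fingerprint.
    fingerprint = int.from_bytes(digest[:8], "big")
    # The SQL needs ABS because FARM_FINGERPRINT is signed and MOD keeps the
    # sign; the fingerprint here is unsigned, so % alone is enough.
    return fingerprint % num_buckets


zipcodes = ["94105", "10001", "60601", "94105"]
buckets = [hash_bucket(z, 10) for z in zipcodes]
print(buckets)

# A value never seen in training still lands in a valid bucket.
print(hash_bucket("99999", 10))
```

The trade-off is collisions: unrelated values can share a bucket, so the bucket count balances vocabulary size against lost detail.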