Module 2 Flashcards
What are the four major challenges that data engineers and data scientists face (known as the 4 Vs)?
Variety, volume, velocity, and veracity.
True or False: “You can create and execute machine learning models on your structured datasets in BigQuery”
True
What are the two services that BigQuery provides in one?
A fully managed storage facility to load and store datasets, and also a fast SQL-based analytical engine.
…. is a large store, containing terabytes and petabytes of data gathered from a wide range of sources within an organization, that’s used to guide management decisions.
A data warehouse (BigQuery is Google Cloud’s fully managed data warehouse)
Can we perform hyperparameter tuning in BigQuery?
Yes. With BigQuery ML, you can either set the hyperparameters manually or let BigQuery start from default hyperparameter settings and then tune them automatically.
What is One-hot encoding?
One-hot encoding is a method of converting categorical data to numeric data to prepare it for model training.
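To make the idea concrete, here is a minimal sketch in plain Python. The helper name `one_hot_encode` and the color values are hypothetical, chosen only for illustration; this is not how BigQuery ML implements the encoding internally.

```python
def one_hot_encode(values):
    """Turn each categorical value into a 0/1 vector containing a single 1."""
    # Sort the distinct categories so the column order is reproducible.
    categories = sorted(set(values))
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1  # mark the column that matches this value
        encoded.append(row)
    return categories, encoded


# Hypothetical categorical feature.
categories, encoded = one_hot_encode(["red", "green", "red", "blue"])
print(categories)
for row in encoded:
    print(row)
```

Each row has exactly one 1, in the column of its category, so a model can treat the feature as numeric without implying any order among the categories.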
True or False: “BigQuery ML doesn’t automatically perform one-hot encoding of categorical values”
False
What is phase 1 of a machine learning project?
In phase 1, you extract, transform, and load data into BigQuery, if it isn’t there already.
What does the ML.PREDICT command do in BigQuery?
ML.PREDICT runs a trained model against the dataset you pass in and returns the model's predictions for it.
While the model is running, and even after it’s complete, you can view training progress with ….
ML.TRAINING_INFO
What are the four options Google Cloud offers for building machine learning models?
BigQuery ML, AutoML, pre-built APIs, and custom training
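What is the difference between a loss function and a cost function?
A loss function measures the error on a single training example, while a cost function measures the error over the entire training set.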
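Under the common convention that a loss scores one example and a cost aggregates over the whole training set, the distinction can be sketched as follows. Squared error and the sample numbers are assumptions made for illustration; the flashcards do not name a specific function.

```python
def squared_error_loss(y_true, y_pred):
    """Loss: the error for a single training example."""
    return (y_true - y_pred) ** 2


def mean_squared_error_cost(targets, predictions):
    """Cost: the average of the per-example losses over the whole set."""
    losses = [squared_error_loss(t, p) for t, p in zip(targets, predictions)]
    return sum(losses) / len(losses)


# Hypothetical targets and model predictions.
targets = [3.0, 5.0, 2.0]
predictions = [2.0, 5.0, 4.0]

losses = [squared_error_loss(t, p) for t, p in zip(targets, predictions)]
cost = mean_squared_error_cost(targets, predictions)
print(losses)  # one value per example
print(cost)    # one value for the whole set
```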
What are the three main stages of the ML workflow?
Data preparation, model development, and model serving
…. is a set of tools and frameworks to help understand and interpret predictions made by machine learning models.
Explainable AI
A farm uses Google's machine learning technology to detect defective apples in its crop, such as those with irregular sizes or scratches. The goal is to flag only the apples that are actually bad so that no good apples are wasted. Which metric should the model focus on?
Precision
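A small sketch of why precision fits this goal, using made-up labels (1 = defective, 0 = good). The helper `precision_and_recall` and the data are hypothetical. Precision is the share of flagged apples that are truly defective; recall is the share of defective apples that were flagged.

```python
def precision_and_recall(actual, predicted):
    """Compute precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == 1 and p == 1)
    false_pos = sum(1 for a, p in pairs if a == 0 and p == 1)
    false_neg = sum(1 for a, p in pairs if a == 1 and p == 0)
    # Guard against division by zero when nothing is flagged or nothing is positive.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical crop: three defective apples, five good ones.
actual = [1, 1, 1, 0, 0, 0, 0, 0]
# The model flags two apples, both truly defective, and misses one.
predicted = [1, 1, 0, 0, 0, 0, 0, 0]

precision, recall = precision_and_recall(actual, predicted)
print(precision, recall)
```

Here no good apple is flagged, so precision is perfect even though one defective apple slips through and recall is lower. That is the trade-off the farm accepts when it optimizes for precision.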