Big Data with BigQuery Flashcards
What is BigQuery?
- It is a fully managed data warehouse
- It provides two services in one: storage plus analytics (built-in features such as machine learning, geospatial analysis, and business intelligence)
How does BigQuery integrate into the AI lifecycle?
- It has built-in machine learning features (you can create ML models directly in BigQuery using SQL)
- You can export datasets from BigQuery directly into Vertex AI or other services for seamless integration across the data-to-AI lifecycle
What kind of data can BigQuery consume?
The input data can be either real-time or batch data.
Describe the data ingestion process into BigQuery.
If it is streaming data, which can be structured or unstructured, high speed, and large volume, Pub/Sub is needed to ingest the data. If it is batch data, it can be uploaded directly to Cloud Storage. After that, both pipelines lead to Dataflow.
What is an advantage of using Dataflow to stream data into BigQuery?
Inconsistency might result from saving and processing data separately.
To avoid that risk, consider using Dataflow to build a streaming data pipeline into BigQuery.
What types of data sources can BigQuery ingest data from?
- internal/native and external data sources
- Multi-cloud data, which is data stored in multiple cloud services, such as AWS or Azure
- Public datasets (any of the datasets available in the public dataset marketplace)
What are the basic patterns for loading data into BigQuery?
- a batch load (source data is loaded into a BigQuery table in a single batch operation)
- streaming (smaller batches of data are streamed continuously so that the data is available for querying in near-real time)
- generated data (SQL statements are used to insert rows into an existing table or to write the results of a query to a table)
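The generated-data pattern can be sketched with two statements, one that inserts rows into an existing table and one that writes a query result to a table. This is a minimal illustration only: the table names `my_dataset.events` and `my_dataset.event_counts` and the columns `event_id` and `event_type` are hypothetical, and the SQL is held in Python strings so it can be inspected without a BigQuery project.

```python
# Hypothetical table and column names, used only for illustration.

def insert_rows_sql(table: str) -> str:
    """Build SQL that inserts rows into an existing table."""
    return (
        f"INSERT INTO `{table}` (event_id, event_type)\n"
        "VALUES (1, 'click'), (2, 'view')"
    )


def query_to_table_sql(source: str, destination: str) -> str:
    """Build SQL that writes the result of a query to a table."""
    return (
        f"CREATE OR REPLACE TABLE `{destination}` AS\n"
        "SELECT event_type, COUNT(*) AS n\n"
        f"FROM `{source}`\n"
        "GROUP BY event_type"
    )


print(insert_rows_sql("my_dataset.events"))
print(query_to_table_sql("my_dataset.events", "my_dataset.event_counts"))
```

Either statement could then be run in the BigQuery console or submitted through a client library.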
What analytics features are available in BigQuery?
- Ad hoc analysis (using Standard SQL, the BigQuery SQL dialect)
- Geospatial analytics (using geography data types and Standard SQL geography functions)
- Building machine learning models (using BigQuery ML)
- Building interactive BI dashboards (using BigQuery BI Engine)
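As a small illustration of the geospatial feature listed above, the sketch below builds a query that measures the distance between two points. `ST_GEOGPOINT` takes longitude first and latitude second, and `ST_DISTANCE` returns a distance in meters. The helper name `distance_sql` and the sample coordinates are assumptions made for this example.

```python
def distance_sql(lon1: float, lat1: float, lon2: float, lat2: float) -> str:
    """Build a query returning the distance in meters between two points."""
    return (
        "SELECT ST_DISTANCE(\n"
        f"  ST_GEOGPOINT({lon1}, {lat1}),\n"
        f"  ST_GEOGPOINT({lon2}, {lat2})\n"
        ") AS distance_meters"
    )


# Arbitrary sample coordinates, for illustration only.
sql = distance_sql(-122.42, 37.77, -118.24, 34.05)
print(sql)
```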
What steps are necessary to run BigQuery ML?
1) Create a model with a SQL statement.
2) Write a SQL prediction query and invoke ML.PREDICT.
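The two steps above can be sketched as a pair of SQL statements. This is a minimal example under stated assumptions: the model `my_dataset.my_model`, the tables `my_dataset.training_data` and `my_dataset.new_data`, the label column `label`, and the choice of logistic regression are all hypothetical. The SQL is held in Python strings so it can be checked without a BigQuery project.

```python
def create_model_sql(model: str, training_table: str, label: str) -> str:
    """Step 1: build a CREATE MODEL statement."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'logistic_reg', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{training_table}`"
    )


def predict_sql(model: str, input_table: str) -> str:
    """Step 2: build a prediction query that invokes ML.PREDICT."""
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`,\n"
        f"  (SELECT * FROM `{input_table}`))"
    )


# All names below are placeholders.
print(create_model_sql("my_dataset.my_model", "my_dataset.training_data", "label"))
print(predict_sql("my_dataset.my_model", "my_dataset.new_data"))
```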
How can the machine learning hyperparameters be defined?
You can either control the hyperparameters manually or hand them to BigQuery, which starts with a default hyperparameter setting and then tunes them automatically.
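The two approaches can be contrasted in the `OPTIONS` clause of `CREATE MODEL`. In this sketch the manual variant fixes `l1_reg` to a single value, while the automatic variant sets `num_trials` and gives `l1_reg` a search range with `HPARAM_RANGE`. The model type, the label column `label`, the table names, and the specific values are assumptions chosen for illustration.

```python
def manual_options(l1_reg: float) -> str:
    """Options with a hyperparameter fixed by hand."""
    return (
        "model_type = 'logistic_reg', input_label_cols = ['label'], "
        f"l1_reg = {l1_reg}"
    )


def tuned_options(num_trials: int, low: float, high: float) -> str:
    """Options that ask BigQuery ML to search a range over several trials."""
    return (
        "model_type = 'logistic_reg', input_label_cols = ['label'], "
        f"num_trials = {num_trials}, l1_reg = HPARAM_RANGE({low}, {high})"
    )


def create_model_sql(model: str, options: str, training_table: str) -> str:
    """Wrap an options string in a CREATE MODEL statement."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS ({options}) AS\n"
        f"SELECT * FROM `{training_table}`"
    )


# Placeholder names and values.
print(create_model_sql("my_dataset.manual_model", manual_options(0.1), "my_dataset.training_data"))
print(create_model_sql("my_dataset.tuned_model", tuned_options(10, 0, 5), "my_dataset.training_data"))
```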
What models are available?
What options are available for machine learning operations?
- Importing TensorFlow models for batch prediction
- Exporting models from BigQuery ML for online prediction
- Hyperparameter tuning using Vertex AI Vizier
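The import and export options can be sketched as SQL statements. This is a hedged illustration, not a complete workflow: the model names and the `gs://my-bucket/...` paths are hypothetical, and the statements are held in Python strings so they can be inspected without a BigQuery project.

```python
def import_tensorflow_model_sql(model: str, gcs_path: str) -> str:
    """Build SQL that imports a TensorFlow SavedModel from Cloud Storage."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'tensorflow', model_path = '{gcs_path}')"
    )


def export_model_sql(model: str, gcs_uri: str) -> str:
    """Build SQL that exports a BigQuery ML model to Cloud Storage."""
    return (
        f"EXPORT MODEL `{model}`\n"
        f"OPTIONS (uri = '{gcs_uri}')"
    )


# Placeholder model names and bucket paths.
print(import_tensorflow_model_sql("my_dataset.imported_model", "gs://my-bucket/saved_model/*"))
print(export_model_sql("my_dataset.my_model", "gs://my-bucket/exported_model/"))
```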