Big Data with BigQuery Flashcards

1
Q

What is BigQuery?

A
  • It is a fully managed data warehouse
  • It provides two services in one: storage plus analytics (built-in features such as machine learning, geospatial analysis, and business intelligence)
2
Q

How does BigQuery integrate into the AI lifecycle?

A
  • It has built-in machine learning features (you can create ML models directly in BigQuery using SQL)
  • You can export datasets from BigQuery directly into Vertex AI or other services for seamless integration across the data-to-AI lifecycle
3
Q

What kinds of data can BigQuery consume?

A

The input data can be either real-time (streaming) or batch data.

4
Q

Describe the data ingestion process into BigQuery.

A

If it is streaming data, which can be structured or unstructured, high speed, and large volume, Pub/Sub is needed to ingest it. If it is batch data, it can be uploaded directly to Cloud Storage. Both pipelines then lead to Dataflow, which processes the data on its way into BigQuery.
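The two ingestion routes can be sketched as a small lookup. This is only an illustration of the flow described in this card; the function name `ingestion_path` and the labels "streaming" and "batch" are assumptions made for the sketch, not part of any Google Cloud API.

```python
def ingestion_path(kind: str) -> list[str]:
    """Return the stages data passes through on its way into BigQuery."""
    # Entry point depends on whether the data arrives as a stream or a batch.
    entry_points = {
        "streaming": "Pub/Sub",
        "batch": "Cloud Storage",
    }
    if kind not in entry_points:
        raise ValueError(f"unknown data kind: {kind!r}")
    # Both routes converge on Dataflow before reaching BigQuery.
    return [entry_points[kind], "Dataflow", "BigQuery"]


print(" -> ".join(ingestion_path("streaming")))
print(" -> ".join(ingestion_path("batch")))
```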

5
Q

What is an advantage of using Dataflow to stream data into BigQuery?

A

Inconsistency might result from saving and processing data separately.
To avoid that risk, consider using Dataflow to build a streaming data pipeline into BigQuery.

6
Q

What types of data sources can BigQuery ingest data from?

A
  • Internal (native) and external data sources
  • Multi-cloud data, which is data stored in multiple cloud services, such as AWS or Azure
  • Public datasets (any of the datasets available in the public dataset marketplace)
7
Q

What are the basic patterns for loading data into BigQuery?

A
  • Batch load (source data is loaded into a BigQuery table in a single batch operation)
  • Streaming (smaller batches of data are streamed continuously so that the data is available for querying in near-real time)
  • Generated data (SQL statements are used to insert rows into an existing table or to write the results of a query to a table)
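The generated-data pattern can be sketched as two SQL statements, built here as Python strings. This is a minimal sketch: the table names `mydataset.orders` and `mydataset.daily_totals` and the columns `order_date` and `amount` are hypothetical, and the statements are only printed, not sent to BigQuery.

```python
SOURCE = "mydataset.orders"        # hypothetical source table
TARGET = "mydataset.daily_totals"  # hypothetical destination table


def insert_from_query(target: str, source: str) -> str:
    """Insert query results as new rows into an existing table."""
    return (
        f"INSERT INTO `{target}` (order_date, total)\n"
        f"SELECT order_date, SUM(amount) AS total\n"
        f"FROM `{source}`\n"
        f"GROUP BY order_date"
    )


def create_table_from_query(target: str, source: str) -> str:
    """Write the results of a query to a table, replacing it if it exists."""
    return (
        f"CREATE OR REPLACE TABLE `{target}` AS\n"
        f"SELECT order_date, SUM(amount) AS total\n"
        f"FROM `{source}`\n"
        f"GROUP BY order_date"
    )


print(insert_from_query(TARGET, SOURCE))
print()
print(create_table_from_query(TARGET, SOURCE))
```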
8
Q

What analytics features are available in BigQuery?

A
  • Ad hoc analysis (using Standard SQL, the BigQuery SQL dialect)
  • Geospatial analytics (using geography data types and Standard SQL geography functions)
  • Building machine learning models (using BigQuery ML)
  • Building interactive BI dashboards (using BigQuery BI Engine)
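As an illustration of geospatial analytics, the sketch below builds a query that measures the distance between two points with the geography functions `ST_GEOGPOINT` (which takes longitude first, then latitude) and `ST_DISTANCE` (which returns meters). The helper name `distance_query` and the sample coordinates are assumptions chosen for the example; the query is only printed here.

```python
def distance_query(lon1: float, lat1: float, lon2: float, lat2: float) -> str:
    """Build a query returning the distance in meters between two points."""
    return (
        "SELECT ST_DISTANCE(\n"
        f"  ST_GEOGPOINT({lon1}, {lat1}),\n"
        f"  ST_GEOGPOINT({lon2}, {lat2})\n"
        ") AS distance_meters"
    )


# Approximate coordinates for London and Paris, used only as sample input.
print(distance_query(-0.1276, 51.5072, 2.3522, 48.8566))
```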
9
Q

What steps are necessary to run BigQuery ML?

A

1) Create a model with a SQL statement.
2) Write a SQL prediction query and invoke ML.PREDICT.
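The two steps can be sketched as the SQL statements below, built as Python strings. This is a minimal sketch under stated assumptions: the model name `mydataset.churn_model`, the tables `mydataset.training_data` and `mydataset.new_customers`, the label column `churned`, and the choice of logistic regression are all hypothetical. The strings are printed rather than executed; they could be submitted with a client such as `google-cloud-bigquery`.

```python
def create_model_sql(model: str, training_table: str, label: str) -> str:
    """Step 1: train a model with a CREATE MODEL statement."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'logistic_reg', input_label_cols = ['{label}']) AS\n"
        f"SELECT * FROM `{training_table}`"
    )


def predict_sql(model: str, input_table: str) -> str:
    """Step 2: score new rows by invoking ML.PREDICT on the trained model."""
    return (
        "SELECT *\n"
        f"FROM ML.PREDICT(MODEL `{model}`, TABLE `{input_table}`)"
    )


print(create_model_sql("mydataset.churn_model", "mydataset.training_data", "churned"))
print()
print(predict_sql("mydataset.churn_model", "mydataset.new_customers"))
```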

10
Q

How can machine learning hyperparameters be defined?

A

You can either control the hyperparameters manually or hand them over to BigQuery, which starts with default hyperparameter settings and then tunes them automatically.
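The two approaches might look like the sketch below, which builds one `CREATE MODEL` statement with a fixed regularization value and one that asks BigQuery ML to search over a range. The `num_trials` option and `HPARAM_RANGE` keyword belong to BigQuery ML hyperparameter tuning, but the specific values, the model and table names, and the choice of `l2_reg` on a linear regression are assumptions made for illustration; check the supported options for your model type before use.

```python
def create_model_sql(model: str, table: str, label: str, tuning: bool) -> str:
    """Build a CREATE MODEL statement with manual or automatic hyperparameters."""
    options = ["model_type = 'linear_reg'", f"input_label_cols = ['{label}']"]
    if tuning:
        # Automatic: let BigQuery ML run several trials over a search range.
        options += ["num_trials = 10", "l2_reg = HPARAM_RANGE(0.01, 1.0)"]
    else:
        # Manual: pin the hyperparameter to a single chosen value.
        options += ["l2_reg = 0.1"]
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS ({', '.join(options)}) AS\n"
        f"SELECT * FROM `{table}`"
    )


# Hypothetical model, table, and label names.
print(create_model_sql("mydataset.price_model", "mydataset.sales", "price", tuning=False))
print()
print(create_model_sql("mydataset.price_model", "mydataset.sales", "price", tuning=True))
```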

11
Q

What models are available?

A
12
Q

What options are available for machine learning operations?

A

  • Importing TensorFlow models for batch prediction
  • Exporting models from BigQuery ML for online prediction
  • Hyperparameter tuning using Vertex AI Vizier
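The import and export options can be sketched as the BigQuery ML statements below, built as Python strings. The model names and the `gs://my-bucket/...` paths are hypothetical placeholders, and the statements are only printed, not run.

```python
def import_tensorflow_model_sql(model: str, gcs_path: str) -> str:
    """Register a saved TensorFlow model in BigQuery ML for batch prediction."""
    return (
        f"CREATE OR REPLACE MODEL `{model}`\n"
        f"OPTIONS (model_type = 'tensorflow', model_path = '{gcs_path}')"
    )


def export_model_sql(model: str, gcs_uri: str) -> str:
    """Export a BigQuery ML model to Cloud Storage, e.g. for online serving."""
    return (
        f"EXPORT MODEL `{model}`\n"
        f"OPTIONS (uri = '{gcs_uri}')"
    )


# Hypothetical model names and bucket paths.
print(import_tensorflow_model_sql("mydataset.imported_tf_model", "gs://my-bucket/tf_model/*"))
print()
print(export_model_sql("mydataset.churn_model", "gs://my-bucket/exported_model/"))
```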
