BigQuery Flashcards

Question 1

Q

What is BigQuery?

Answer

A

Serverless, highly-scalable, low-cost enterprise data warehouse
No infrasturcture to manage, focus on analyzing data and being productive
Allows capture and analysis of data in real-time using powerful streaming ingestion
Free up to 1TB of data analyzed per month and 10GB data stored BigQuery is Google’s fully managed, petabyte scale, low cost analytics data warehouse.

BigQuery is NoOps—there is no infrastructure to manage and you don’t need a database administrator—so you can focus on analyzing data to find meaningful insights, use familiar SQL, and take advantage of our pay-as-you-go model.

Question 2

Q

How do the standard data-warehouse concepts map to BigQuery?

Answer

A

1) Data warehouse: BigQuery service replaces infrastructure, serves as collective home for all analytical data in an organization
2) Data mart: Datasets are collections of tables divided along business lines or anlystical domain, each is tied to a GCP projects
3) Data lake: Data may be in Cloud Storage or Google Drive, or transactional data in Cloud Bigtable. Big query can define a schema and issue queries directly on external data as federated data sources.
4) Tables and views: function the same as traditional data warehouse
5) Grants: Google Cloud Identity and Access Management (IAM) used to grant permission to perform specific actions in BigQuery

Question 3

Q

What are ‘datasets’ in BigQuery?

Answer

A

1) Organizes tables into units called datasets
2) One project can have multiple datasets, which in turn can have multiple tables
3) To reference a table in code or sql, goes through the construct ‘project.dataset.table’

Question 4

Q

How does provisioning and system sizing work in BigQuery?

Answer

A

BigQuery allocates storage and query resources dynamically based on usage patterns
Storage resources are allocated as you consume them and deallocated as you remove data or drop tables
Query resources are allocated according to the query type and complexity. Each query uses a number of ‘slots’, which are units of computation that use a certain amount of CPU and RAM. Default is 2,000 slots for query operations.

Question 5

Q

How is storage managed in BigQuery?

Answer

A

BigQuery stores data in a proprietary columnar format called ‘Capacitator’, designed to evolve and optimize query execution.
Data is stored on Google’s distributed file-system, called Colossus, which ensures durability via storing redundant chunks on multiple physical disks.
Data is also replicated to multiple data-centers. - Federated data sources allows you to use BigQuery on data outside BigQuery sorage, such as data stored in Cloud Storage, Drive or Bigtable These are not optimized for BigQuery, so may perform slower.

Question 6

Q

How does BigQuery maintenance impact me?

Answer

A

fully managed, updates and maintenance are taken care of for you, updates require no downtime or impacts performance

Question 7

Q

What backup and recovery features does BigQuery provide?

Answer

A

Maintains complete 7-day history of changes against tables, allows for querying a point-in-time snapshot of your data - Easily revert changes without performing recovery from backups - Note: Explicit table deletions flush history after 2-days

Question 8

Q

What should you consider when ‘Managing Worfklows’ with BigQuery?

Answer

A

Organizing datasets
Granting permissions via Cloud IAM and roles
Onboarding, greatly accelerated productivity, centralized view of assets, file collaboration
Managing workloads and concurrency:
BigQuery limits the maximum incoming requests on a per-project basis based on policy and quota limitations
Offers two types of query priorities:

interactive - query is executed as soon as possible, counts toward query quotas
batch - are queued and executed as soon as idle resources are available, usually within a few minutes

Monitoring and auditing: Monitor with Stackdriver based on BigQuery metrics. BigQuery auto-creates audit logs of user actions, exportable to another BigQuery dataset for data analysis.

Question 9

Q

What should you consider when ‘Managing Data’ for BigQuery?

Answer

A

schema design
denormalization
partitioning
loading data into BigQuery

Question 10

Q

How does BigQuery support for queries?

Answer

A

supports standard SQL queries and is compatible with ANSI SQL 2011
SQL support extended to support nested and repeatable fields
Support for interactive and automated queries, optimization, external resources, user-defined functions and query sharing

Question 11

Q

What is the difference between BigQuery and Bigtable?

Answer

A

The difference is basically this:

BigQuery is a query Engine for datasets that don’t change much, or change by appending. It’s a great choice when your queries require a “table scan” or the need to look across the entire database. Think sums, averages, counts, groupings. BigQuery is what you use when you have collected a large amount of data, and need to ask questions about it.

BigTable is a database. It is designed to be the foundation for a large, scaleable application. Use BigTable when you are making any kind of app that needs to read and write data, and scale is a potential issue.

Question 12

Q

What is a BigQuery statement made up of?

Answer

A

A series of tokens, which include:

Identifiers
- names for db objects
- must start with a letter or underscore
- subsequent chars can be letters. numbers or underscores
Quoted Identifiers
- enclosed with backticks, may contain any character
- uses same escaping as Literals
Literals
- A constant value of a built-in data type
- Not all data types can be expressed as literals
- String and byte literals must be quoted with single, double or triple double-quotes OR single-quotes
Keywords
- backticks must be used for any reserved keywords
- Keywords or case INsensitive
Operators
- special characters or keywords which do manipulations and return results
- All operators return NULL when one of the operands is NULL
- All operators will throw an error if the computation result overflows
- Examples: . [] - ~ * / + - << >> & ^ = < > <== etc…
Special Characters
- Single-line comments use # or –
- Multi-line comments use /* and */

Question 13

Q

What are the standard sql BigQuery data types?

Answer

A

Integer: no decimal points; INT64, 8 bytes
Floating point: FLOAT64, 8 bytes of double precision
Boolean: keywords TRUE or FALSE
String: variable length character data, UTF-8 encoded
Bytes: variable length binary data;
Date: logical calendar date, no timezone
Datetime: a point in time, civil time (what you would see on watch)
Time: just time, independent of date
Timestamp: absolute point in time, microsecond precision, no timezone, derfault is UTC
Array: Array Ordered list of zero or more elements of any non-ARRAY type.; types are declared using angle brackets <>, complex types, may not contain NULL ‘elements’, arrays of arrays are not allowed; a NULL array response is translated to empty but are two distinct values
Struct: STRUCT Container of ordered fields each with a type (required) and field name (optional).; types declared using angle brackets <> and may be complex types

Question 14

Q

What are the ways to construct an Array in BigQuery?

Answer

A

Array literals: SELECT [1,2,3] from numbers;
Array keyword: SELECT Array[1,2,3] as floats;
- not required to declare for most data types like INT64 or STRING
Empty Array of a specific type: Array[] pr just [] and BigQuery will attempt to infer it from the data. If it can not, it will default to INT64.
Generated values: GENERATE_ARRAY function: SELECT GENERATE_ARRAY(11,32,2) as odds; the last param is a step value, which can be negative to go in descending order. There is also GENERATE_DATE_ARRAY that takes an date range and interval params.

Question 15

Q

What are Big Query Jobs

Answer

A

Jobs are actions that BigQuery runs on your behalf to load data, export data, query data, or copy data.

Question 16

Q

What are Big Query external Data Sources, what are the two types?

Answer

Study These Flashcards

A

An external data source is a data source that you can query directly from BigQuery, even though the data is not stored in BigQuery storage. For example, you might have data in a different Google Cloud database, in files in Cloud Storage, or in a different cloud product altogether that you would like to analyze in BigQuery, but that you aren’t prepared to migrate.

Use cases for external data sources include the following:

For extract-load-transform (ELT) workloads, loading and cleaning your data in one pass and writing the cleaned result into BigQuery storage, by using a CREATE TABLE … AS SELECT query.
Joining BigQuery tables with frequently changing data from an external data source. By querying the external data source directly, you don’t need to reload the data into BigQuery storage every time it changes.

BigQuery has two different mechanisms for querying external data: external tables and federated queries.

Question 17

Q

What is Big Query Transfer Service?

Answer

Study These Flashcards

A

The BigQuery Data Transfer Service automates data movement from SaaS applications to Google BigQuery on a scheduled, managed basis. Your analytics team can lay the foundation for a data warehouse without writing a single line of code. BigQuery Data Transfer Service initially supports Google application sources like Adwords, DoubleClick Campaign Manager, DoubleClick for Publishers and YouTube.

BigQuery Flashcards

(17 cards)