BigQuery Flashcards

You may prefer our related Brainscape-certified flashcards:
1
Q

What is BigQuery?

A
  • Serverless, highly-scalable, low-cost enterprise data warehouse
  • No infrasturcture to manage, focus on analyzing data and being productive
  • Allows capture and analysis of data in real-time using powerful streaming ingestion
  • Free up to 1TB of data analyzed per month and 10GB data stored BigQuery is Google’s fully managed, petabyte scale, low cost analytics data warehouse.

BigQuery is NoOps—there is no infrastructure to manage and you don’t need a database administrator—so you can focus on analyzing data to find meaningful insights, use familiar SQL, and take advantage of our pay-as-you-go model.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

How do the standard data-warehouse concepts map to BigQuery?

A

1) Data warehouse: BigQuery service replaces infrastructure, serves as collective home for all analytical data in an organization
2) Data mart: Datasets are collections of tables divided along business lines or anlystical domain, each is tied to a GCP projects
3) Data lake: Data may be in Cloud Storage or Google Drive, or transactional data in Cloud Bigtable. Big query can define a schema and issue queries directly on external data as federated data sources.
4) Tables and views: function the same as traditional data warehouse
5) Grants: Google Cloud Identity and Access Management (IAM) used to grant permission to perform specific actions in BigQuery

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

What are ‘datasets’ in BigQuery?

A

1) Organizes tables into units called datasets
2) One project can have multiple datasets, which in turn can have multiple tables
3) To reference a table in code or sql, goes through the construct ‘project.dataset.table’

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

How does provisioning and system sizing work in BigQuery?

A
  • BigQuery allocates storage and query resources dynamically based on usage patterns
  • Storage resources are allocated as you consume them and deallocated as you remove data or drop tables
  • Query resources are allocated according to the query type and complexity. Each query uses a number of ‘slots’, which are units of computation that use a certain amount of CPU and RAM. Default is 2,000 slots for query operations.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

How is storage managed in BigQuery?

A
  • BigQuery stores data in a proprietary columnar format called ‘Capacitator’, designed to evolve and optimize query execution.
  • Data is stored on Google’s distributed file-system, called Colossus, which ensures durability via storing redundant chunks on multiple physical disks.
  • Data is also replicated to multiple data-centers. - Federated data sources allows you to use BigQuery on data outside BigQuery sorage, such as data stored in Cloud Storage, Drive or Bigtable These are not optimized for BigQuery, so may perform slower.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

How does BigQuery maintenance impact me?

A
  • fully managed, updates and maintenance are taken care of for you, updates require no downtime or impacts performance
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

What backup and recovery features does BigQuery provide?

A
  • Maintains complete 7-day history of changes against tables, allows for querying a point-in-time snapshot of your data - Easily revert changes without performing recovery from backups - Note: Explicit table deletions flush history after 2-days
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

What should you consider when ‘Managing Worfklows’ with BigQuery?

A
  • Organizing datasets
  • Granting permissions via Cloud IAM and roles
  • Onboarding, greatly accelerated productivity, centralized view of assets, file collaboration
  • Managing workloads and concurrency:
  • BigQuery limits the maximum incoming requests on a per-project basis based on policy and quota limitations
  • Offers two types of query priorities:
  1. interactive - query is executed as soon as possible, counts toward query quotas
  2. batch - are queued and executed as soon as idle resources are available, usually within a few minutes
  • Monitoring and auditing: Monitor with Stackdriver based on BigQuery metrics. BigQuery auto-creates audit logs of user actions, exportable to another BigQuery dataset for data analysis.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What should you consider when ‘Managing Data’ for BigQuery?

A
  • schema design
  • denormalization
  • partitioning
  • loading data into BigQuery
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

How does BigQuery support for queries?

A
  • supports standard SQL queries and is compatible with ANSI SQL 2011
  • SQL support extended to support nested and repeatable fields
  • Support for interactive and automated queries, optimization, external resources, user-defined functions and query sharing
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What is the difference between BigQuery and Bigtable?

A

The difference is basically this:

BigQuery is a query Engine for datasets that don’t change much, or change by appending. It’s a great choice when your queries require a “table scan” or the need to look across the entire database. Think sums, averages, counts, groupings. BigQuery is what you use when you have collected a large amount of data, and need to ask questions about it.

BigTable is a database. It is designed to be the foundation for a large, scaleable application. Use BigTable when you are making any kind of app that needs to read and write data, and scale is a potential issue.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

What is a BigQuery statement made up of?

A

A series of tokens, which include:

  1. Identifiers
    • names for db objects
    • must start with a letter or underscore
    • subsequent chars can be letters. numbers or underscores
  2. Quoted Identifiers
    • enclosed with backticks, may contain any character
    • uses same escaping as Literals
  3. Literals
    • A constant value of a built-in data type
    • Not all data types can be expressed as literals
    • String and byte literals must be quoted with single, double or triple double-quotes OR single-quotes
  4. Keywords
    • backticks must be used for any reserved keywords
    • Keywords or case INsensitive
  5. Operators
    • special characters or keywords which do manipulations and return results
    • All operators return NULL when one of the operands is NULL
    • All operators will throw an error if the computation result overflows
    • Examples: . [] - ~ * / + - << >> & ^ = < > <== etc…
  6. Special Characters
    • Single-line comments use # or –
    • Multi-line comments use /* and */
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

What are the standard sql BigQuery data types?

A
  1. Integer: no decimal points; INT64, 8 bytes
  2. Floating point: FLOAT64, 8 bytes of double precision
  3. Boolean: keywords TRUE or FALSE
  4. String: variable length character data, UTF-8 encoded
  5. Bytes: variable length binary data;
  6. Date: logical calendar date, no timezone
  7. Datetime: a point in time, civil time (what you would see on watch)
  8. Time: just time, independent of date
  9. Timestamp: absolute point in time, microsecond precision, no timezone, derfault is UTC
  10. Array: Array Ordered list of zero or more elements of any non-ARRAY type.; types are declared using angle brackets <>, complex types, may not contain NULL ‘elements’, arrays of arrays are not allowed; a NULL array response is translated to empty but are two distinct values
  11. Struct: STRUCT Container of ordered fields each with a type (required) and field name (optional).; types declared using angle brackets <> and may be complex types
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

What are the ways to construct an Array in BigQuery?

A
  1. Array literals: SELECT [1,2,3] from numbers;
  2. Array keyword: SELECT Array[1,2,3] as floats;
    • not required to declare for most data types like INT64 or STRING
  3. Empty Array of a specific type: Array[] pr just [] and BigQuery will attempt to infer it from the data. If it can not, it will default to INT64.
  4. Generated values: GENERATE_ARRAY function: SELECT GENERATE_ARRAY(11,32,2) as odds; the last param is a step value, which can be negative to go in descending order. There is also GENERATE_DATE_ARRAY that takes an date range and interval params.
How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

What are Big Query Jobs

A

Jobs are actions that BigQuery runs on your behalf to load data, export data, query data, or copy data.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What are Big Query external Data Sources, what are the two types?

A

An external data source is a data source that you can query directly from BigQuery, even though the data is not stored in BigQuery storage. For example, you might have data in a different Google Cloud database, in files in Cloud Storage, or in a different cloud product altogether that you would like to analyze in BigQuery, but that you aren’t prepared to migrate.

Use cases for external data sources include the following:

  • For extract-load-transform (ELT) workloads, loading and cleaning your data in one pass and writing the cleaned result into BigQuery storage, by using a CREATE TABLE … AS SELECT query.
  • Joining BigQuery tables with frequently changing data from an external data source. By querying the external data source directly, you don’t need to reload the data into BigQuery storage every time it changes.

BigQuery has two different mechanisms for querying external data: external tables and federated queries.

17
Q

What is Big Query Transfer Service?

A

The BigQuery Data Transfer Service automates data movement from SaaS applications to Google BigQuery on a scheduled, managed basis. Your analytics team can lay the foundation for a data warehouse without writing a single line of code. BigQuery Data Transfer Service initially supports Google application sources like Adwords, DoubleClick Campaign Manager, DoubleClick for Publishers and YouTube.