Introduction to Data Analytics on Google Cloud Flashcards by Aj Dur

Data Sources are connectors that let you do what with your data?

1) Query the data.
2) Clean the data.
3) Ingest and process the data.
4) Store the data.

The correct answer is: Query the data.

Which product is a serverless data warehouse for storage and analytics?

BigQuery
Cloud Storage
Cloud Spanner
Looker

The correct answer is: BigQuery

BigQuery is a serverless data warehouse provided by Google Cloud that is designed for large-scale data storage and analytics. It allows you to store massive datasets and run SQL queries without having to manage the infrastructure, making it ideal for analytics at scale.

Which Google Cloud product is a relational database used to establish relationships between information in multiple data tables?

Cloud Spanner
BigQuery
Bigtable
Dataproc

The correct answer is: Cloud Spanner

Cloud Spanner is a relational database that enables you to establish relationships between different pieces of information stored across multiple tables, making it ideal for applications that require strong consistency and high scalability. It is designed for both operational and analytical workloads, and it provides features like automatic scaling, global distribution, and high availability.

What are the correct steps in the data analytics lifecycle?

Visualize results and share the data.
Activate, store, and analyze.
Visualize, process, and ingest.
Ingest, process, store, analyze, and activate.

Ingest, process, store, analyze, and activate.
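
As a rough illustration of the five stages, here is a toy Python sketch. It works only on in-memory lists and dictionaries. The function names and sample records are hypothetical and do not correspond to any Google Cloud API.

```python
from collections import defaultdict


def ingest():
    # Ingest: collect raw records from a source (hard-coded here).
    return [" Books,12.50 ", "games,30.00", "books,7.50", "bad-row"]


def process(raw_rows):
    # Process: clean and validate the raw records.
    cleaned = []
    for row in raw_rows:
        parts = row.strip().split(",")
        if len(parts) != 2:
            continue  # drop malformed rows
        category, amount = parts
        cleaned.append({"category": category.lower(), "amount": float(amount)})
    return cleaned


def store(rows, warehouse):
    # Store: persist the cleaned rows (a list stands in for a warehouse table).
    warehouse.extend(rows)


def analyze(warehouse):
    # Analyze: compute total sales per category.
    totals = defaultdict(float)
    for row in warehouse:
        totals[row["category"]] += row["amount"]
    return dict(totals)


def activate(totals):
    # Activate: turn the result into something a person can act on.
    return [f"{category}: {total:.2f}" for category, total in sorted(totals.items())]


warehouse = []
store(process(ingest()), warehouse)
report = activate(analyze(warehouse))
print(report)
```

In a real deployment each stage would typically be handled by a separate managed service rather than a function in one script.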

What type of data is used for machine learning?

Structured data only
Relational data
Structured and unstructured data
Raw data

The correct answer is: Structured and unstructured data.

Machine learning models can be trained on both structured and unstructured data, depending on the application. Here’s how they differ:

Structured data refers to data that is organized in a table or a defined schema, such as relational databases or spreadsheets. This data is often used in traditional machine learning tasks like regression, classification, and time-series analysis.

Unstructured data refers to data that doesn’t have a pre-defined format or organization, like text, images, audio, or video. Machine learning models can be trained on unstructured data using techniques like natural language processing (NLP) for text or convolutional neural networks (CNNs) for images.

While raw data (which is typically unprocessed or uncleaned) can be used for machine learning, it usually needs to be cleaned and processed before being fed into a model.

Relational data (often referring to structured data in a relational database) can be a subset of structured data but isn’t the only type of data used in machine learning.

Cloud Storage

This is an object storage service for unstructured data like files, images, and backups, not a data warehouse.

Cloud Spanner

This is a distributed relational database service, not a data warehouse.

Looker

Looker is a business intelligence (BI) and data analytics platform, not a data warehouse. It connects to data warehouses like BigQuery for analytics.

BigQuery

BigQuery is designed specifically for serverless, scalable analytics and storage, making it the right choice here.

What is a Database?

An organized collection of data stored in tables and accessed electronically from a computer system.

What is a Relational Database?

A Relational Database is a type of database that stores data in tables with rows and columns, where each table represents a different entity and the relationships between tables are defined by keys (e.g., primary keys, foreign keys). It uses Structured Query Language (SQL) for managing and querying the data.
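
To make the idea of keys concrete, here is a minimal sketch using Python's built-in sqlite3 module. SQLite is only a stand-in for a relational engine and is not a Google Cloud product. The table and column names are hypothetical.

```python
import sqlite3

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Each table represents one entity; customer_id is the primary key of customers.
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
# orders.customer_id is a foreign key that points back to customers.
conn.execute(
    """CREATE TABLE orders (
           order_id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
           total REAL NOT NULL)"""
)

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 15.0), (12, 2, 40.0)],
)

# The JOIN follows the key relationship to combine the two tables.
rows = conn.execute(
    """SELECT c.name, SUM(o.total)
       FROM customers AS c
       JOIN orders AS o ON o.customer_id = c.customer_id
       GROUP BY c.name
       ORDER BY c.name"""
).fetchall()
print(rows)
conn.close()
```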

What is a Non-Relational Database?

A Non-Relational Database (also known as NoSQL) is a type of database that stores data in formats other than tables (e.g., key-value pairs, documents, graphs, or wide-column stores). These databases are designed for flexible, scalable storage and can handle large amounts of unstructured or semi-structured data. They don’t require a fixed schema or predefined relationships between data.
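
The "no fixed schema" point can be sketched with a plain Python dictionary standing in for a key-value or document store. This is only an analogy, not the API of any NoSQL product. The keys and field names are hypothetical.

```python
import json

# A dictionary stands in for a key-value / document store.
store = {}

# Two documents under different keys; they do not share the same fields.
store["user:1"] = {"name": "Ada", "tags": ["admin", "beta"]}
store["user:2"] = {"name": "Grace", "address": {"city": "Arlington"}}


def get_field(key, field, default=None):
    # Look a document up by key, then read one field if it is present.
    return store.get(key, {}).get(field, default)


print(json.dumps(store["user:2"]))
print(get_field("user:1", "address"))  # field absent in this document
```

Adding the nested address field to one document required no change to the other document and no table definition.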

Which Google Cloud offerings are relational databases?

Cloud SQL, Cloud Spanner, AlloyDB for PostgreSQL

Which Google Cloud offerings are non-relational databases?

Bigtable

What is a Data Warehouse?

A data warehouse contains structured and organized data, which can be used for advanced querying.

What is Google Cloud’s data warehouse offering called?

BigQuery

What is a data lake?

A data lake is a pool of raw, unorganized, and unclassified data with no specified purpose.

What does it mean that BigQuery is a fully managed data warehouse?

It means that BigQuery takes care of the underlying infrastructure, so you can focus on using SQL queries to answer business questions without worrying about deployment, scalability, and security.

Which file formats are self-describing, so that BigQuery can automatically infer the table schema from the source data?

Avro, Parquet, ORC, Firestore export, or Datastore export

BigQuery is optimized for reading terabytes and petabytes of data. How can BigQuery read and handle large amounts of data?

1) BigQuery’s storage and analytics services operate independently.
2) BigQuery condenses data so that it can be read on the first pass.
3) BigQuery is a “columnar store,” so it only reads the relevant columns to execute a query.
4) BigQuery is optimized to read rows of data, which are easier to process than columns.

3) BigQuery is a “columnar store,” so it only reads the relevant columns to execute a query.
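
The benefit of a columnar layout can be sketched in plain Python. This is a simplified model of the idea and not a description of how BigQuery is implemented internally. The sample data and function names are hypothetical.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "customer": "Ada", "country": "UK", "total": 25.0},
    {"order_id": 2, "customer": "Ada", "country": "UK", "total": 15.0},
    {"order_id": 3, "customer": "Grace", "country": "US", "total": 40.0},
]

# Column-oriented layout: each column's values are kept together.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def sum_totals_row_store(rows):
    # Summing one field still touches every field of every row.
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)
        total += row["total"]
    return total, values_read


def sum_totals_column_store(columns):
    # Only the one relevant column is read.
    column = columns["total"]
    return sum(column), len(column)


print(sum_totals_row_store(rows))
print(sum_totals_column_store(columns))
```

Both functions return the same sum, but the columnar version reads 3 values instead of 12. The gap grows with the number of columns in the table.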

BigQuery is most efficient when working with data contained where?

Cloud Storage
Its own storage service
Bigtable
Google Sheets

Its own storage service

BigQuery is two services in one. What are the two services?

Reporting and sharing services
Warehouse and database services
Storage and query services
Relational and non-relational services

Storage and query services

What construct is used to reference a data table in a SQL query?

table.dataset
dataset.table.column
dataset.table.row
project.dataset.table

project.dataset.table
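
As a small illustration, the three-part reference can be assembled in Python before it is placed in a query string. The helper name table_ref and the project, dataset, and table names below are hypothetical. The sketch only builds text and does not contact BigQuery.

```python
def table_ref(project, dataset, table):
    """Return a backtick-quoted project.dataset.table reference."""
    for part in (project, dataset, table):
        # Reject empty parts and stray backticks that would break the quoting.
        if not part or "`" in part:
            raise ValueError(f"invalid identifier: {part!r}")
    return f"`{project}.{dataset}.{table}`"


# Hypothetical names, for illustration only.
query = f"SELECT name, total FROM {table_ref('my-project', 'sales', 'orders')} LIMIT 10"
print(query)
```

Backticks around the full reference are useful when a project ID contains a hyphen, as in this example.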

BigQuery is a fully managed data warehouse. What is a benefit of a data warehouse being fully managed?

Google Cloud handles the analyze step of the data analytics lifecycle.
Data management is handled by Google Cloud.
BigQuery takes care of the underlying infrastructure.
BigQuery processes data faster than it would if it were not fully managed.

BigQuery takes care of the underlying infrastructure.

What are Dimensions?

Attributes or characteristics of your data. Each column is a dimension.

What are Measures?

Calculations performed across multiple rows of data, such as a count, sum, or average.
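
A minimal Python sketch of the distinction: the dimension is the column you group by, and the measures are the aggregates computed over the rows in each group. The sample data and the function name measure_by_dimension are hypothetical.

```python
from collections import defaultdict

orders = [
    {"country": "UK", "total": 25.0},
    {"country": "UK", "total": 15.0},
    {"country": "US", "total": 40.0},
]


def measure_by_dimension(rows, dimension, field):
    # Group rows by the dimension, then aggregate the field across each group.
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        key = row[dimension]
        sums[key] += row[field]
        counts[key] += 1
    return {
        key: {"count": counts[key], "sum": sums[key], "average": sums[key] / counts[key]}
        for key in sums
    }


result = measure_by_dimension(orders, "country", "total")
print(result)
```

Here country is the dimension, while count, sum, and average are measures of total.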