ETL, EL, ELT - Sheet1 (1) Flashcards

1
Q

When should you consider using Dataflow and BigQuery for data quality?

A

In most cases: Dataflow and BigQuery are the generally recommended tools for addressing data quality issues.

2
Q

What are some specific needs that Dataflow and BigQuery may not meet easily?

A

Low Latency and High Throughput; Reuse of Existing Spark Pipelines; Need for Visual Pipeline Building

3
Q

What is Dataproc?

A

Dataproc is a managed service for batch processing, querying, streaming, and machine learning.

4
Q

What are the benefits of using Dataproc?

A

Cost-effective for Hadoop workloads; Autoscaling; Integration with other Google Cloud products

5
Q

What is Data Fusion?

A

Data Fusion is a fully managed, cloud-native enterprise data integration service.

6
Q

What can Data Fusion be used for?

A

Transformations; Cleanup; Ensuring data consistency; Populating a data warehouse

7
Q

What is an advantage of Data Fusion for non-programming role users?

A

Building visual pipelines without waiting for an IT team

8
Q

What is an advantage of Data Fusion for IT staff?

A

Flexible API for creating scripts for automated execution

9
Q

What are important aspects to consider in ETL regardless of the tool used?

A

Data Lineage; Metadata and Data Catalog

10
Q

What does data lineage refer to?

A

Data lineage refers to the data’s origin, the processes it has undergone, and its current condition.
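
A minimal sketch of what a lineage record could hold, assuming the three parts named in the answer (origin, processes, current condition). The `LineageRecord` class, its field names, and the example path are hypothetical and are not part of any Google Cloud API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LineageRecord:
    """Hypothetical lineage record: origin, processing steps, current condition."""
    origin: str
    steps: List[str] = field(default_factory=list)
    condition: str = "raw"

    def apply(self, step: str, new_condition: str) -> None:
        # Record each process the data goes through and the state it ends up in.
        self.steps.append(step)
        self.condition = new_condition


# Hypothetical source path, used only for illustration.
record = LineageRecord(origin="gs://example-bucket/orders.csv")
record.apply("drop rows with null order_id", "cleaned")
record.apply("load into warehouse table", "loaded")
```

Reading `record.steps` in order gives the trail you would follow when troubleshooting an odd result back to its source.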

11
Q

Why is data lineage important?

A

Understanding data suitability; Troubleshooting; Ensuring trust and regulatory compliance

12
Q

What is the purpose of metadata in ETL?

A

Discovery and identification of data suitability

13
Q

What service on Google Cloud provides data discoverability?

A

Data Catalog

14
Q

What is required to make Data Catalog effective for data discoverability?

A

Adding labels to your resources

15
Q

What are labels in Google Cloud?

A

Key-value pairs that help organize resources

16
Q

What are the benefits of using labels in Google Cloud?

A

Manage complex resources; Enable a fine-grained view of your Cloud bill; First step towards a data catalog
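
To make the first two benefits concrete, here is a small sketch that treats labels as plain key-value pairs, filters resources by label, and totals cost per label value. The resource names, label keys, and cost figures are invented for illustration, and the helper functions are hypothetical rather than a Google Cloud API.

```python
from collections import defaultdict

# Hypothetical inventory: each resource carries labels (key-value pairs)
# and a made-up monthly cost used only for illustration.
resources = [
    {"name": "orders-cluster", "labels": {"env": "prod", "team": "sales"}, "cost": 120.0},
    {"name": "orders-dataset", "labels": {"env": "prod", "team": "sales"}, "cost": 30.0},
    {"name": "scratch-bucket", "labels": {"env": "dev", "team": "analytics"}, "cost": 5.0},
]


def filter_by_label(items, key, value):
    """Return the names of resources whose label `key` equals `value`."""
    return [r["name"] for r in items if r["labels"].get(key) == value]


def cost_by_label(items, key):
    """Sum cost per value of the given label key; unlabeled items go under 'unlabeled'."""
    totals = defaultdict(float)
    for r in items:
        totals[r["labels"].get(key, "unlabeled")] += r["cost"]
    return dict(totals)


prod_resources = filter_by_label(resources, "env", "prod")
team_costs = cost_by_label(resources, "team")
```

Grouping by a `team` or `env` label is the kind of breakdown that a labeled bill makes possible; without labels, every resource would fall into the `unlabeled` bucket.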

17
Q

What is Data Catalog?

A

A fully managed, highly scalable data discovery and metadata management service

18
Q

What are the features of Data Catalog?

A

No infrastructure setup or management; Enterprise-grade access control; Integration with Data Loss Prevention API

19
Q

What is the benefit of Data Catalog’s integration with Data Loss Prevention API?

A

Discover and classify sensitive data; Aid in data governance

20
Q

What can be done with Data Catalog?

A

Search metadata about datasets; Group datasets with tags; Flag columns containing sensitive data
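
The three capabilities in this answer can be sketched with a toy in-memory catalog. This is an illustrative model only: `Entry`, `TinyCatalog`, and the dataset names are hypothetical and do not reflect the real Data Catalog client library.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Entry:
    """Hypothetical catalog entry holding metadata only, not the data itself."""
    name: str
    description: str
    tags: Set[str] = field(default_factory=set)
    sensitive_columns: Set[str] = field(default_factory=set)


class TinyCatalog:
    def __init__(self) -> None:
        self.entries: Dict[str, Entry] = {}

    def register(self, entry: Entry) -> None:
        self.entries[entry.name] = entry

    def search(self, term: str) -> List[str]:
        """Match the term against entry names, descriptions, and tags."""
        term = term.lower()
        return sorted(
            e.name for e in self.entries.values()
            if term in e.name.lower()
            or term in e.description.lower()
            or any(term in t.lower() for t in e.tags)
        )

    def flag_sensitive(self, name: str, column: str) -> None:
        # Mark a column as containing sensitive data.
        self.entries[name].sensitive_columns.add(column)


catalog = TinyCatalog()
catalog.register(Entry("sales.orders", "Customer orders by day", {"finance"}))
catalog.register(Entry("web.clicks", "Raw clickstream events", {"marketing"}))
catalog.flag_sensitive("sales.orders", "customer_email")

finance_hits = catalog.search("finance")    # found through its tag
customer_hits = catalog.search("customer")  # found through its description
```

Because the search runs over descriptions and tags as well as names, a user can find `sales.orders` without knowing the table name in advance.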

21
Q

What is the advantage of using Data Catalog for dataset discovery?

A

Unified user experience; Quick access to datasets; Eliminate the need to hunt for specific table names

22
Q

What is the significance of data lineage in ETL?

A

Understanding data origin, processes, and current condition; Ensuring trust, regulatory compliance, and troubleshooting odd results

23
Q

What is metadata?

A

Information about the data that aids in discovery and identification of data suitability

24
Q

What is the purpose of metadata labels in Data Catalog?

A

To organize resources and enable better management

25
Q

What are labels in Data Catalog?

A

Key-value pairs that help categorize and organize resources

26
Q

What are the benefits of using labels in Data Catalog?

A

Simplify resource management; Enable fine-grained cost analysis; Step towards creating a data catalog

27
Q

What is the role of Data Catalog in data discovery?

A

Fully managed metadata management service; Provides discoverability and searchability of datasets

28
Q

What is Data Catalog's integration with the Data Loss Prevention API?

A

It allows discovery and classification of sensitive data, aiding in data governance

29
Q

What are the advantages of using Data Catalog for metadata management?

A

Searchable metadata for datasets regardless of storage location; Grouping datasets with tags; Flagging columns with sensitive data

30
Q

What is the benefit of Data Catalog's unified user experience?

A

Quick and easy discovery of datasets without the need to search for specific table names

31
Q

What should be considered when evaluating Dataflow and BigQuery for data quality needs?

A

Low Latency and High Throughput; Reuse of Existing Spark Pipelines; Need for Visual Pipeline Building

32
Q

What are the advantages of using Dataproc for data processing?

A

Managed service for batch processing, querying, streaming, and machine learning; Cost-effective for Hadoop workloads; Autoscaling; Integration with other Google Cloud products

33
Q

What is the purpose of Data Fusion in ETL processes?

A

Fully managed, Cloud-native enterprise data integration service; Transformation, cleanup, ensuring data consistency, populating a data warehouse

34
Q

What are the benefits of Data Fusion for non-programming role users?

A

Visual pipeline building without relying on IT team

35
Q

What are the benefits of Data Fusion for IT staff?

A

Flexible API for automated execution

36
Q

What are the important aspects to keep in mind regardless of the ETL tool used?

A

Data Lineage; Metadata and Data Catalog

37
Q

What is the significance of data lineage in ETL processes?

A

Understanding data origin, processes, and current condition; Trust, troubleshooting, regulatory compliance

38
Q

What is the purpose of metadata in ETL?

A

Discovery and identification of data suitability

39
Q

What does Data Catalog provide for data discoverability?

A

Searchable metadata and labeling

40
Q

What are the benefits of using labels in Data Catalog?

A

Organize resources; Fine-grained cost analysis; Step towards creating a data catalog

41
Q

What is Data Catalog?

A

Managed, scalable data discovery and metadata management service

42
Q

What are the features of Data Catalog?

A

No infrastructure setup or management; Enterprise-grade access control; Integration with Data Loss Prevention API

43
Q

What are the benefits of Data Catalog's integration with Data Loss Prevention API?

A

Discover and classify sensitive data; Aid in data governance

44
Q

What can be done with Data Catalog?

A

Search metadata; Group datasets with tags; Flag columns with sensitive data

45
Q

What is the advantage of using Data Catalog for dataset discovery?

A

Unified user experience; Quick access to datasets; Eliminate the need to hunt for specific table names