ETL, EL, ELT - Sheet1 (1) Flashcards

1
Q

When should you consider using Dataflow and BigQuery for data quality?

A

By default: Dataflow and BigQuery are the recommended tools for addressing most data quality issues.

2
Q

What are some specific needs that Dataflow and BigQuery may not meet easily?

A

Low Latency and High Throughput; Reuse of Existing Spark Pipelines; Need for Visual Pipeline Building

3
Q

What is Dataproc?

A

Dataproc is a managed service for batch processing, querying, streaming, and machine learning.

4
Q

What are the benefits of using Dataproc?

A

Cost-effective for Hadoop workloads; Autoscaling; Integration with other Google Cloud products

5
Q

What is Data Fusion?

A

Data Fusion is a fully managed, cloud-native enterprise data integration service.

6
Q

What can Data Fusion be used for?

A

Transformations; Cleanup; Ensuring data consistency; Populating a data warehouse

7
Q

What is an advantage of Data Fusion for non-programming role users?

A

Building visual pipelines without waiting for an IT team

8
Q

What is an advantage of Data Fusion for IT staff?

A

Flexible API for creating scripts for automated execution

9
Q

What are important aspects to consider in ETL regardless of the tool used?

A

Data Lineage; Metadata and Data Catalog

10
Q

What does data lineage refer to?

A

Data lineage refers to the data's origin, the processes it has undergone, and its current condition.
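
The three parts of lineage named in this answer (origin, processes, current condition) can be pictured as a small record that travels with a dataset. The following is a minimal sketch only: the `LineageRecord` class, its field names, and the bucket path are hypothetical and are not part of any Google Cloud API.

```python
from dataclasses import dataclass, field


@dataclass
class LineageRecord:
    """Hypothetical record of where a dataset came from and what happened to it."""
    origin: str                                 # where the data came from
    steps: list = field(default_factory=list)   # processes applied, in order
    condition: str = "raw"                      # current condition of the data

    def apply(self, step: str, new_condition: str) -> None:
        """Record a processing step and the condition it leaves the data in."""
        self.steps.append(step)
        self.condition = new_condition


# Made-up source path and steps, for illustration only.
record = LineageRecord(origin="gs://example-bucket/orders.csv")
record.apply("drop rows with null order_id", "cleaned")
record.apply("load into warehouse table", "loaded")
print(record.origin, record.steps, record.condition)
```

When a result looks odd, a record like this lets you walk back through `steps` to see which process may have introduced the problem.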

11
Q

Why is data lineage important?

A

Understanding data suitability; Troubleshooting; Ensuring trust and regulatory compliance

12
Q

What is the purpose of metadata in ETL?

A

Discovery and identification of data suitability

13
Q

What service on Google Cloud provides data discoverability?

A

Data Catalog

14
Q

What is required to make Data Catalog effective for data discoverability?

A

Adding labels to your resources

15
Q

What are labels in Google Cloud?

A

Key-value pairs that help organize resources
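
To make "key-value pairs that help organize resources" concrete, here is a minimal sketch that models labels as plain dictionaries and filters resources by them. The resource names, label keys, and the `find_by_labels` helper are all hypothetical; the sketch does not call any Google Cloud API.

```python
# Hypothetical inventory: resource name -> labels (key-value pairs).
resources = {
    "orders-pipeline": {"env": "prod", "team": "sales"},
    "orders-staging": {"env": "dev", "team": "sales"},
    "clickstream-table": {"env": "prod", "team": "marketing"},
}


def find_by_labels(inventory, **wanted):
    """Return sorted names of resources whose labels match every wanted pair."""
    return sorted(
        name
        for name, labels in inventory.items()
        if all(labels.get(key) == value for key, value in wanted.items())
    )


prod_sales = find_by_labels(resources, env="prod", team="sales")
print(prod_sales)
```

The same labels can be combined freely, so one consistent labeling scheme supports many different views of the same resources.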

16
Q

What are the benefits of using labels in Google Cloud?

A

Manage complex resources; Give a fine-grained view of your Cloud bill; First step towards a data catalog
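
The billing benefit can be sketched as grouping costs by a label key. This is an illustration under stated assumptions: the line items, amounts, and the `cost_by_label` helper are made up, and real billing data would come from a billing export rather than a hard-coded list.

```python
from collections import defaultdict

# Hypothetical billing line items: (cost in USD, labels on the billed resource).
line_items = [
    (120.0, {"team": "sales", "env": "prod"}),
    (30.0, {"team": "sales", "env": "dev"}),
    (75.5, {"team": "marketing", "env": "prod"}),
    (10.0, {}),  # a resource nobody labeled
]


def cost_by_label(items, key):
    """Sum costs per value of one label key; unlabeled items get their own bucket."""
    totals = defaultdict(float)
    for cost, labels in items:
        totals[labels.get(key, "(unlabeled)")] += cost
    return dict(totals)


team_costs = cost_by_label(line_items, "team")
print(team_costs)
```

The "(unlabeled)" bucket shows why consistent labeling matters: any cost without a label cannot be attributed to a team or environment.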

17
Q

What is Data Catalog?

A

A fully managed, highly scalable data discovery and metadata management service

18
Q

What are the features of Data Catalog?

A

No infrastructure setup or management; Enterprise-grade access control; Integration with Data Loss Prevention API

19
Q

What is the benefit of Data Catalog’s integration with Data Loss Prevention API?

A

Discover and classify sensitive data; Aid in data governance

20
Q

What can be done with Data Catalog?

A

Search metadata about datasets; Group datasets with tags; Flag columns containing sensitive data
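
The three capabilities in this answer can be illustrated with a toy in-memory catalog. This is a sketch of the idea only, not the Data Catalog API: the `Entry` class, the helper functions, and the dataset names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """Hypothetical catalog entry holding metadata about one dataset."""
    name: str
    description: str
    tags: set = field(default_factory=set)
    sensitive_columns: set = field(default_factory=set)


catalog = [
    Entry("sales.orders", "Customer orders by day", {"finance"}, {"email"}),
    Entry("web.clicks", "Raw clickstream events", {"marketing"}),
    Entry("sales.refunds", "Refunded orders", {"finance"}),
]


def search(entries, text):
    """Search metadata: match the text against names and descriptions."""
    text = text.lower()
    return [
        e.name for e in entries
        if text in e.name.lower() or text in e.description.lower()
    ]


def with_tag(entries, tag):
    """Group datasets: return the names of entries carrying a given tag."""
    return [e.name for e in entries if tag in e.tags]


def has_sensitive_data(entries):
    """Flagging: return the names of entries with at least one sensitive column."""
    return [e.name for e in entries if e.sensitive_columns]


print(search(catalog, "orders"))
print(with_tag(catalog, "finance"))
print(has_sensitive_data(catalog))
```

Note that the search for "orders" finds `sales.refunds` through its description, which is the point of searching metadata rather than hunting for exact table names.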

21
Q

What is the advantage of using Data Catalog for dataset discovery?

A

Unified user experience; Quick access to datasets; Eliminate the need to hunt for specific table names

22
Q

What is the significance of data lineage in ETL?

A

Understanding data origin, processes, and current condition; Ensuring trust, regulatory compliance, and troubleshooting odd results

23
Q

What is metadata?

A

Information about the data that aids in discovery and identification of data suitability

24
Q

What is the purpose of metadata labels in Data Catalog?

A

To organize resources and enable better management

25
Q

What are labels in Data Catalog?

A

Key-value pairs that help categorize and organize resources

26
Q

What are the benefits of using labels in Data Catalog?

A

Simplify resource management; Enable fine-grained cost analysis; Step towards creating a data catalog

27
Q

What is the role of Data Catalog in data discovery?

A

Fully managed metadata management service; Provides discoverability and searchability of datasets

28
Q

What is Data Catalog’s integration with the Data Loss Prevention API?

A

It allows discovery and classification of sensitive data, aiding in data governance

29
Q

What are the advantages of using Data Catalog for metadata management?

A

Searchable metadata for datasets regardless of storage location; Grouping datasets with tags; Flagging columns with sensitive data

30
Q

What is the benefit of Data Catalog’s unified user experience?

A

Quick and easy discovery of datasets without the need to search for specific table names

31
Q

What should be considered when evaluating Dataflow and BigQuery for data quality needs?

A

Whether you have needs they may not meet easily: Low Latency and High Throughput; Reuse of Existing Spark Pipelines; Need for Visual Pipeline Building

32
Q

What are the advantages of using Dataproc for data processing?

A

Managed service for batch processing, querying, streaming, and machine learning; Cost-effective for Hadoop workloads; Autoscaling; Integration with other Google Cloud products

33
Q

What is the purpose of Data Fusion in ETL processes?

A

Fully managed, cloud-native enterprise data integration service; Transformation, cleanup, ensuring data consistency, populating a data warehouse

34
Q

What are the benefits of Data Fusion for non-programming role users?

A

Visual pipeline building without relying on IT team

35
Q

What are the benefits of Data Fusion for IT staff?

A

Flexible API for automated execution

36
Q

What are the important aspects to keep in mind regardless of the ETL tool used?

A

Data Lineage; Metadata and Data Catalog

37
Q

What is the significance of data lineage in ETL processes?

A

Understanding data origin, processes, and current condition; Trust, troubleshooting, regulatory compliance

38
Q

What is the purpose of metadata in ETL?

A

Discovery and identification of data suitability

39
Q

What does Data Catalog provide for data discoverability?

A

Searchable metadata and labeling

40
Q

What are the benefits of using labels in Data Catalog?

A

Organize resources; Fine-grained cost analysis; Step towards creating a data catalog

41
Q

What is Data Catalog?

A

Managed, scalable data discovery and metadata management service

42
Q

What are the features of Data Catalog?

A

No infrastructure setup or management; Enterprise-grade access control; Integration with Data Loss Prevention API

43
Q

What are the benefits of Data Catalog’s integration with Data Loss Prevention API?

A

Discover and classify sensitive data; Aid in data governance

44
Q

What can be done with Data Catalog?

A

Search metadata; Group datasets with tags; Flag columns with sensitive data

45
Q

What is the advantage of using Data Catalog for dataset discovery?

A

Unified user experience; Quick access to datasets; Eliminate the need to hunt for specific table names