Glue Flashcards

Question 1

Q

Which AWS serverless service offers fully managed Extract-Transform-Load (ETL) capabilities, enabling users to prepare and load data for analytics?

Answer

A

AWS Glue

Question 2

Q

What is the primary purpose of AWS Glue?

Answer

A

To move and transform data between sources and destinations (ETL)

Question 3

Q

What types of data sources does AWS Glue support?

Answer

A

Data stores: S3, RDS, JDBC-compatible databases, and DynamoDB
Data streams: Kinesis Data Streams and Apache Kafka

Question 4

Q

What destinations can AWS Glue write to?

Answer

A

S3, RDS, and JDBC-compatible databases

Question 5

Q

How does AWS Glue deliver its ETL functionality?

Answer

A

Through Glue jobs, which use the Glue Data Catalog: a job extracts data from sources, transforms it via scripts, and loads it into destinations
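
The extract, transform, load sequence in this answer can be sketched in plain Python. This is only an illustration of the flow, not the Glue API: the `catalog` dictionary, the `raw_orders` table, and the `normalize_order` and `run_etl_job` functions are all hypothetical stand-ins invented for this card.

```python
# Illustrative stand-ins only: plain Python, not the Glue API.
from typing import Callable, Dict, List

Row = Dict[str, object]

# Hypothetical "catalog": maps a table name to the rows it describes.
catalog: Dict[str, List[Row]] = {
    "raw_orders": [
        {"customer": " Ada ", "amount": "12.50"},
        {"customer": "Lin", "amount": "3"},
    ],
}


def normalize_order(row: Row) -> Row:
    """Hypothetical transform: trim the customer name, cast amount to float."""
    return {
        "customer": str(row["customer"]).strip(),
        "amount": float(str(row["amount"])),
    }


def run_etl_job(table: str,
                transform: Callable[[Row], Row],
                destination: List[Row]) -> int:
    """Look the source up in the catalog, then extract, transform, and load."""
    extracted = list(catalog[table])                      # extract
    transformed = [transform(row) for row in extracted]   # transform
    destination.extend(transformed)                       # load
    return len(transformed)


warehouse: List[Row] = []
loaded = run_etl_job("raw_orders", normalize_order, warehouse)
```

In a real Glue job the transform step would live in the job script and the source and destination would be actual data stores; the list and dictionary here only stand in for them.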

Question 6

Q

What AWS Glue component serves as a metadata repository combined with tools for data management and search?

Answer

A

Glue Data Catalog

Question 7

Q

How does the Glue Data Catalog help prevent data silos?

Answer

A

By providing one unified Data Catalog per region in each AWS account

Question 8

Q

Which AWS services integrate with the Glue Data Catalog?

Answer

A

Athena
Redshift Spectrum
EMR
Lake Formation

Question 9

Q

How does the Glue Data Catalog discover data?

Answer

A

By using crawlers configured with the necessary credentials
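
As a rough sketch of what configuring a crawler with credentials might look like, the snippet below builds the arguments for boto3's Glue `create_crawler` call, where the credentials take the form of an IAM role. The crawler name, role ARN, database name, and S3 path are made-up placeholders, and the helper function names are hypothetical. The function that calls AWS needs boto3 and valid AWS credentials, so it is defined but not invoked here.

```python
from typing import Any, Dict


def build_crawler_request(name: str, role_arn: str,
                          database: str, s3_path: str) -> Dict[str, Any]:
    """Build keyword arguments for the Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,            # IAM role the crawler assumes to read the data
        "DatabaseName": database,    # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(request: Dict[str, Any]) -> None:
    """Create the crawler and run it once. Requires boto3 and AWS credentials."""
    import boto3  # third-party; imported lazily so the builder works without it

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])


# Placeholder values, assumed purely for illustration.
request = build_crawler_request(
    name="orders-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="sales",
    s3_path="s3://example-bucket/orders/",
)
```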

Question 10

Q

What resources does a Glue Job use?

Answer

A

A pool of managed (warm) resources

Question 11

Q

How can a Glue Job be triggered?

Answer

A

Started manually
Scheduled using EventBridge
Triggered by events from other sources
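
Two of these options can be sketched with boto3, assuming a job named `orders-etl` already exists (the job and trigger names are placeholders, and the helper function names are hypothetical). The schedule example uses Glue's own `SCHEDULED` trigger type with a cron expression, which is one possible way to schedule a job; an EventBridge rule would be configured separately. The manual start needs boto3 and AWS credentials, so it is defined but not invoked here.

```python
from typing import Any, Dict


def build_scheduled_trigger(name: str, job_name: str, cron: str) -> Dict[str, Any]:
    """Build keyword arguments for the Glue create_trigger call (scheduled type)."""
    return {
        "Name": name,
        "Type": "SCHEDULED",
        "Schedule": f"cron({cron})",          # Glue expects the cron(...) wrapper
        "Actions": [{"JobName": job_name}],   # job(s) the trigger starts
        "StartOnCreation": True,
    }


def start_job_manually(job_name: str) -> str:
    """Start one run on demand and return its run ID. Requires boto3 and credentials."""
    import boto3  # third-party; imported lazily so the builder works without it

    glue = boto3.client("glue")
    return glue.start_job_run(JobName=job_name)["JobRunId"]


# Placeholder names; the cron fields mean "every day at 02:00 UTC".
trigger = build_scheduled_trigger("nightly-orders", "orders-etl", "0 2 * * ? *")
```

Passing `trigger` as keyword arguments to `boto3.client("glue").create_trigger(**trigger)` would register the schedule.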

Question 12

Q

What AWS service is ideal for a serverless, ad hoc, and cost-effective ETL solution?

Answer

A

AWS Glue

Question 13

Q

Which service is utilized by AWS Data Pipeline for processing?

Answer

A

Elastic MapReduce (EMR)

This deck aims to help retain concepts related to the Glue service. (13 cards)