Glue Flashcards
This deck is meant to help you retain key concepts about the AWS Glue service.
Which AWS serverless service offers fully managed Extract-Transform-Load (ETL) capabilities, enabling users to prepare and load data for analytics?
AWS Glue
What is the primary purpose of AWS Glue?
To facilitate data movement and transformation between sources and destinations
What types of data sources does AWS Glue support?
- Data stores: S3, RDS, JDBC-compatible databases, and DynamoDB
- Data streams: Kinesis Data Streams and Apache Kafka
What destinations can AWS Glue write to?
S3, RDS, and JDBC-compatible databases
How does AWS Glue deliver its ETL functionality?
Through Glue Jobs: using table metadata from the Glue Data Catalog, a job extracts data from sources, transforms it with a script, and loads it into destinations
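To make the extract, transform, and load steps concrete, here is a conceptual sketch in plain Python. It is not the Glue API: `CATALOG`, `STORAGE`, the `memory://` locations, and `to_dollars` are hypothetical stand-ins for the Glue Data Catalog, real data stores, and a job's transformation script. The point is only the order of operations: look up the source in the catalog, read it, transform each record, write the result.

```python
"""Conceptual ETL flow. NOT the AWS Glue API; all names are hypothetical."""
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical catalog: table name -> metadata (location and column names).
CATALOG: Dict[str, Dict[str, Any]] = {
    "sales_raw": {"location": "memory://sales_raw", "columns": ["id", "amount_cents"]},
}

# Hypothetical storage keyed by location, standing in for S3, RDS, etc.
STORAGE: Dict[str, List[Row]] = {
    "memory://sales_raw": [
        {"id": 1, "amount_cents": 1250},
        {"id": 2, "amount_cents": 99},
    ],
}


def extract(table: str) -> List[Row]:
    # Resolve where the table lives via the catalog, then read its records.
    location = CATALOG[table]["location"]
    return list(STORAGE[location])


def transform(rows: List[Row], fn: Callable[[Row], Row]) -> List[Row]:
    # Apply the job's transformation logic to every record.
    return [fn(row) for row in rows]


def load(rows: List[Row], location: str) -> None:
    # Write the transformed records to the destination.
    STORAGE[location] = rows


def to_dollars(row: Row) -> Row:
    # Illustrative transformation: convert cents to dollars.
    return {"id": row["id"], "amount": row["amount_cents"] / 100}


def run_job() -> List[Row]:
    rows = extract("sales_raw")
    cleaned = transform(rows, to_dollars)
    load(cleaned, "memory://sales_clean")
    return STORAGE["memory://sales_clean"]
```

In a real Glue job the same three steps are written with Glue's own libraries and run on managed Spark or Python shell workers.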
What AWS Glue component serves as a metadata repository combined with tools for data management and search?
Glue Data Catalog
How does the Glue Data Catalog help prevent data silos?
By providing a single, unified Data Catalog per AWS Region in each AWS account
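One way to see the "one catalog per account per Region" rule is in the resource ARNs Glue uses for IAM policies: a catalog ARN contains only the Region and account ID, so there is exactly one per pair. The helper below is a minimal sketch that assembles those ARN strings; the function name, the account ID, and the database/table names are illustrative assumptions.

```python
from typing import Optional


def glue_catalog_arn(region: str, account_id: str,
                     database: Optional[str] = None,
                     table: Optional[str] = None) -> str:
    """Build the ARN of a Data Catalog, a catalog database, or a table.

    The catalog ARN is fully determined by (region, account_id), which
    reflects that each account has one Data Catalog per Region.
    """
    prefix = f"arn:aws:glue:{region}:{account_id}"
    if table is not None:
        if database is None:
            raise ValueError("a table ARN needs a database name")
        return f"{prefix}:table/{database}/{table}"
    if database is not None:
        return f"{prefix}:database/{database}"
    return f"{prefix}:catalog"
```

For example, `glue_catalog_arn("us-east-1", "123456789012")` names the single catalog that account has in `us-east-1`; the same account in `eu-west-1` has a different one.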
Which AWS services integrate with the Glue Data Catalog?
Athena, Redshift Spectrum, EMR, and AWS Lake Formation
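As an illustration of the Athena integration, the sketch below builds the keyword arguments for Athena's `StartQueryExecution` call so that it queries a table registered in the Glue Data Catalog. `AwsDataCatalog` is the name Athena uses for the Glue Data Catalog; the database, table, and S3 output location are assumed placeholders, and `athena_client` is expected to be something like `boto3.client("athena")`.

```python
from typing import Any, Dict


def build_athena_query(database: str, table: str, output_s3: str) -> Dict[str, Any]:
    """Keyword arguments for Athena's start_query_execution call."""
    return {
        # Identifiers are double-quoted, as Athena SQL expects.
        "QueryString": f'SELECT * FROM "{database}"."{table}" LIMIT 10',
        # Point the query at a database in the Glue Data Catalog.
        "QueryExecutionContext": {"Database": database, "Catalog": "AwsDataCatalog"},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_s3},
    }


def run_query(athena_client: Any, database: str, table: str, output_s3: str) -> str:
    # Submit the query and return its execution ID for later polling.
    request = build_athena_query(database, table, output_s3)
    response = athena_client.start_query_execution(**request)
    return response["QueryExecutionId"]
```

No table definition is passed to Athena here: it reads the schema and location from the catalog, which is the point of sharing one catalog across services.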
How does the Glue Data Catalog discover data?
By running crawlers that are configured with the credentials and permissions needed to access each data store
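A crawler is defined by a name, an IAM role that supplies its permissions, the catalog database where discovered tables are written, and the data stores to scan. The sketch below assembles the keyword arguments for Glue's `CreateCrawler` call for S3 targets; the role ARN, bucket path, and names are assumed placeholders, and `glue_client` is expected to be something like `boto3.client("glue")`.

```python
from typing import Any, Dict, List, Optional


def crawler_request(name: str, role_arn: str, database: str,
                    s3_paths: List[str],
                    schedule: Optional[str] = None) -> Dict[str, Any]:
    """Keyword arguments for Glue's create_crawler call.

    The crawler runs with the permissions of the IAM role in "Role".
    JDBC targets would instead reference a Glue connection that holds
    the database credentials.
    """
    request: Dict[str, Any] = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # catalog database that receives discovered tables
        "Targets": {"S3Targets": [{"Path": path} for path in s3_paths]},
    }
    if schedule is not None:
        request["Schedule"] = schedule  # e.g. "cron(0 3 * * ? *)"
    return request


def create_and_start(glue_client: Any, request: Dict[str, Any]) -> None:
    # Register the crawler, then run it once immediately.
    glue_client.create_crawler(**request)
    glue_client.start_crawler(Name=request["Name"])
```

Keeping the request builder separate from the API calls makes the definition easy to inspect or reuse without touching AWS.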
What resources does a Glue Job use?
A pool of managed (warm) resources
How can a Glue Job be triggered?
It can be started manually, scheduled using EventBridge, or triggered by events from other sources
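The sketch below shows two of these paths as request payloads: an on-demand run via Glue's `StartJobRun` call, and a time-based schedule via Glue's own `CreateTrigger` call with type `SCHEDULED` (this uses a Glue trigger rather than an EventBridge rule). The job name, trigger name, argument names, and cron expression are assumed placeholders; each dict would be passed to a client such as `boto3.client("glue")`, for example `glue.start_job_run(**start_job_request("nightly-etl"))`.

```python
from typing import Any, Dict, Optional


def start_job_request(job_name: str,
                      args: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Keyword arguments for Glue's start_job_run call (an on-demand run)."""
    request: Dict[str, Any] = {"JobName": job_name}
    if args:
        # Glue job arguments are passed as "--name": "value" strings.
        request["Arguments"] = {f"--{key}": value for key, value in args.items()}
    return request


def scheduled_trigger_request(trigger_name: str, job_name: str,
                              cron: str) -> Dict[str, Any]:
    """Keyword arguments for Glue's create_trigger call with a time-based schedule."""
    return {
        "Name": trigger_name,
        "Type": "SCHEDULED",
        "Schedule": f"cron({cron})",  # six fields, e.g. "0 2 * * ? *" for 02:00 UTC daily
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,  # activate the trigger as soon as it is created
    }
```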
What AWS service is ideal for a serverless, ad hoc, and cost-effective ETL solution?
AWS Glue
Which service does AWS Data Pipeline use for processing?
Elastic MapReduce (EMR)