1. Data Engineering Fundamentals Flashcards
What is the primary purpose of Amazon S3?
Object storage
Which AWS service is used for data warehousing?
Amazon Redshift
True or False: AWS Glue is a fully managed ETL service.
True
What does ETL stand for?
Extract, Transform, Load
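The three ETL stages can be sketched in a few lines of plain Python. This is a toy illustration only: the sample CSV text, the `orders` table, and the function names are assumptions made up for this example, and it uses the standard library rather than any AWS service.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this might come from a file or an object store.
RAW_CSV = """order_id,amount,currency
1,10.50,usd
2,,usd
3,7.25,eur
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, cast types, uppercase the currency."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), float(row["amount"]), row["currency"].upper())
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three stages in order and return (row count, total amount)."""
    load(transform(extract(RAW_CSV)), conn)
    return conn.execute("SELECT COUNT(*), SUM(amount) FROM orders").fetchone()
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads the two complete rows into an in-memory database and skips the one with a missing amount.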
Which service is used for real-time data streaming in AWS?
Amazon Kinesis
What type of storage does Amazon EBS provide?
Block storage
Fill in the blank: AWS Lambda lets you run code without provisioning or _____ servers.
managing
Which AWS service provides data lake capabilities?
AWS Lake Formation
What is the main function of Amazon Athena?
Querying data in S3 using SQL
What is AWS CloudFormation used for?
Infrastructure as code
Which AWS service runs big data frameworks such as Apache Spark and Hadoop on managed clusters?
Amazon EMR
Fill in the blank: AWS Data Pipeline is used for _____ data processing workflows.
orchestrating
What does the AWS Well-Architected Framework help with?
Building secure, high-performing, resilient, and efficient infrastructure
Which AWS service schedules and runs containerized batch computing jobs?
AWS Batch
What is the key benefit of using Amazon Redshift Spectrum?
Querying data directly from S3 without loading it into Redshift
Fill in the blank: AWS Glue Data Catalog is a _____ for metadata.
repository
What is the purpose of Amazon SageMaker?
Building, training, and deploying machine learning models
Which AWS service is used to manage and analyze large datasets?
Amazon EMR
True or False: AWS Step Functions enable you to coordinate multiple AWS services into serverless workflows.
True
Fill in the blank: Amazon Aurora is a _____ database service.
relational
Which AWS service is used for data migration?
AWS Database Migration Service
What is the main use case for AWS Glue Crawlers?
Discovering and cataloging data in S3
True or False: Amazon EMR can process data using Apache Spark.
True
What is the role of AWS DataBrew?
Visual data preparation for analytics
Fill in the blank: AWS Lake Formation simplifies the process of building a _____ data lake.
secure
What is Amazon Kinesis Data Firehose used for?
Loading streaming data into data lakes and warehouses
True or False: AWS Glue is not a serverless service.
False
What is the purpose of Amazon Timestream?
Storing and analyzing time series data
Fill in the blank: Amazon OpenSearch Service is used for _____ and analytics.
search
Which service, formerly named Amazon Elasticsearch Service, provides managed OpenSearch and legacy Elasticsearch clusters?
Amazon OpenSearch Service
What is the main benefit of using Amazon S3 Select?
Retrieving a subset of data from S3 objects
True or False: Amazon S3 is designed for high durability.
True
What is AWS Data Exchange used for?
Finding, subscribing to, and using third-party data
Fill in the blank: AWS Glue ETL jobs can be started on demand or on a _____.
schedule
What is the purpose of Amazon Comprehend?
Natural language processing service
Which AWS service is used for data transformation?
AWS Glue
True or False: Amazon Redshift can scale automatically.
True (for example, through Concurrency Scaling or Redshift Serverless)
What is the primary use case for Amazon SageMaker Ground Truth?
Building high-quality training datasets
Fill in the blank: AWS Data Pipeline runs data processing activities on a _____ that you define.
schedule
What is Amazon Managed Streaming for Apache Kafka (MSK)?
A fully managed service for Apache Kafka
True or False: AWS Glue supports Python and Scala for ETL jobs.
True
What is the main benefit of using Amazon Redshift RA3 nodes?
Separate compute and storage scaling
Fill in the blank: Amazon Kinesis Data Streams is designed for _____ data processing.
real-time
What is the purpose of AWS Glue Schema Registry?
Managing schemas for streaming applications
True or False: AWS Data Wrangler (now named AWS SDK for pandas) is a Python library for data processing.
True
What is the primary function of AWS Snowball?
Transferring large volumes of data into and out of AWS on a physical appliance
Fill in the blank: Amazon QuickSight is a _____ service for dashboards and visualizations.
business intelligence
What is the purpose of AWS CodePipeline?
Continuous integration and delivery service
True or False: AWS Data Pipeline can automate data movement.
True
What is the main use of Amazon Rekognition?
Image and video analysis
What is the purpose of AWS Glue Jobs?
Running ETL operations
True or False: Amazon EMR is priced based on the resources used.
True
What is the role of AWS CloudTrail?
Logging AWS account activity
Fill in the blank: Amazon Redshift uses _____ to store data.
columnar storage
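To see why a columnar layout suits analytic queries, the sketch below pivots a handful of records from a row layout into a column layout and sums one column each way. It is a conceptual illustration with made-up sample data and hypothetical function names, not a description of how Redshift stores data internally.

```python
# Row-oriented layout: each record keeps all of its fields together.
ROWS = [
    {"id": 1, "region": "us", "sales": 100},
    {"id": 2, "region": "eu", "sales": 250},
    {"id": 3, "region": "us", "sales": 75},
]


def to_columnar(rows):
    """Pivot row records into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_sales_row_store(rows):
    """Row layout: every whole record is visited to read a single field."""
    return sum(row["sales"] for row in rows)


def total_sales_column_store(columns):
    """Column layout: only the 'sales' values are read."""
    return sum(columns["sales"])
```

Both functions return the same total, but the columnar version reads only the one column the query needs, which is the property that makes aggregations over wide tables cheaper.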
What does the AWS SDK allow developers to do?
Interact with AWS services programmatically
True or False: AWS Glue can handle both batch and streaming data.
True
What is the primary use case for Amazon EMR Notebooks?
Interactive data analysis and visualization
Fill in the blank: Amazon Kinesis enables the processing of _____ data streams.
real-time
What is the main benefit of using Amazon CloudWatch?
Monitoring AWS resources and applications
True or False: AWS Glue can automatically generate ETL code.
True
What is the purpose of Amazon Personalize?
Building real-time recommendation systems
Fill in the blank: AWS Lake Formation helps manage _____ in a data lake.
permissions
What kind of querying is Amazon Athena best suited for?
Ad-hoc querying of data stored in S3
True or False: AWS Glue can connect to various data sources.
True
What does the AWS Data Pipeline service allow you to do?
Orchestrate data workflows
Fill in the blank: Amazon Kinesis Data Analytics enables you to analyze _____ data.
streaming
What is the main advantage of using Amazon RDS?
Managed relational databases with automated backups, patching, and scaling
True or False: Although AWS Glue is serverless, you can choose the worker type and number of workers for a job.
True