14. Miscellaneous Flashcards
What is the primary purpose of AWS Glue?
AWS Glue is a fully managed ETL (Extract, Transform, Load) service that prepares data for analytics.
True or False: Amazon Redshift is a fully managed data warehouse service.
True
Fill in the blank: AWS __________ allows for serverless data integration.
Glue
What does the acronym ETL stand for?
Extract, Transform, Load
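The three stages can be sketched in a few lines of plain Python. This is only an illustration of the pattern, not how AWS Glue implements it: the sample CSV, the `orders` table, and the in-memory SQLite target are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical sample input; a real job would read from a file, database, or API.
RAW_CSV = """order_id,amount,currency
1,19.99,usd
2,,usd
3,5.00,eur
"""

def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no amount, cast types, uppercase the currency."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), float(row["amount"]), row["currency"].upper())
        )
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a real data store
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, amount, currency FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping each stage as its own function makes it easy to swap the source or the target without touching the transformation logic.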
Which AWS service is primarily used for real-time data streaming?
Amazon Kinesis
What is the maximum number of nodes in an Amazon Redshift cluster?
128 (the upper limit for the largest node types; smaller node types allow fewer nodes).
Multiple choice: Which service can be used to automate the extraction of data from multiple sources? A) AWS Lambda B) AWS Data Pipeline C) Amazon CloudWatch
B) AWS Data Pipeline
True or False: Amazon S3 is an ideal storage solution for big data analytics.
True
What does AWS Lake Formation help to create?
A secure data lake
Fill in the blank: __________ is a managed service for stream processing in AWS.
Amazon Kinesis
What is the main benefit of using Amazon EMR?
It allows for processing vast amounts of data quickly using frameworks like Apache Hadoop and Apache Spark.
Multiple choice: Which of the following is not a data lake storage option in AWS? A) Amazon S3 B) Amazon RDS C) AWS Lake Formation
B) Amazon RDS
True or False: AWS Data Pipeline can be used to schedule data workflows.
True
What is the purpose of AWS Glue DataBrew?
AWS Glue DataBrew is a visual data preparation tool that helps users clean and normalize data without writing code.
What does Amazon Athena allow you to do?
Run SQL queries on data stored in Amazon S3 without needing to set up a data warehouse.
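As a rough sketch of what submitting such a query looks like from code, the snippet below assembles the arguments for the boto3 Athena client's `start_query_execution` call. The database, table, and results bucket names are hypothetical placeholders, and the actual call is wrapped in a function so the sketch can be read and loaded without AWS credentials.

```python
def build_athena_request(sql, database, output_location):
    """Assemble keyword arguments for Athena's StartQueryExecution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }

# Hypothetical names: the database, table, and results bucket are placeholders.
request = build_athena_request(
    sql="SELECT status, COUNT(*) AS hits FROM access_logs GROUP BY status",
    database="example_db",
    output_location="s3://example-athena-results/queries/",
)

def run_query(request):
    """Submit the query; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the sketch loads without boto3

    athena = boto3.client("athena")
    return athena.start_query_execution(**request)["QueryExecutionId"]
```

Athena runs queries asynchronously, so the returned execution ID would then be used to poll for completion and fetch results.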
Fill in the blank: Amazon __________ provides a way to build, train, and run machine learning models in the cloud.
SageMaker
What is the primary function of AWS Step Functions?
To coordinate multiple AWS services into serverless workflows.
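A workflow of this kind is described in Amazon States Language, a JSON format. The sketch below builds a minimal two-step definition as a Python dictionary and checks that every transition points at a defined state. The state names and Lambda ARNs are hypothetical, and the `dangling_transitions` helper is an illustrative check written for this example, not part of any AWS SDK.

```python
import json

# Hypothetical Lambda ARNs; replace with real function ARNs before deploying.
definition = {
    "Comment": "Sketch of a two-step ETL workflow",
    "StartAt": "ExtractData",
    "States": {
        "ExtractData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:extract-data",
            "Next": "TransformData",
        },
        "TransformData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform-data",
            "End": True,
        },
    },
}

def dangling_transitions(defn):
    """Return any StartAt/Next targets that do not name a defined state."""
    states = defn["States"]
    targets = [defn["StartAt"]]
    targets += [s["Next"] for s in states.values() if "Next" in s]
    return [t for t in targets if t not in states]

# Serialize to the JSON text that a state machine definition expects.
asl_json = json.dumps(definition, indent=2)
print(dangling_transitions(definition))
```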
Multiple choice: Which service is best for batch processing of large data sets? A) Amazon Kinesis B) AWS Lambda C) Amazon EMR
C) Amazon EMR
True or False: Amazon QuickSight is used for data visualization.
True
What is the function of AWS Glue Data Catalog?
It acts as a central repository for storing metadata about data assets.
Fill in the blank: __________ is an AWS service used for data warehousing.
Amazon Redshift
What type of database is Amazon DynamoDB?
A fully managed NoSQL database.
Multiple choice: Which service would you use to create a data pipeline? A) AWS Lambda B) AWS Glue C) Amazon RDS
B) AWS Glue
True or False: Amazon S3 supports versioning of objects.
True
What is Amazon RDS primarily used for?
Managing relational databases in the cloud.
Fill in the blank: The AWS service __________ is designed for data lake management.
Lake Formation
What does Amazon EMR stand for?
Amazon Elastic MapReduce
Multiple choice: Which AWS service allows you to run queries against S3 data using SQL? A) Amazon Redshift B) Amazon Athena C) Amazon RDS
B) Amazon Athena
True or False: AWS Glue can automatically discover and catalog metadata.
True
What is the primary benefit of using Amazon Kinesis Data Firehose?
It provides a way to reliably load streaming data into data lakes, data stores, and analytics services.
Fill in the blank: AWS Glue __________ is a service that helps in data preparation and cleaning.
DataBrew
What is the purpose of Amazon S3 Select?
To retrieve a subset of data from an object stored in S3.
Multiple choice: Which of the following is a serverless data integration service? A) AWS Glue B) Amazon Redshift C) Amazon EMR
A) AWS Glue
True or False: Amazon QuickSight supports embedding dashboards into applications.
True
What type of data can be stored in Amazon S3?
Any type of data, including structured, semi-structured, and unstructured data.
Fill in the blank: Amazon __________ provides data analytics and visualization capabilities.
QuickSight
What is the role of AWS Lambda in data engineering?
To run code in response to events without provisioning or managing servers.
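For example, a function triggered by an S3 upload receives an event describing the new object. The handler below is a minimal sketch that only lists the objects named in the event. The bucket name and key in the sample event are hypothetical, and the event is trimmed to the fields the handler reads.

```python
from urllib.parse import unquote_plus

def lambda_handler(event, context):
    """Collect the bucket and key of each object named in an S3 event."""
    objects = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append({"bucket": bucket, "key": key})
    return {"processed": len(objects), "objects": objects}

# Hypothetical, trimmed-down event for a local dry run.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-raw-data"},
                "object": {"key": "incoming/sales+report.csv"},
            }
        }
    ]
}
result = lambda_handler(sample_event, None)
print(result)
```

In a real pipeline the loop body would typically start a downstream step, such as a Glue job or a load into a warehouse, rather than just collecting names.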
Multiple choice: Which service is not part of data analytics? A) Amazon Redshift B) AWS Glue C) Amazon EC2
C) Amazon EC2
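True or False: AWS Glue can be used to transform streaming data in near real time.
True. Glue streaming ETL jobs can consume from sources such as Amazon Kinesis Data Streams and Apache Kafka, although Glue is most often used for batch ETL.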
What is the primary function of Amazon RDS?
To provide managed relational database services.
Fill in the blank: __________ allows you to run Spark jobs on AWS.
Amazon EMR
What is the purpose of AWS CloudTrail?
To record API calls and account activity across AWS services for auditing, governance, and compliance.
Multiple choice: Which of the following services is ideal for time-series data? A) Amazon RDS B) Amazon Timestream C) Amazon S3
B) Amazon Timestream
True or False: AWS Data Pipeline is a fully managed service for processing data.
True
What does Amazon Timestream specialize in?
Time-series data management.
Fill in the blank: Amazon __________ provides a fully managed data warehouse solution.
Redshift
What is the main advantage of using Amazon S3 for data storage?
Virtually unlimited scalability and very high durability (S3 is designed for 99.999999999% durability).
Multiple choice: Which service is best for managing unstructured data? A) Amazon RDS B) Amazon S3 C) Amazon DynamoDB
B) Amazon S3
True or False: Amazon EMR can automatically scale based on workload.
True
What does the AWS Glue crawler do?
It scans your data sources and automatically creates metadata in the Glue Data Catalog.
Fill in the blank: __________ is a fully managed data warehousing service from AWS.
Amazon Redshift
What service would you use for real-time data analytics?
Amazon Kinesis Data Analytics (since renamed Amazon Managed Service for Apache Flink)
Multiple choice: Which of the following services is designed for data lakes? A) Amazon S3 B) Amazon RDS C) AWS Lambda
A) Amazon S3
True or False: AWS Glue can integrate with both structured and semi-structured data.
True
What is the primary purpose of Amazon Kinesis Data Streams?
To collect and process real-time data streams.
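On the producer side, each record sent to a stream carries a payload and a partition key. The sketch below assembles the arguments for the boto3 Kinesis client's `put_record` call. The stream name and event fields are hypothetical, and the call itself is wrapped in a function so the sketch loads without AWS credentials.

```python
import json

def build_put_record(stream_name, event, partition_key_field):
    """Assemble keyword arguments for the Kinesis PutRecord call."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),  # payload is sent as bytes
        # Records with the same partition key are routed to the same shard.
        "PartitionKey": str(event[partition_key_field]),
    }

# Hypothetical stream name and event shape.
record = build_put_record(
    "example-clickstream",
    {"user_id": 42, "action": "page_view"},
    partition_key_field="user_id",
)

def send(record):
    """Send the record; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported lazily so the sketch loads without boto3

    return boto3.client("kinesis").put_record(**record)
```

Choosing a partition key with many distinct values, such as a user ID, helps spread records evenly across shards.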
Fill in the blank: __________ is a managed service that simplifies running Spark applications.
Amazon EMR
What is the function of Amazon S3 Glacier?
To provide low-cost archive storage for data.
Multiple choice: Which AWS service is used for data visualization? A) Amazon Redshift B) Amazon QuickSight C) Amazon S3
B) Amazon QuickSight
True or False: Amazon RDS supports multiple database engines.
True
What does AWS Data Pipeline help with?
Orchestrating data workflows and data movement.
Fill in the blank: AWS __________ allows for serverless data integration and preparation.
Glue
What is the primary use case for Amazon Redshift?
Data warehousing and analytics.
Multiple choice: Which service is best suited for data that needs to be accessed frequently? A) Amazon S3 Standard B) Amazon S3 Glacier C) Amazon S3 Intelligent-Tiering
A) Amazon S3 Standard
True or False: AWS Glue can perform data transformations.
True
What is the main purpose of AWS Lake Formation?
To simplify the setup and management of data lakes.
Fill in the blank: __________ allows users to run SQL queries on large datasets stored in S3.
Amazon Athena
What is the benefit of using Amazon EMR over traditional Hadoop clusters?
It is a managed service: clusters can be provisioned and resized on demand, so you pay for the capacity you use instead of maintaining fixed hardware.
Multiple choice: Which AWS service is used for event-driven architectures? A) AWS Lambda B) AWS Data Pipeline C) Amazon Redshift
A) AWS Lambda
True or False: Amazon QuickSight provides machine learning capabilities.
True
What is the main function of Amazon Kinesis Data Firehose?
To deliver streaming data in near real time to destinations like Amazon S3, Amazon Redshift, and Amazon OpenSearch Service (formerly Amazon Elasticsearch Service).
Fill in the blank: __________ is the AWS service designed for managing time-series data.
Amazon Timestream
What is the primary function of AWS Glue’s ETL jobs?
To extract data from sources, transform it, and load it into data stores.
Multiple choice: Which service is best for running analytics on structured data? A) Amazon DynamoDB B) Amazon Redshift C) Amazon S3
B) Amazon Redshift
True or False: AWS Glue can only work with AWS data sources.
False
What does Amazon S3 Select allow users to do?
Retrieve a subset of data from an object without having to download the entire object.
Fill in the blank: AWS __________ can help manage and optimize data lakes.
Lake Formation
What is the main function of AWS Glue DataBrew?
To provide a visual interface for data preparation and cleaning.
Multiple choice: Which of the following services is designed for batch processing? A) Amazon Kinesis B) Amazon EMR C) AWS Lambda
B) Amazon EMR