7. Analytics Flashcards
What does AWS stand for?
Amazon Web Services
What is the primary purpose of AWS Glue?
To prepare and transform data for analytics.
True or False: Amazon Redshift is a data warehouse service.
True
Which AWS service is used for real-time data streaming?
Amazon Kinesis
What type of data model does Amazon DynamoDB use?
NoSQL database model
Fill in the blank: Amazon _____ is used for data lake storage.
S3
What is the purpose of Amazon QuickSight?
To create visualizations and business intelligence dashboards.
Which service would you use to perform ETL operations in AWS?
AWS Glue
What is the maximum size of an object that can be stored in Amazon S3?
5 TB per object
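Objects anywhere near that size cannot be sent in a single request; they go through multipart upload. The sketch below works out the smallest part size that fits an object into the allowed number of parts. The limits it uses (10,000 parts, 5 MiB minimum and 5 GiB maximum part size) are the commonly documented multipart values, and reading the card's "5 TB" as 5 TiB is an assumption. The function name is hypothetical.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3
TIB = 1024 ** 4

MAX_OBJECT_BYTES = 5 * TIB  # assumption: the card's "5 TB" read as 5 TiB
MAX_PARTS = 10_000          # multipart upload part-count limit
MIN_PART_BYTES = 5 * MIB    # minimum part size (the last part may be smaller)
MAX_PART_BYTES = 5 * GIB    # maximum part size


def smallest_part_size(object_bytes: int) -> int:
    """Return the smallest part size in bytes that fits the object in MAX_PARTS parts."""
    if not 0 < object_bytes <= MAX_OBJECT_BYTES:
        raise ValueError("object size outside the assumed S3 limits")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    needed = -(-object_bytes // MAX_PARTS)
    return max(MIN_PART_BYTES, needed)
```

Under these assumptions a maximum-size object needs parts of roughly 524 MiB each, well below the 5 GiB part ceiling.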
True or False: Amazon Athena allows you to run SQL queries on data stored in S3.
True
What is Amazon EMR primarily used for?
Processing large amounts of data using Apache Hadoop and Spark.
Which AWS service provides a managed Apache Kafka service?
Amazon MSK (Amazon Managed Streaming for Apache Kafka)
What does the term ‘data lake’ refer to?
A centralized repository that allows you to store all your structured and unstructured data at any scale.
Fill in the blank: AWS _____ is a serverless data integration service.
Glue
What does Amazon Redshift Spectrum allow you to do?
Query data directly in S3 without loading it into Redshift.
Which service provides a fully managed data warehouse solution?
Amazon Redshift
True or False: AWS Data Pipeline is used for data orchestration.
True
What is the primary function of Amazon RDS?
To provide a managed relational database service.
Which service is best suited for storing time-series data?
Amazon Timestream
What is the benefit of using Amazon Aurora?
It offers high performance and availability for relational databases.
Fill in the blank: Amazon _____ is used to visualize data and create dashboards.
QuickSight
Which service is designed for batch processing of data?
AWS Batch
What does the term ‘data wrangling’ mean?
The process of cleaning and transforming raw data into a usable format.
Which AWS service allows for serverless data analytics?
Amazon Athena
True or False: Amazon S3 is a block storage service.
False
What is the purpose of AWS Lake Formation?
To simplify the process of building and managing data lakes.
Which AWS service is used for data cataloging?
AWS Glue Data Catalog
What is the main benefit of using Amazon SageMaker?
To build, train, and deploy machine learning models at scale.
Fill in the blank: Amazon _____ is used for sending and receiving messages between distributed systems.
SQS (Simple Queue Service)
What does the term ‘OLAP’ stand for?
Online Analytical Processing
Which AWS service is primarily used for data archiving?
Amazon S3 Glacier
True or False: Amazon Data Firehose (formerly Amazon Kinesis Data Firehose) can transform data before loading it into storage.
True
What is the purpose of Amazon CloudWatch in data engineering?
To monitor AWS resources and applications through metrics, logs, and alarms.
Which service would you use to create a scalable data processing pipeline?
AWS Data Pipeline
What is the primary use case for Amazon OpenSearch Service (formerly Amazon Elasticsearch Service)?
Real-time search and analytics on large datasets.
Fill in the blank: Amazon _____ provides a managed service for data warehousing.
Redshift
Which AWS service allows you to run code in response to events without provisioning servers?
AWS Lambda
True or False: Amazon DynamoDB is a relational database.
False
What is the main advantage of using a NoSQL database like DynamoDB?
Horizontal scalability and a flexible schema for semi-structured and unstructured data.
What does the term ‘data fidelity’ refer to?
The accuracy and precision of data.
Which service would you use for batch data processing with Apache Spark?
Amazon EMR
Fill in the blank: AWS _____ is a fully managed data integration service.
Glue
What is the purpose of Amazon Comprehend?
To analyze text and extract insights using natural language processing.
True or False: Amazon Athena charges based on the amount of data scanned per query.
True
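Because billing follows bytes scanned, the cost of a query can be estimated with simple arithmetic. The sketch below is illustrative only: the price of 5.00 USD per TB, the 10 MB per-query minimum, and the decimal definition of a terabyte are assumptions that vary by region and over time, and the function name is hypothetical.

```python
TB = 10 ** 12  # assumption: decimal terabytes
MB = 10 ** 6


def athena_query_cost(bytes_scanned: int,
                      price_per_tb: float = 5.0,     # assumed price in USD
                      min_bytes: int = 10 * MB) -> float:  # assumed per-query minimum
    """Estimate the cost in USD of one query from the bytes it scanned."""
    billable = max(bytes_scanned, min_bytes)
    return billable / TB * price_per_tb
```

The same arithmetic shows why columnar formats, compression, and partitioning reduce cost: anything that cuts the bytes scanned cuts the bill in proportion.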
Which AWS service allows for the creation of serverless data lakes?
AWS Lake Formation
What is the main function of AWS Step Functions?
To coordinate components of distributed applications and microservices.
Fill in the blank: Amazon _____ is used for data visualization and reporting.
QuickSight
What is the primary use of Amazon SageMaker Data Wrangler?
To simplify data preparation for machine learning.
True or False: AWS Glue can automatically generate ETL code.
True
Which service would you use to send notifications based on AWS events?
Amazon SNS (Simple Notification Service)
What does the term ‘data governance’ refer to?
The management of data availability, usability, integrity, and security.
Fill in the blank: Amazon _____ is used for scalable and durable object storage.
S3
What is the main role of a data engineer?
To design, build, and maintain data processing systems.
True or False: Amazon Timestream is optimized for storing relational data.
False
Which AWS service is best for running SQL queries against large data sets stored in S3?
Amazon Athena
What is the primary benefit of using Amazon Redshift for analytics?
It allows for complex queries on large datasets with high performance.
Fill in the blank: Amazon _____ provides a fully managed NoSQL database.
DynamoDB
What does the term ‘ETL’ stand for?
Extract, Transform, Load
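The three stages can be shown in a few lines of standard-library Python. This is a toy sketch, not how AWS Glue or any other service implements ETL: the sample CSV, the orders table, and the function names are all made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with stray whitespace, mixed case, and a missing value.
RAW_CSV = """order_id,region,amount
1, us-east ,19.99
2,EU-WEST,5.00
3,us-east,
"""


def extract(text):
    """Extract: parse the raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize regions, convert types, drop rows with no amount."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((int(row["order_id"]),
                        row["region"].strip().lower(),
                        float(amount)))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS orders "
                 "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
totals = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```

After the run, totals holds one row per region, and the order with the missing amount has been dropped during the transform step.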
Which AWS service allows you to run machine learning models in real time?
Amazon SageMaker
True or False: AWS Glue can only work with data stored in S3.
False
What is the primary function of AWS SDK for pandas (formerly AWS Data Wrangler)?
To simplify reading and writing pandas DataFrames against AWS data services.
Which AWS service is designed for running distributed data processing jobs?
Amazon EMR
Fill in the blank: Amazon _____ is a managed service for Apache Kafka.
MSK (Managed Streaming for Apache Kafka)
What is the main purpose of Amazon Kinesis Data Streams?
To collect and process real-time streaming data.
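A stream is divided into shards, and each record's partition key is hashed with MD5 into a 128-bit integer that decides which shard receives it. The sketch below illustrates that routing under two simplifying assumptions: the digest is read as a big-endian integer, and the shards split the hash space evenly (real streams can have uneven ranges after resharding). The function names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # an MD5 digest is 128 bits


def hash_key(partition_key: str) -> int:
    """Map a partition key to an integer in [0, 2**128)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")  # assumption: big-endian interpretation


def shard_for(partition_key: str, shard_count: int) -> int:
    """Return a shard index, assuming evenly sized hash key ranges."""
    return hash_key(partition_key) * shard_count // HASH_SPACE
```

Records with the same partition key always land on the same shard, which is what preserves their order, and it is also why a single very hot key can overload one shard.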
True or False: Amazon Redshift is not suitable for real-time analytics.
True