Big Data and Analytics Flashcards
Q: What is Big Data?
A: Big Data refers to datasets so large, fast-moving, or varied (high volume, velocity, and variety) that they require specialized tools and technologies to store, process, and analyze.
Q: What are key AWS services for Big Data and Analytics?
A: Core services include:
- Amazon S3
- Amazon Redshift
- Amazon EMR
- AWS Glue
- Amazon Athena
- Amazon Kinesis
- Amazon QuickSight
- AWS Lake Formation
- Amazon OpenSearch Service
Q: How does Amazon S3 support Big Data?
A: S3 provides scalable, durable, low-cost object storage for massive datasets and commonly serves as the storage layer of a data lake.
Q: What is AWS Lake Formation?
A: A service for building, securing, and managing data lakes on S3. It handles data ingestion and cataloging and provides centralized, fine-grained (including column-level) access control and governance.
Q: What is Amazon Redshift?
A: A fully managed, petabyte-scale data warehouse that uses columnar storage and massively parallel processing (MPP) to run fast SQL analytics.
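Example (illustrative sketch): Redshift stores table data by column rather than by row, which is a large part of why aggregations over a few columns are fast. The pure-Python sketch below is not Redshift code; it only counts how many stored values each layout must read to sum one column of a small, made-up table.

```python
# Illustrative only: compare how many stored values a scan must read in a
# row-oriented layout versus a column-oriented layout. The table is made up.

rows = [
    {"order_id": 1, "region": "eu", "amount": 120.0},
    {"order_id": 2, "region": "us", "amount": 75.5},
    {"order_id": 3, "region": "eu", "amount": 30.0},
]

def to_columnar(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

def scan_row_store(records, column):
    """Sum one column from a row store; each whole row is read to reach it."""
    total, values_read = 0.0, 0
    for record in records:
        values_read += len(record)  # the row's fields are stored together
        total += record[column]
    return total, values_read

def scan_column_store(columns, column):
    """Sum one column from a column store; only that column's values are read."""
    values = columns[column]
    return sum(values), len(values)

columns = to_columnar(rows)
print(scan_row_store(rows, "amount"))        # (225.5, 9)
print(scan_column_store(columns, "amount"))  # (225.5, 3)
```

Both layouts give the same total, but the row scan reads 9 values and the column scan reads 3; on wide tables the gap grows with the number of columns.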
Q: What is Redshift Spectrum?
A: A Redshift feature that queries data directly in S3 without loading it into Redshift tables, and can join that data with tables stored in the cluster.
Q: What is Amazon Athena?
A: A serverless, interactive query service for running standard SQL directly on data stored in S3. You pay per query, based on the amount of data scanned.
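Example (illustrative sketch): Athena bills by the amount of data scanned, so laying data out in Hive-style partitions (S3 prefixes such as year=2024/month=02/) lets a query filtered on those columns skip whole prefixes. The sketch below imitates that pruning step over a hypothetical list of object keys; it does not call Athena or S3, and partition_values and prune are made-up helper names.

```python
# Illustrative sketch of partition pruning over Hive-style S3 keys.
# The object keys below are hypothetical.

keys = [
    "logs/year=2024/month=01/part-0000.parquet",
    "logs/year=2024/month=02/part-0000.parquet",
    "logs/year=2024/month=02/part-0001.parquet",
    "logs/year=2025/month=01/part-0000.parquet",
]

def partition_values(key):
    """Extract {column: value} pairs from 'name=value' path segments."""
    values = {}
    for segment in key.split("/")[:-1]:  # the last segment is the file name
        if "=" in segment:
            name, value = segment.split("=", 1)
            values[name] = value
    return values

def prune(object_keys, **predicate):
    """Keep only objects whose partition values match every predicate."""
    return [
        k for k in object_keys
        if all(partition_values(k).get(col) == val for col, val in predicate.items())
    ]

# Roughly what a filter like WHERE year = '2024' AND month = '02' lets the engine skip.
selected = prune(keys, year="2024", month="02")
print(selected)
```

Only two of the four objects survive the filter, so only those would be read and billed.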
Q: What is Amazon EMR?
A: A managed service for processing big data using frameworks like Apache Hadoop, Spark, and Hive.
Q: What is Apache Spark?
A: A distributed, in-memory data processing framework for fast analytics and machine learning on large datasets; it is one of the frameworks available on EMR.
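Example (illustrative sketch): Spark splits a dataset into partitions, runs the same function on each partition in parallel, then shuffles intermediate results by key before combining them. The single-process Python below walks through that map, shuffle, and reduce pattern for a word count. In PySpark the equivalent pipeline would be roughly rdd.flatMap(...).map(...).reduceByKey(add), but nothing here depends on Spark.

```python
from collections import defaultdict
from functools import reduce
from operator import add

# Each string stands in for one partition of a distributed dataset (made-up data).
partitions = [
    "big data on aws",
    "spark on emr",
    "big data with spark",
]

def map_phase(partition):
    """Emit (word, 1) pairs for one partition, as a mapper task would."""
    return [(word, 1) for word in partition.split()]

def shuffle(mapped_partitions):
    """Group values by key across all partitions (the network shuffle in Spark)."""
    groups = defaultdict(list)
    for pairs in mapped_partitions:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Combine each key's values, like reduceByKey(add)."""
    return {key: reduce(add, values) for key, values in groups.items()}

counts = reduce_phase(shuffle(map_phase(p) for p in partitions))
print(counts["big"], counts["spark"], counts["emr"])  # 2 2 1
```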
Q: What is AWS Glue?
A: A serverless ETL (extract, transform, load) and data integration service for discovering, preparing, and transforming data.
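Example (illustrative sketch): a Glue job follows the extract, transform, load pattern. The stdlib-only Python below shows the three stages on a made-up CSV snippet: drop incomplete rows, cast types, normalize a column, and write JSON Lines. It is not the Glue API; real Glue jobs typically run this kind of logic on Spark.

```python
import csv
import io
import json

# Hypothetical raw input, standing in for a CSV object in an S3 "raw" zone.
RAW_CSV = """order_id,country,amount
1,us,19.99
2,DE,5.00
3,us,
"""

def extract(text):
    """Extract: parse CSV text into dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no amount, cast types, upper-case country codes."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue
        cleaned.append({
            "order_id": int(row["order_id"]),
            "country": row["country"].upper(),
            "amount": float(row["amount"]),
        })
    return cleaned

def load(rows):
    """Load: serialize to JSON Lines, as if writing to a curated zone."""
    return "\n".join(json.dumps(row) for row in rows)

output = load(transform(extract(RAW_CSV)))
print(output)
```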
Q: What is the AWS Glue Data Catalog?
A: A centralized metadata repository that stores table definitions, schemas, and locations for data in S3 and other sources. Services such as Athena, Redshift Spectrum, and EMR can use it as their metastore.
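Example (illustrative sketch): Glue crawlers populate the Data Catalog by sampling data and inferring a schema. The simplified Python below guesses a Hive-style type (bigint, double, or string) for each column of some hypothetical sample rows. The real crawler logic and the catalog's table format are more involved; infer_type and infer_schema are made-up names.

```python
# Simplified sketch of crawler-style schema inference (not the real Glue crawler).
# Type names follow Hive conventions: bigint, double, string.

def infer_type(values):
    """Pick the narrowest type that fits every non-empty sample value."""
    samples = [v for v in values if v != ""]

    def all_parse(cast):
        try:
            for v in samples:
                cast(v)
            return True
        except ValueError:
            return False

    if samples and all_parse(int):
        return "bigint"
    if samples and all_parse(float):
        return "double"
    return "string"

def infer_schema(header, rows):
    """Return [(column, type)] from a header and sample rows of strings."""
    return [(name, infer_type([row[i] for row in rows])) for i, name in enumerate(header)]

# Hypothetical sample rows, e.g. the first lines of a CSV object in S3.
header = ["order_id", "amount", "country"]
rows = [["1", "19.99", "US"], ["2", "5", "DE"], ["3", "", "US"]]

table = {"name": "orders", "columns": infer_schema(header, rows)}
print(table)
```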
Q: What is Amazon Kinesis?
A: A family of services for collecting, processing, and analyzing real-time streaming data.
Q: What is Kinesis Data Streams?
A: A scalable service that ingests real-time streaming data and stores it durably in shards, where AWS services or custom applications consume and process it.
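Example (illustrative sketch): Kinesis Data Streams hashes each record's partition key with MD5 into a 128-bit value and routes the record to the shard whose hash key range contains it. The sketch assumes four shards that split the hash space evenly and reads the digest as a big-endian unsigned integer; both are simplifying assumptions, and NUM_SHARDS, hash_key, and shard_for are hypothetical names.

```python
import hashlib

# Assumption: the stream's 128-bit hash key space is split evenly across
# NUM_SHARDS shards (a hypothetical value chosen for illustration).
NUM_SHARDS = 4
HASH_SPACE = 2 ** 128

def hash_key(partition_key):
    """MD5 the partition key and read the digest as a 128-bit integer."""
    return int.from_bytes(hashlib.md5(partition_key.encode("utf-8")).digest(), "big")

def shard_for(partition_key, num_shards=NUM_SHARDS):
    """Return the index of the evenly sized hash range that contains the key."""
    return hash_key(partition_key) * num_shards // HASH_SPACE

for key in ["device-1", "device-2", "device-1"]:
    print(key, shard_for(key))
```

Because the mapping is deterministic, records that share a partition key go to the same shard (absent resharding), which is what preserves their relative order.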
Q: What is Kinesis Data Firehose?
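A: A fully managed service, renamed Amazon Data Firehose in 2024, that loads streaming data into destinations such as S3, Redshift, and Amazon OpenSearch Service (formerly Amazon Elasticsearch Service), optionally transforming it in transit.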
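Example (illustrative sketch): Firehose does not write every record individually; it buffers incoming data and delivers a batch when either a size limit or a time interval is reached, whichever comes first. The hypothetical DeliveryBuffer class below models that rule with explicit timestamps. For simplicity it only checks the time limit when a record arrives, whereas a real service would also flush on a timer.

```python
# Sketch of size-or-time buffering. DeliveryBuffer and its limits are
# hypothetical; timestamps are passed in so the behaviour is deterministic.

class DeliveryBuffer:
    def __init__(self, max_bytes, max_seconds):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records = []
        self.size = 0
        self.opened_at = None
        self.delivered = []  # each entry is one batch "written" to the destination

    def put(self, record, now):
        """Add a record, then flush if either the size or age threshold is reached."""
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.size += len(record)
        if self.size >= self.max_bytes or now - self.opened_at >= self.max_seconds:
            self.flush()

    def flush(self):
        """Deliver the buffered records as a single batch and reset the buffer."""
        if self.records:
            self.delivered.append(b"".join(self.records))
        self.records, self.size, self.opened_at = [], 0, None

buf = DeliveryBuffer(max_bytes=10, max_seconds=60)
buf.put(b"abcd", now=0)
buf.put(b"efgh", now=5)   # 8 bytes, 5 s old: keep buffering
buf.put(b"ij", now=6)     # 10 bytes: size threshold triggers delivery
buf.put(b"k", now=10)
buf.put(b"l", now=75)     # buffer is 65 s old: time threshold triggers delivery
print(buf.delivered)      # [b'abcdefghij', b'kl']
```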
Q: What is Kinesis Data Analytics?
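A: A service for analyzing streaming data in real time, originally with SQL. It has since been renamed Amazon Managed Service for Apache Flink and now centers on Apache Flink applications written in Java, Scala, Python, or SQL.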
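Example (illustrative sketch): a common streaming aggregation is the tumbling window, which groups events into fixed, non-overlapping time intervals (here 60 seconds) and emits one result per interval. The Python below counts page views per window for a made-up, time-ordered event list; it illustrates the concept only and is not Flink or Kinesis code.

```python
from collections import Counter

# Tumbling-window aggregation: fixed, non-overlapping windows.
# Events are hypothetical (timestamp_seconds, page) pairs, assumed in time order.
WINDOW_SECONDS = 60

events = [(5, "home"), (30, "cart"), (59, "home"), (61, "home"), (125, "cart")]

def tumbling_counts(stream, window=WINDOW_SECONDS):
    """Yield (window_start, Counter) for each window once a later event closes it."""
    current_start, counts = None, Counter()
    for ts, key in stream:
        start = ts - ts % window
        if current_start is not None and start != current_start:
            yield current_start, counts
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last open window

for start, window_counts in tumbling_counts(events):
    print(start, dict(window_counts))
```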
Q: What is Amazon OpenSearch Service?
A: A managed service for real-time search, log analytics, and visualization of large datasets.
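Example (illustrative sketch): OpenSearch is built on Apache Lucene, whose core search structure is an inverted index mapping each term to the documents that contain it. The toy index below supports AND queries over a few hypothetical log lines; real engines add relevance scoring, configurable analyzers, and distributed shards.

```python
import re
from collections import defaultdict

# Toy inverted index over made-up log lines.
docs = {
    1: "ERROR disk full on node-a",
    2: "INFO backup complete on node-b",
    3: "ERROR timeout on node-b",
}

def tokenize(text):
    """Lower-case and split on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return sorted(set.intersection(*postings)) if postings else []

index = build_index(docs)
print(search(index, "error node-b"))  # [3]
```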