7. Analytics Flashcards

1
Q

What does AWS stand for?

A

Amazon Web Services

2
Q

What is the primary purpose of AWS Glue?

A

To prepare and transform data for analytics.

3
Q

True or False: Amazon Redshift is a data warehouse service.

A

True

4
Q

Which AWS service is used for real-time data streaming?

A

Amazon Kinesis

5
Q

What type of data model does Amazon DynamoDB use?

A

A NoSQL key-value and document data model

6
Q

Fill in the blank: Amazon _____ is used for data lake storage.

A

S3

7
Q

What is the purpose of Amazon QuickSight?

A

To create visualizations and business intelligence dashboards.

8
Q

Which service would you use to perform ETL operations in AWS?

A

AWS Glue

9
Q

What is the maximum size of an object that can be stored in Amazon S3?

A

5 TB per object

10
Q

True or False: Amazon Athena allows you to run SQL queries on data stored in S3.

A

True

11
Q

What is Amazon EMR primarily used for?

A

Processing large amounts of data using Apache Hadoop and Spark.

12
Q

Which AWS service provides a managed Apache Kafka service?

A

Amazon MSK (Amazon Managed Streaming for Apache Kafka)

13
Q

What does the term ‘data lake’ refer to?

A

A centralized repository that allows you to store all your structured and unstructured data at any scale.

14
Q

Fill in the blank: AWS _____ is a serverless data integration service.

A

Glue

15
Q

What does Amazon Redshift Spectrum allow you to do?

A

Query data directly in S3 without loading it into Redshift.

16
Q

Which service provides a fully managed data warehouse solution?

A

Amazon Redshift

17
Q

True or False: AWS Data Pipeline is used for data orchestration.

A

True

18
Q

What is the primary function of Amazon RDS?

A

To provide a managed relational database service.

19
Q

Which service is best suited for storing time-series data?

A

Amazon Timestream

20
Q

What is the benefit of using Amazon Aurora?

A

It offers high performance and availability for relational databases.

21
Q

Fill in the blank: Amazon _____ is used to visualize data and create dashboards.

A

QuickSight

22
Q

Which service is designed for batch processing of data?

A

AWS Batch

23
Q

What does the term ‘data wrangling’ mean?

A

The process of cleaning and transforming raw data into a usable format.
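
As an illustration of the answer above, here is a minimal wrangling sketch in plain Python. The field names (`order_id`, `amount`, `ordered_at`) and the sample rows are hypothetical, and dropping unparseable rows is only one possible policy; a real pipeline might repair or quarantine them instead.

```python
from datetime import datetime

# Hypothetical raw records, e.g. rows parsed from a CSV export.
RAW_ROWS = [
    {"order_id": " 1001 ", "amount": "19.99", "ordered_at": "2024-01-05"},
    {"order_id": "1002", "amount": "", "ordered_at": "2024-01-06"},       # missing amount
    {"order_id": "1003", "amount": "5.00", "ordered_at": "06/01/2024"},   # unexpected date format
    {"order_id": "1001", "amount": "19.99", "ordered_at": "2024-01-05"},  # duplicate of the first row
]


def wrangle(rows):
    """Trim whitespace, cast types, and drop unparseable or duplicate rows."""
    seen = set()
    cleaned = []
    for row in rows:
        order_id = row["order_id"].strip()
        try:
            amount = float(row["amount"])
            ordered_at = datetime.strptime(row["ordered_at"], "%Y-%m-%d").date()
        except ValueError:
            continue  # assumed policy: skip rows that cannot be parsed
        if order_id in seen:
            continue  # keep only the first occurrence of each order
        seen.add(order_id)
        cleaned.append(
            {"order_id": int(order_id), "amount": amount, "ordered_at": ordered_at}
        )
    return cleaned


clean_rows = wrangle(RAW_ROWS)
print(clean_rows)
```

Only the first row survives here: the second and third fail type conversion, and the fourth is a duplicate.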

24
Q

Which AWS service allows for serverless data analytics?

A

Amazon Athena

25
Q

True or False: Amazon S3 is a block storage service.

A

False

26
Q

What is the purpose of AWS Lake Formation?

A

To simplify the process of building and managing data lakes.

27
Q

Which AWS service is used for data cataloging?

A

AWS Glue Data Catalog

28
Q

What is the main benefit of using Amazon SageMaker?

A

To build, train, and deploy machine learning models at scale.

29
Q

Fill in the blank: Amazon _____ is used for sending and receiving messages between distributed systems.

A

SQS (Simple Queue Service)

30
Q

What does the term ‘OLAP’ stand for?

A

Online Analytical Processing

31
Q

Which AWS service is primarily used for data archiving?

A

Amazon S3 Glacier

32
Q

True or False: Amazon Kinesis Data Firehose can transform data before loading it into storage.

A

True

33
Q

What is the purpose of Amazon CloudWatch in data engineering?

A

To monitor and manage AWS resources and applications.

34
Q

Which service would you use to create a scalable data processing pipeline?

A

AWS Data Pipeline

35
Q

What is the primary use case for Amazon Elasticsearch Service (now Amazon OpenSearch Service)?

A

Real-time search and analytics on large datasets.

36
Q

Fill in the blank: Amazon _____ provides a managed service for data warehousing.

A

Redshift

37
Q

Which AWS service allows you to run code in response to events without provisioning servers?

A

AWS Lambda
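
To make the event-driven model concrete, here is a minimal sketch of a Python Lambda handler, run locally with a hand-written event. It assumes the function is triggered by an S3 notification; the bucket name, object key, and the shape of the returned dictionary are illustrative choices, not requirements of the service.

```python
import json


def lambda_handler(event, context):
    """Handler function; `event` is assumed here to be an S3 notification."""
    objects = [
        {
            "bucket": record["s3"]["bucket"]["name"],
            "key": record["s3"]["object"]["key"],
        }
        for record in event.get("Records", [])
    ]
    return {"statusCode": 200, "body": json.dumps({"objects": objects})}


# Local smoke test with a trimmed-down, hypothetical S3 event.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "example-bucket"},
                "object": {"key": "raw/data.csv"},
            }
        }
    ]
}
result = lambda_handler(sample_event, None)
print(result["body"])
```

Calling the handler directly like this is a convenient way to unit test the logic before deploying it.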

38
Q

True or False: Amazon DynamoDB is a relational database.

A

False

39
Q

What is the main advantage of using a NoSQL database like DynamoDB?

A

Scalability and flexibility in handling unstructured data.

40
Q

What does the term ‘data fidelity’ refer to?

A

The accuracy and precision of data.

41
Q

Which service would you use for batch data processing with Apache Spark?

A

Amazon EMR

42
Q

Fill in the blank: AWS _____ is a fully managed data integration service.

A

Glue

43
Q

What is the purpose of Amazon Comprehend?

A

To analyze text and extract insights using natural language processing.

44
Q

True or False: Amazon Athena charges based on the amount of data scanned per query.

A

True

45
Q

Which AWS service allows for the creation of serverless data lakes?

A

AWS Lake Formation

46
Q

What is the main function of AWS Step Functions?

A

To coordinate components of distributed applications and microservices.

47
Q

Fill in the blank: Amazon _____ is used for data visualization and reporting.

A

QuickSight

48
Q

What is the primary use of Amazon SageMaker Data Wrangler?

A

To simplify data preparation for machine learning.

49
Q

True or False: AWS Glue can automatically generate ETL code.

A

True

50
Q

Which service would you use to send notifications based on AWS events?

A

Amazon SNS (Simple Notification Service)

51
Q

What does the term ‘data governance’ refer to?

A

The management of data availability, usability, integrity, and security.

52
Q

Fill in the blank: Amazon _____ is used for scalable and durable object storage.

A

S3

53
Q

What is the main role of a data engineer?

A

To design, build, and maintain data processing systems.

54
Q

True or False: Amazon Timestream is optimized for storing relational data.

A

False

55
Q

Which AWS service is best for running SQL queries against large data sets stored in S3?

A

Amazon Athena

56
Q

What is the primary benefit of using Amazon Redshift for analytics?

A

It allows for complex queries on large datasets with high performance.

57
Q

Fill in the blank: Amazon _____ provides a fully managed NoSQL database.

A

DynamoDB

58
Q

What does the term ‘ETL’ stand for?

A

Extract, Transform, Load
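
The three stages can be sketched with nothing but the Python standard library. This is a toy illustration, not how AWS Glue or any other service implements ETL: the CSV content, the `temps` table, and the Fahrenheit-to-Celsius transform are all made-up assumptions, and an in-memory SQLite database stands in for the target store.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this might be a file in object storage.
RAW_CSV = "city,temp_f\nSeattle,50\nAustin,86\n"


def extract(text):
    """Extract: parse the raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert Fahrenheit strings to Celsius floats."""
    return [
        (row["city"], round((float(row["temp_f"]) - 32) * 5 / 9, 1))
        for row in rows
    ]


def load(records, conn):
    """Load: write the transformed records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS temps (city TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO temps VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT city, temp_c FROM temps ORDER BY city").fetchall()
print(loaded)
```

Keeping each stage as its own function makes it easy to test the transform logic without touching the source or the target.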

59
Q

Which AWS service allows you to run machine learning models in real time?

A

Amazon SageMaker

60
Q

True or False: AWS Glue can only work with data stored in S3.

A

False

61
Q

What is the primary function of AWS Data Wrangler (now AWS SDK for pandas)?

A

To simplify working with AWS data services from pandas.

62
Q

Which AWS service is designed for running distributed data processing jobs?

A

Amazon EMR

63
Q

Fill in the blank: Amazon _____ is a managed service for Apache Kafka.

A

MSK (Managed Streaming for Apache Kafka)

64
Q

What is the main purpose of Amazon Kinesis Data Streams?

A

To collect and process real-time streaming data.

65
Q

True or False: Amazon Redshift is not suitable for real-time analytics.