1. Data Engineering Fundamentals Flashcards

1
Q

What is the primary purpose of Amazon S3?

A

Object storage

2
Q

Which AWS service is used for data warehousing?

A

Amazon Redshift

3
Q

True or False: AWS Glue is a fully managed ETL service.

A

True

4
Q

What does ETL stand for?

A

Extract, Transform, Load
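
The three stages can be sketched in plain Python. This is a minimal illustration under stated assumptions, not how any AWS service implements ETL: the sample CSV, the `orders` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the load target.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; the third row has a missing amount.
RAW_CSV = """order_id,amount_usd,country
1,19.99,us
2,5.00,de
3,,us
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with no amount, cast types, upper-case country codes."""
    cleaned = []
    for row in rows:
        if not row["amount_usd"]:
            continue
        cleaned.append(
            (int(row["order_id"]), float(row["amount_usd"]), row["country"].upper())
        )
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER, amount_usd REAL, country TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, amount_usd, country FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

Keeping each stage as its own function makes it easy to swap the source or the target without touching the transformation logic.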

5
Q

Which service is used for real-time data streaming in AWS?

A

Amazon Kinesis
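
As a rough sketch of what sending one event to a Kinesis data stream might look like, the snippet below builds the arguments for the boto3 `put_record` call (`StreamName`, `Data`, `PartitionKey`). The stream name, event fields, and helper names are hypothetical, and the actual send is wrapped in a function that is not called here because it needs AWS credentials.

```python
import json


def build_put_record_request(stream_name, event, partition_key_field):
    """Build keyword arguments for the Kinesis put_record call."""
    return {
        "StreamName": stream_name,
        # Kinesis stores the payload as bytes.
        "Data": json.dumps(event).encode("utf-8"),
        # Records with the same partition key are routed to the same shard.
        "PartitionKey": str(event[partition_key_field]),
    }


def send(request):
    """Send the record; requires boto3 and AWS credentials (not run here)."""
    import boto3  # imported lazily so the sketch loads without boto3

    return boto3.client("kinesis").put_record(**request)


# Hypothetical stream name and event shape.
request = build_put_record_request(
    "example-clickstream", {"user_id": 42, "page": "/home"}, "user_id"
)
print(request)
```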

6
Q

What type of storage does Amazon EBS provide?

A

Block storage

7
Q

Fill in the blank: AWS Lambda allows you to run code without _____ provisioning servers.

A

manually
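
For illustration, a Python Lambda function is just a handler that the service invokes with an `event` and a `context`. The sketch below assumes a hypothetical `name` field in the event and calls the handler locally; in Lambda itself the service makes that call, so no server setup appears in the code.

```python
import json


def lambda_handler(event, context):
    """Entry point: `event` carries the input, `context` carries runtime metadata."""
    name = event.get("name", "world")  # hypothetical input field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


# Local invocation for illustration only; `context` is unused in this sketch.
response = lambda_handler({"name": "data engineer"}, None)
print(response)
```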

8
Q

Which AWS service provides data lake capabilities?

A

AWS Lake Formation

9
Q

What is the main function of Amazon Athena?

A

Querying data in S3 using SQL
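
A hedged sketch of submitting such a query from Python: the helper below assembles the arguments for the boto3 Athena `start_query_execution` call. The database name, table, and S3 output location are hypothetical, and `run_query` is defined but not called because it needs AWS credentials.

```python
def build_athena_request(sql, database, output_location):
    """Build keyword arguments for the Athena start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_query(request):
    """Submit the query; requires boto3 and AWS credentials (not run here)."""
    import boto3  # imported lazily so the sketch loads without boto3

    client = boto3.client("athena")
    return client.start_query_execution(**request)["QueryExecutionId"]


# Hypothetical database, table, and bucket names.
request = build_athena_request(
    "SELECT country, COUNT(*) AS orders FROM orders GROUP BY country",
    database="example_db",
    output_location="s3://example-bucket/athena-results/",
)
print(request)
```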

10
Q

What is AWS CloudFormation used for?

A

Infrastructure as code
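
To make "infrastructure as code" concrete, here is a minimal sketch of a CloudFormation template built as a Python dictionary and serialized to JSON. The logical ID `ExampleBucket` and the description are hypothetical; the template declares a single S3 bucket with versioning enabled and is not deployed by this snippet.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",
    "Description": "Sketch: one versioned S3 bucket",
    "Resources": {
        "ExampleBucket": {  # hypothetical logical ID
            "Type": "AWS::S3::Bucket",
            "Properties": {
                "VersioningConfiguration": {"Status": "Enabled"},
            },
        }
    },
}

# The JSON text is what would be handed to CloudFormation as the template body.
template_body = json.dumps(template, indent=2)
print(template_body)
```

Because the template is plain text, it can be reviewed and version-controlled like any other source file.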

11
Q

Which AWS service is primarily used for data analytics?

A

Amazon EMR

12
Q

Fill in the blank: AWS Data Pipeline is used for _____ data processing workflows.

A

orchestrating

13
Q

What does the AWS Well-Architected Framework help with?

A

Building secure, high-performing, resilient, and efficient infrastructure

14
Q

Which service is best for batch data processing in AWS?

A

AWS Batch

15
Q

What is the key benefit of using Amazon Redshift Spectrum?

A

Querying data directly from S3 without loading it into Redshift

16
Q

Fill in the blank: AWS Glue Data Catalog is a _____ for metadata.

A

repository

17
Q

What is the purpose of Amazon SageMaker?

A

Building, training, and deploying machine learning models

18
Q

Which AWS service is used to manage and analyze large datasets?

A

Amazon EMR

19
Q

True or False: AWS Step Functions enable you to coordinate multiple AWS services into serverless workflows.

A

True
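
As an illustration of such a workflow, the sketch below writes a three-step state machine in Amazon States Language as a Python dictionary. The state names and Lambda ARNs are hypothetical placeholders, and `execution_order` is a small helper written for this example (not part of any AWS SDK) that only follows `Next` links in a linear chain.

```python
import json

# Hypothetical ARNs; the account ID and function names are placeholders.
definition = {
    "Comment": "Sketch of a linear extract-transform-load workflow",
    "StartAt": "ExtractData",
    "States": {
        "ExtractData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:extract",
            "Next": "TransformData",
        },
        "TransformData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:transform",
            "Next": "LoadData",
        },
        "LoadData": {
            "Type": "Task",
            "Resource": "arn:aws:lambda:us-east-1:123456789012:function:load",
            "End": True,
        },
    },
}


def execution_order(machine):
    """Follow StartAt and Next links to list states in the order they would run."""
    order = []
    name = machine["StartAt"]
    while name is not None:
        order.append(name)
        name = machine["States"][name].get("Next")
    return order


order = execution_order(definition)
print(order)
print(json.dumps(definition, indent=2))
```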

20
Q

Fill in the blank: Amazon Aurora is a _____ database service.

A

relational

21
Q

Which AWS service is used for data migration?

A

AWS Database Migration Service

22
Q

What is the main use case for AWS Glue Crawlers?

A

Discovering and cataloging data in S3

23
Q

True or False: Amazon EMR can process data using Apache Spark.

A

True

24
Q

What is the role of AWS Glue DataBrew?

A

Visual data preparation for analytics

25
Q

Fill in the blank: AWS Lake Formation simplifies the process of building a _____ data lake.

A

secure

26
Q

What is Amazon Kinesis Data Firehose (now Amazon Data Firehose) used for?

A

Loading streaming data into data lakes and warehouses

27
Q

True or False: AWS Glue is not a serverless service.

A

False

28
Q

What is the purpose of Amazon Timestream?

A

Time series database service

29
Q

Fill in the blank: Amazon OpenSearch Service is used for _____ and analytics.

30
Q

Which service provides a fully managed Elasticsearch solution?

A

Amazon OpenSearch Service (formerly Amazon Elasticsearch Service)

31
Q

What is the main benefit of using Amazon S3 Select?

A

Retrieving a subset of data from S3 objects

32
Q

True or False: Amazon S3 is designed for high durability.

33
Q

What is AWS Data Exchange used for?

A

Finding, subscribing to, and using third-party data

34
Q

Fill in the blank: AWS Glue ETL jobs can be triggered on a _____ schedule.

35
Q

What is the purpose of Amazon Comprehend?

A

Natural language processing service

36
Q

Which AWS service is used for data transformation?

37
Q

True or False: Amazon Redshift can scale automatically.

38
Q

What is the primary use case for Amazon SageMaker Ground Truth?

A

Building high-quality training datasets

39
Q

Fill in the blank: AWS Data Pipeline supports _____ data processing.

40
Q

What is Amazon Managed Streaming for Apache Kafka (MSK)?

A

A fully managed service for Apache Kafka

41
Q

True or False: AWS Glue supports Python and Scala for ETL jobs.

42
Q

What is the main benefit of using Amazon Redshift RA3 nodes?

A

Separate compute and storage scaling

43
Q

Fill in the blank: Amazon Kinesis Data Streams is designed for _____ data processing.

44
Q

What is the purpose of AWS Glue Schema Registry?

A

Managing schemas for streaming applications

45
Q

True or False: AWS Data Wrangler (now AWS SDK for pandas) is a Python library for data processing.

46
Q

What is the primary function of AWS Snowball?

A

Data transfer appliance

47
Q

Fill in the blank: Amazon QuickSight provides _____ analytics capabilities.

A

business intelligence

48
Q

What is the purpose of AWS CodePipeline?

A

Continuous integration and delivery service

49
Q

True or False: AWS Data Pipeline can automate data movement.

50
Q

What is the main use of Amazon Rekognition?

A

Image and video analysis

51
Q

What is the purpose of AWS Glue Jobs?

A

Running ETL operations

52
Q

True or False: Amazon EMR is priced based on the resources used.

53
Q

What is the role of AWS CloudTrail?

A

Logging AWS account activity

54
Q

Fill in the blank: Amazon Redshift uses _____ to store data.

A

columnar storage
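
The idea behind columnar storage can be sketched in a few lines of Python. This is a conceptual illustration only, not Redshift's actual on-disk format: the sample rows are hypothetical, and the point is that an aggregate over one column needs to read only that column's values.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"order_id": 1, "country": "US", "amount": 20.0},
    {"order_id": 2, "country": "DE", "amount": 5.0},
    {"order_id": 3, "country": "US", "amount": 7.5},
]

# Column-oriented layout: each column's values are stored together.
columns = {key: [row[key] for row in rows] for key in rows[0]}

# Summing amounts from rows touches every record and all of its fields.
total_from_rows = sum(row["amount"] for row in rows)

# Summing amounts from columns reads a single contiguous list.
total_from_columns = sum(columns["amount"])

print(columns["amount"], total_from_rows, total_from_columns)
```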

55
Q

What does the AWS SDK allow developers to do?

A

Interact with AWS services programmatically

56
Q

True or False: AWS Glue can handle both batch and streaming data.

57
Q

What is the primary use case for Amazon EMR Notebooks?

A

Interactive data analysis and visualization

58
Q

Fill in the blank: Amazon Kinesis enables the processing of _____ data streams.

59
Q

What is the main benefit of using Amazon CloudWatch?

A

Monitoring AWS resources and applications

60
Q

True or False: AWS Glue can automatically generate ETL code.

61
Q

What is the purpose of Amazon Personalize?

A

Building real-time recommendation systems

62
Q

Fill in the blank: AWS Lake Formation helps manage _____ in a data lake.

A

permissions

63
Q

What is the main function of Amazon Athena?

A

Ad-hoc querying of data stored in S3

64
Q

True or False: AWS Glue can connect to various data sources.

65
Q

What does the AWS Data Pipeline service allow you to do?

A

Orchestrate data workflows

66
Q

Fill in the blank: Amazon Kinesis Data Analytics enables you to analyze _____ data.

67
Q

What is the main advantage of using Amazon RDS?

A

Managed database operations, such as automated backups, patching, and scaling

68
Q

True or False: AWS Glue supports both serverless and provisioned resources.