[2] AWS Data Ecosystem Flashcards

1
Q

How is security managed with S3?

A
  • Encryption - SSE-S3 and SSE-KMS
  • IAM
  • Bucket policies
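
As a rough illustration of how a bucket policy and encryption can work together, the sketch below builds a policy document that denies any upload not using SSE-KMS. The bucket name and the helper name `require_kms_policy` are hypothetical; the condition key `s3:x-amz-server-side-encryption` is the one S3 evaluates on PutObject requests.

```python
import json


def require_kms_policy(bucket_name: str) -> dict:
    """Build a bucket policy that rejects uploads not encrypted with SSE-KMS."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUploadsWithoutKms",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                # Object-level ARN: the policy applies to every key in the bucket.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-server-side-encryption": "aws:kms"
                    }
                },
            }
        ],
    }


# "example-data-bucket" is a placeholder name, not a real bucket.
policy = require_kms_policy("example-data-bucket")
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The resulting JSON string is what would be attached to the bucket; IAM policies on users and roles are still evaluated alongside it.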
2
Q

What is Data Pipeline?

A

A managed service to create highly-available data workflows that move data between services

3
Q

What is EMR?

A

A managed service for running massively parallel compute tasks on clusters, using frameworks such as Spark and Hadoop

4
Q

How are EMR deployments structured?

A

There is a master node, which runs all of the time and coordinates the cluster; core nodes, which run tasks and hold the cluster's data storage (HDFS); and task nodes, which only do computation and store no data (so these can be spot instances)
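
A minimal sketch of that three-tier layout, written as the list of instance groups that boto3's EMR `run_job_flow` call accepts under `Instances["InstanceGroups"]`. The helper name, instance type and node counts are assumptions chosen for illustration only.

```python
def emr_instance_groups(core_count: int, task_count: int,
                        instance_type: str = "m5.xlarge") -> list:
    """Describe one master group, one core group and one spot task group."""
    return [
        {   # Always-on coordinator for the cluster.
            "Name": "Master",
            "InstanceRole": "MASTER",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": 1,
        },
        {   # Core nodes hold cluster storage, so keep them on-demand.
            "Name": "Core",
            "InstanceRole": "CORE",
            "Market": "ON_DEMAND",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        },
        {   # Task nodes are compute-only, so losing a spot instance loses no data.
            "Name": "Task",
            "InstanceRole": "TASK",
            "Market": "SPOT",
            "InstanceType": instance_type,
            "InstanceCount": task_count,
        },
    ]


groups = emr_instance_groups(core_count=2, task_count=4)
for group in groups:
    print(group["InstanceRole"], group["Market"], group["InstanceCount"])
```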

5
Q

How can machine learning on EC2 be streamlined?

A

With the Deep Learning AMIs which bundle key libraries and drivers etc.

6
Q

What is AWS Batch?

A

A service for running large numbers of batch computing jobs in parallel, e.g. batch inference
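
One way to picture the parallelism is an array job, where the same job definition is run many times and each copy handles its own slice of the input. The sketch below builds the keyword arguments for boto3's Batch `submit_job` call and shows how a worker might pick its slice; the queue, job definition and helper names are hypothetical.

```python
def array_job_request(job_name: str, job_queue: str,
                      job_definition: str, n_chunks: int) -> dict:
    """Keyword arguments for submit_job that fan out into n_chunks child jobs."""
    return {
        "jobName": job_name,
        "jobQueue": job_queue,
        "jobDefinition": job_definition,
        "arrayProperties": {"size": n_chunks},
    }


def shard_for_index(items: list, index: int, n_chunks: int) -> list:
    """Slice of the input handled by one child job.

    Inside a running child job the index would come from the
    AWS_BATCH_JOB_ARRAY_INDEX environment variable.
    """
    return items[index::n_chunks]


request = array_job_request("nightly-inference", "example-queue",
                            "example-job-definition", n_chunks=4)
print(request)
print(shard_for_index(list(range(10)), index=1, n_chunks=4))
```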

7
Q

What features does Glue have?

A
  • crawlers to create catalogues of the data
  • managed ETL using Python or Scala
  • some ML capabilities such as deduplicating records
8
Q

What import and export locations does Glue support?

A

It can read data from DynamoDB, S3 and services supporting JDBC

Results can be saved to a database or S3

9
Q

What are the steps to using Glue?

A

(1) build a data catalogue using crawlers
(2) define transformations
(3) schedule and run transformations
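
The three steps above can be sketched as an ordered plan of boto3 Glue client operations. Everything named here (the `sales` prefix, role ARN, S3 paths and schedule) is a placeholder; the plan is built as plain data so it can be inspected before anything is sent to AWS.

```python
def glue_plan(name: str, role_arn: str, source_path: str,
              script_path: str, cron: str) -> list:
    """Return (operation, kwargs) pairs covering catalogue, transform, schedule."""
    crawler, job = f"{name}-crawler", f"{name}-job"
    return [
        # (1) Build the data catalogue by crawling the source location.
        ("create_crawler", {
            "Name": crawler,
            "Role": role_arn,
            "DatabaseName": f"{name}_db",
            "Targets": {"S3Targets": [{"Path": source_path}]},
        }),
        ("start_crawler", {"Name": crawler}),
        # (2) Define the transformation as an ETL job pointing at a script.
        ("create_job", {
            "Name": job,
            "Role": role_arn,
            "Command": {"Name": "glueetl", "ScriptLocation": script_path},
        }),
        # (3) Schedule the job with a trigger.
        ("create_trigger", {
            "Name": f"{name}-trigger",
            "Type": "SCHEDULED",
            "Schedule": cron,
            "Actions": [{"JobName": job}],
            "StartOnCreation": True,
        }),
    ]


def apply_plan(glue_client, plan: list) -> None:
    """Run each planned operation against a Glue client, e.g. boto3.client("glue")."""
    for operation, kwargs in plan:
        getattr(glue_client, operation)(**kwargs)


plan = glue_plan(
    name="sales",
    role_arn="arn:aws:iam::123456789012:role/example-glue-role",
    source_path="s3://example-bucket/raw/sales/",
    script_path="s3://example-bucket/scripts/clean_sales.py",
    cron="cron(0 2 * * ? *)",
)
for operation, _ in plan:
    print(operation)
```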

10
Q

What is Athena?

A

A managed service to perform SQL-like queries on data.

The data must be in S3 and the results are stored in S3

Data can be sourced from multiple S3 locations
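
A small sketch of what a query submission might look like: the function below builds the keyword arguments for boto3's Athena `start_query_execution` call, including the S3 location where results are written. The database, table and bucket names are invented for the example.

```python
def athena_query_request(sql: str, database: str, output_location: str) -> dict:
    """Keyword arguments for start_query_execution."""
    # Athena writes its result files to S3, so insist on an S3 URI.
    if not output_location.startswith("s3://"):
        raise ValueError("output_location must be an s3:// URI")
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


request = athena_query_request(
    sql="SELECT region, COUNT(*) AS orders FROM sales GROUP BY region",
    database="sales_db",
    output_location="s3://example-bucket/athena-results/",
)
print(request)
```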

11
Q

What are the Kinesis services?

A

Kinesis Video Streams

Kinesis Data Streams

Kinesis Data Firehose

Kinesis Data Analytics

12
Q

What is Kinesis Video Streams?

A

A service which streams video from devices to AWS for analytics, machine learning, playback and encoding etc.

13
Q

What is Kinesis Data Streams?

A

Provides an endpoint for processing data in real time with Kinesis Data Analytics, Spark on EMR, EC2 or Lambda etc.

Static reference data can be sourced from S3. Only one stream can be ingested at a time (same as Firehose)
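
On the producer side, sending data to the endpoint could look roughly like this. The sketch builds the keyword arguments for boto3's Kinesis `put_record` call; the stream name, event fields and helper name are assumptions made for the example.

```python
import json


def kinesis_record(stream_name: str, event: dict, key_field: str) -> dict:
    """Keyword arguments for put_record, serialising the event as JSON bytes."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),
        # Records sharing a partition key are routed to the same shard.
        "PartitionKey": str(event[key_field]),
    }


event = {"sensor_id": "sensor-7", "temperature": 21.5}
record = kinesis_record("example-stream", event, key_field="sensor_id")
print(record)
```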

14
Q

What is Kinesis Data Firehose?

A

Data is streamed IN BATCHES into S3, Redshift, Elasticsearch or Splunk etc. for offline processing
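
To make the batching idea concrete, here is a toy model (not Firehose's actual implementation) of a buffer that collects records and delivers them as one object once a size threshold is reached. The real service also flushes after a time interval, which this sketch leaves out; the class name and threshold are invented.

```python
class BatchBuffer:
    """Collect records and deliver them in batches once max_bytes is reached."""

    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self.pending = []        # records waiting for the next delivery
        self.pending_bytes = 0
        self.delivered = []      # stands in for objects written to the destination

    def add(self, record: bytes) -> None:
        self.pending.append(record)
        self.pending_bytes += len(record)
        if self.pending_bytes >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        """Deliver whatever is pending as a single concatenated batch."""
        if self.pending:
            self.delivered.append(b"".join(self.pending))
            self.pending = []
            self.pending_bytes = 0


buffer = BatchBuffer(max_bytes=10)
for line in [b"aaaa\n", b"bbbb\n", b"cccc\n"]:
    buffer.add(line)
buffer.flush()  # deliver the remainder at shutdown
print(buffer.delivered)
```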

15
Q

What is Kinesis Data Analytics?

A

Streaming data from Kinesis Data Streams or Kinesis Data Firehose is processed in real time using SQL or Java libraries
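
A typical real-time computation is a windowed aggregate. The sketch below is plain Python rather than the service's SQL or Java APIs; it only illustrates the idea of counting events per key in fixed, non-overlapping (tumbling) windows. The event format of `(timestamp_seconds, key)` is an assumption.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds: int) -> dict:
    """Count events per (window_start, key) over fixed-size windows."""
    counts = Counter()
    for timestamp, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


events = [(0, "a"), (12, "a"), (59, "b"), (60, "a"), (61, "a")]
counts = tumbling_window_counts(events, window_seconds=60)
print(counts)
```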

16
Q

What is AWS Machine Learning?

A

A deprecated service that was the predecessor to SageMaker