AWS Data Pipeline Flashcards

Question 1

Q

Is Data Pipeline a serverless product?

Answer

A

Yes. Data Pipeline requires no infrastructure to be deployed by you; the service launches the EC2 instances or EMR clusters it needs to run activities.

Question 2

Q

For Data Pipeline, what are data nodes?

Answer

A

Data nodes are the names, locations, and formats of your data, such as S3, DynamoDB, or Redshift. You can think of them as the reader module that reads the data into the pipeline.
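
As a rough illustration, a data node can be written as a JSON object inside a pipeline definition. The sketch below builds one in Python. The object ids, the bucket path, and the schedule reference are made-up placeholders, and the `S3DataNode` type and `directoryPath` field are assumed to follow the pipeline definition syntax.

```python
import json

# Hypothetical S3 data node. The ids and the bucket path are placeholders,
# and the field names are assumptions about the pipeline definition syntax.
s3_input_node = {
    "id": "InputData",
    "type": "S3DataNode",
    "directoryPath": "s3://example-bucket/input/",
    "schedule": {"ref": "HourlySchedule"},
}

# Print the object the way it would appear in a definition file.
print(json.dumps(s3_input_node, indent=2))
```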

Question 3

Q

For Data Pipeline, what inputs can I use?

Answer

A

Redshift
RDS
DynamoDB
S3

Question 4

Q

For Data Pipeline, what are the schedules?

Answer

A

A schedule is part of a pipeline definition and defines when the pipeline is to be executed. This can be based on pipeline activation or based on time.

Question 5

Q

For Data Pipeline, what are the preconditions?

Answer

A

These are checks that must pass before an activity runs, for example whether the data is available.

Question 6

Q

When you run your Data Pipeline, is it real time?

Answer

A

No. Pipelines run on a schedule, not in real time.

Question 7

Q

For Data Pipeline, what are the resources?

Answer

A

These are the compute resources that do the work of an activity: EC2 instances and EMR clusters.

Question 9

Q

For Data Pipeline, what are actions?

Answer

A

These are ways you can receive updates on the pipeline's status.

Question 11

Q

For Data Pipeline, can I process data in real time?

Answer

A

No. Data Pipeline processes data, but not in real time.

Question 12

Q

What can I do with Data Pipeline?

Answer

A

Export data from DynamoDB to S3
Import data from S3 to DynamoDB
RDS MySQL to S3
RDS incremental copy to S3
S3 to RDS table
RDS table to Redshift
RDS incremental copy to Redshift
Load data from Redshift to RDS

Question 13

Q

I want to stream live data from RDS to Redshift. How can I use Data Pipeline?

Answer

A

You cannot do this with Data Pipeline. It only runs on a schedule and is not meant for live streaming.

Question 14

Q

I need to batch shift and transform data from RDS to S3. How can I do this?

Answer

A

You can use Data Pipeline.

Question 15

Q

What is a precondition?

Answer

A

A precondition checks that your data exists before an activity runs, for example:

S3 files or directories exist
DynamoDB table exists
RDS queries
Redshift queries
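
A precondition can be sketched as its own object that a data node points to. The example below is illustrative only: the ids and S3 paths are placeholders, and the `S3KeyExists` type, the `s3Key` field, and the `precondition` reference are assumed to match the pipeline definition syntax. The small `unresolved_refs` helper is hypothetical and only checks that every reference names an object in the same definition.

```python
import json

# Hypothetical precondition: wait for a marker file before reading input.
input_ready = {
    "id": "InputReady",
    "type": "S3KeyExists",
    "s3Key": "s3://example-bucket/input/ready.flag",
}

# Data node that refers to the precondition by id.
input_data = {
    "id": "InputData",
    "type": "S3DataNode",
    "directoryPath": "s3://example-bucket/input/",
    "precondition": {"ref": "InputReady"},
}

definition = {"objects": [input_ready, input_data]}


def unresolved_refs(definition):
    """Return referenced ids that are not defined in the definition."""
    known_ids = {obj["id"] for obj in definition["objects"]}
    missing = []
    for obj in definition["objects"]:
        for value in obj.values():
            if isinstance(value, dict) and "ref" in value:
                if value["ref"] not in known_ids:
                    missing.append(value["ref"])
    return missing


print(json.dumps(definition, indent=2))
print("unresolved references:", unresolved_refs(definition))
```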

Question 16

Q

I want to get an SNS message when a Data Pipeline activity fails. How can I achieve this?

Answer

A

You can configure an SNS action that sends a message when a Data Pipeline activity fails.
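
One way this might look in a pipeline definition is an alarm object that the activity references in its failure hook. This is a sketch under assumptions: the topic ARN, account number, ids, and message text are placeholders, and the `SnsAlarm` type and `onFail` field are assumed to follow the pipeline definition syntax.

```python
import json

# Hypothetical SNS action. The topic ARN and the role name are placeholders.
failure_alarm = {
    "id": "FailureAlarm",
    "type": "SnsAlarm",
    "topicArn": "arn:aws:sns:us-east-1:111122223333:example-topic",
    "subject": "Pipeline activity failed",
    "message": "An activity in the example pipeline failed.",
    "role": "DataPipelineDefaultRole",
}

# Activity that triggers the alarm when it fails. The input, output, and
# runsOn references point at objects that are omitted from this sketch.
copy_activity = {
    "id": "CopyData",
    "type": "CopyActivity",
    "input": {"ref": "InputData"},
    "output": {"ref": "OutputData"},
    "runsOn": {"ref": "WorkerInstance"},
    "onFail": {"ref": "FailureAlarm"},
}

print(json.dumps({"objects": [failure_alarm, copy_activity]}, indent=2))
```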

Question 17

Q

What is a Data Pipeline activity?

Answer

A

It is something that works on your data, such as:

EMR Job
Pig
Hive
Hadoop
Shell command
SQL
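
For instance, a shell command activity could be paired with the resource it runs on. The sketch below is an assumption-laden illustration: the ids, the command, and the instance type are placeholders, and the `ShellCommandActivity`, `Ec2Resource`, `runsOn`, and `terminateAfter` names are assumed to match the pipeline definition syntax.

```python
import json

# Hypothetical compute resource that the service would launch for the activity.
worker_instance = {
    "id": "WorkerInstance",
    "type": "Ec2Resource",
    "instanceType": "t2.micro",
    "terminateAfter": "1 Hour",
}

# Hypothetical activity: run a shell command on the resource above.
shell_activity = {
    "id": "SayHello",
    "type": "ShellCommandActivity",
    "command": "echo hello",
    "runsOn": {"ref": "WorkerInstance"},
}

print(json.dumps({"objects": [worker_instance, shell_activity]}, indent=2))
```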

Question 18

Q

What is the schedule?

Answer

A

It is when the Data Pipeline should run. It is part of the pipeline definition, and you can define parameters such as:
- Start time
- Interval
- End
Example: Start today at noon and run for 1 hour.
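
The noon example can be sketched as a schedule object. The helper `build_schedule` is hypothetical, the 15-minute interval is an assumption (the card does not state one), and the `Schedule` type with `startDateTime`, `endDateTime`, and `period` fields is assumed to follow the pipeline definition syntax.

```python
import json
from datetime import datetime, timedelta

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S"


def build_schedule(start, run_for, period):
    """Build a schedule object from a start time, a duration, and an interval."""
    return {
        "id": "ExampleSchedule",  # placeholder id
        "type": "Schedule",
        "startDateTime": start.strftime(TIME_FORMAT),
        "endDateTime": (start + run_for).strftime(TIME_FORMAT),
        "period": period,
    }


# Start today at noon and run for 1 hour, at an assumed 15-minute interval.
noon_today = datetime.now().replace(hour=12, minute=0, second=0, microsecond=0)
schedule = build_schedule(noon_today, timedelta(hours=1), "15 minutes")
print(json.dumps(schedule, indent=2))
```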

Question 19

Q

How do I set up a cluster for Data Pipeline to run?

Answer

A

You do not set up a cluster. Data Pipeline is a serverless product.

Question 20

Q

What is AWS Data Pipeline?

Answer

A

A service that automates and schedules regular data movement and data processing activities in AWS.

Question 21

Q

Where can the data sources be located?

Answer

A

- On-premises
- AWS

Question 22

Q

Can I set up a schedule for the Data Pipeline to run?

Answer

A

Yes. As part of the Data Pipeline definition you can set up a schedule, for example every hour or at a specific date and time; there are many options.

Question 23

Q

When would I use Data Pipeline vs AWS Glue?