AWS Data Pipeline Flashcards
Is Data Pipeline a serverless product?
Yes. Data Pipeline requires no infrastructure to be deployed.
For Data Pipeline, what are data nodes?
Data nodes are the names, locations, and formats of your data, such as S3, DynamoDB, or Redshift. You can think of them as the reader module that reads data into the pipeline.
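As a rough illustration, a data node in a pipeline definition is a small JSON object giving a type and a location. The sketch below builds two such objects as plain Python dicts; the ids, bucket path, and table name are made-up placeholders, not values from any real pipeline.

```python
import json

# Hypothetical S3 data node; the id and bucket path are placeholders.
s3_input = {
    "id": "InputData",
    "type": "S3DataNode",
    "directoryPath": "s3://example-bucket/input/",
}

# Hypothetical DynamoDB data node pointing at a placeholder table.
ddb_output = {
    "id": "OutputTable",
    "type": "DynamoDBDataNode",
    "tableName": "example-table",
}

# A pipeline definition file wraps all objects in an "objects" list.
data_node_definition = {"objects": [s3_input, ddb_output]}
print(json.dumps(data_node_definition, indent=2))
```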
For Data Pipeline, what inputs can I use?
Redshift
RDS
DynamoDB
S3
For Data Pipeline, what are schedules?
A schedule is part of a pipeline definition and defines when the pipeline is to be executed. It can be based on pipeline activation or on time.
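The two kinds of schedule might look roughly like this in a pipeline definition, sketched here as Python dicts. The ids, the start date, and the one-day period are illustrative assumptions.

```python
import json

# Time-based schedule: runs once a day from a fixed (placeholder) start time.
time_based = {
    "id": "DailySchedule",
    "type": "Schedule",
    "period": "1 day",
    "startDateTime": "2024-01-01T00:00:00",
}

# Activation-based schedule: the first run starts when the pipeline is activated.
activation_based = {
    "id": "OnActivationSchedule",
    "type": "Schedule",
    "period": "1 day",
    "startAt": "FIRST_ACTIVATION_DATE_TIME",
}

schedule_definition = {"objects": [time_based, activation_based]}
print(json.dumps(schedule_definition, indent=2))
```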
For Data Pipeline, what are preconditions?
These are checks that run before an activity, such as: is the data available?
When you run your data pipeline, is it real time?
No, pipelines run on a schedule.
For Data Pipeline, what are the resources?
These are EC2 instances and EMR clusters.
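A minimal sketch of the two resource types as pipeline definition objects, written as Python dicts. The ids, instance types, instance count, and timeout values are placeholder assumptions chosen only for illustration.

```python
import json

# Hypothetical EC2 resource that shuts itself down after two hours.
ec2_resource = {
    "id": "WorkerInstance",
    "type": "Ec2Resource",
    "instanceType": "t2.micro",
    "terminateAfter": "2 Hours",
}

# Hypothetical EMR cluster with one master node and two core nodes.
emr_resource = {
    "id": "WorkerCluster",
    "type": "EmrCluster",
    "masterInstanceType": "m4.large",
    "coreInstanceType": "m4.large",
    "coreInstanceCount": "2",
    "terminateAfter": "2 Hours",
}

resource_definition = {"objects": [ec2_resource, emr_resource]}
print(json.dumps(resource_definition, indent=2))
```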
For Data Pipeline, what are actions?
These are ways you can receive updates on the pipeline's status.
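One common action is an SNS notification attached to an activity. The sketch below shows how that could be wired up as Python dicts; the topic ARN, account number, ids, and message text are all placeholders.

```python
import json

# Hypothetical SNS alarm; the topic ARN is a placeholder.
failure_alarm = {
    "id": "FailureAlarm",
    "type": "SnsAlarm",
    "topicArn": "arn:aws:sns:us-east-1:123456789012:example-topic",
    "subject": "Pipeline activity failed",
    "message": "The copy activity did not finish successfully.",
    "role": "DataPipelineDefaultRole",
}

# The activity points at the alarm through a reference in its onFail field.
copy_activity = {
    "id": "CopyData",
    "type": "CopyActivity",
    "onFail": {"ref": "FailureAlarm"},
}

action_definition = {"objects": [failure_alarm, copy_activity]}
print(json.dumps(action_definition, indent=2))
```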
For Data Pipeline, can I process data in real time?
No, Data Pipeline is about processing data, but not in a real-time fashion.
What can I do with Data Pipeline?
- Export data from DynamoDB to S3
- Import data from S3 to DynamoDB
- RDS MySQL to S3
- RDS incremental copy to S3
- S3 to RDS table
- RDS table to Redshift
- RDS incremental copy to Redshift
- Load data from Redshift to RDS
I want to stream live data from RDS to Redshift. How can I use Data Pipeline?
You cannot do it with Data Pipeline. Data Pipeline only runs on a schedule and does not stream live data.
I need to batch move and transform data from RDS to S3. How can I do this?
You can use Data Pipeline.
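A scheduled RDS to S3 copy could be defined roughly as follows. This is a sketch under stated assumptions: every id, the instance name, the credentials, the table, and the bucket path are placeholders, and `build_rds_to_s3_pipeline` and `unresolved_refs` are hypothetical helper names, not part of any AWS library.

```python
import json


def build_rds_to_s3_pipeline():
    """Return a pipeline definition dict that copies an RDS table to S3."""
    schedule = {"ref": "DailySchedule"}
    objects = [
        {"id": "DailySchedule", "type": "Schedule", "period": "1 day",
         "startAt": "FIRST_ACTIVATION_DATE_TIME"},
        # EC2 instance that performs the copy (placeholder sizing).
        {"id": "CopyInstance", "type": "Ec2Resource", "instanceType": "t2.micro",
         "terminateAfter": "2 Hours", "schedule": schedule},
        # Placeholder credentials; never hard-code real secrets.
        {"id": "SourceDb", "type": "RdsDatabase", "rdsInstanceId": "example-instance",
         "username": "example_user", "*password": "example-password"},
        {"id": "SourceTable", "type": "SqlDataNode", "database": {"ref": "SourceDb"},
         "table": "orders", "selectQuery": "select * from #{table}",
         "schedule": schedule},
        {"id": "S3Output", "type": "S3DataNode",
         "directoryPath": "s3://example-bucket/export/", "schedule": schedule},
        # The activity ties input, output, and compute resource together.
        {"id": "RdsToS3Copy", "type": "CopyActivity",
         "input": {"ref": "SourceTable"}, "output": {"ref": "S3Output"},
         "runsOn": {"ref": "CopyInstance"}, "schedule": schedule},
    ]
    return {"objects": objects}


def unresolved_refs(definition):
    """List (object id, ref) pairs whose ref names no object in the definition."""
    ids = {obj["id"] for obj in definition["objects"]}
    missing = []
    for obj in definition["objects"]:
        for value in obj.values():
            if isinstance(value, dict) and "ref" in value and value["ref"] not in ids:
                missing.append((obj["id"], value["ref"]))
    return missing


pipeline = build_rds_to_s3_pipeline()
print(json.dumps(pipeline, indent=2))
print("unresolved references:", unresolved_refs(pipeline))
```

CopyActivity only moves the data. A transformation step would need a different or additional activity, for example one that runs SQL or a script on the resource.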
What is a precondition?
A check that your data exists before the pipeline works on it, for example:
- S3 files or directories exist
- DynamoDB table exists
- RDS queries
- Redshift queries
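The first two checks in the list could be expressed roughly like this, sketched as Python dicts. The ids, S3 key, and table name are placeholders; the data node shows how a precondition might be attached through a reference.

```python
import json

# Hypothetical check that a specific S3 object exists.
s3_key_check = {
    "id": "InputFileReady",
    "type": "S3KeyExists",
    "s3Key": "s3://example-bucket/input/data.csv",
}

# Hypothetical check that a DynamoDB table exists.
table_check = {
    "id": "TableReady",
    "type": "DynamoDBTableExists",
    "tableName": "example-table",
}

# The data node is only treated as ready once its precondition passes.
guarded_input = {
    "id": "InputData",
    "type": "S3DataNode",
    "filePath": "s3://example-bucket/input/data.csv",
    "precondition": {"ref": "InputFileReady"},
}

precondition_definition = {"objects": [s3_key_check, table_check, guarded_input]}
print(json.dumps(precondition_definition, indent=2))
```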