Analytics - AWS Services for Veracity Flashcards

1
Q

Organisations need to ensure the integrity of their data at all phases of the data lifecycle. They must have accurate data as it enters their system, which means putting it through a data cleansing process.

What type of challenge is this?

a) Volume
b) Velocity
c) Veracity

A

c) Veracity

2
Q

_________________ streamlines the collection and processing of data for big data workloads.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone
A

Amazon EMR

3
Q

_____________ prepares and integrates all your data at any scale.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone
A

AWS Glue

4
Q

______________ cleans and normalizes data faster and more efficiently.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone
A

AWS Glue DataBrew

5
Q

______________ shares data across your organisation with built-in governance.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone
A

Amazon DataZone

6
Q

Provides serverless discovery and definition of table definitions and schemas.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone
A

AWS Glue

7
Q

Central metadata repository for your data lake. It discovers schemas from unstructured data sitting in S3 and similar stores, and publishes table definitions for use with analysis tools such as Amazon EMR, Athena, and Redshift.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone
A

AWS Glue

8
Q

Service that runs custom ETL jobs and can discover the schema for you. Jobs can be triggered when data is received, on a schedule, or on demand.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone

A

AWS Glue
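
As a rough illustration of the schedule and on-demand cases, the sketch below builds the keyword arguments that boto3's Glue `create_trigger` call accepts. The helper `build_trigger_request`, the trigger names, and the job name `clean-orders` are hypothetical, and the event-driven case (starting a job when data arrives) is left out.

```python
from typing import Optional


def build_trigger_request(name: str, job_name: str,
                          schedule: Optional[str] = None) -> dict:
    """Build keyword arguments for a Glue trigger that starts one job.

    With a cron schedule the trigger type is SCHEDULED; without one it is
    ON_DEMAND and only fires when started explicitly.
    """
    request = {
        "Name": name,
        "Type": "SCHEDULED" if schedule else "ON_DEMAND",
        "Actions": [{"JobName": job_name}],
    }
    if schedule:
        request["Schedule"] = schedule       # Glue cron expressions are in UTC
        request["StartOnCreation"] = True    # activate the schedule immediately
    return request


# Hypothetical names, for illustration only.
nightly = build_trigger_request("nightly-clean", "clean-orders", "cron(0 2 * * ? *)")
manual = build_trigger_request("manual-clean", "clean-orders")

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("glue").create_trigger(**nightly)
```

Nothing here contacts AWS; the dictionaries only show the shape of the request.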

9
Q

AWS service that uses Apache Spark for distributed data processing. With ________ ETL, you don't need to worry about managing the Spark cluster.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone
A

AWS Glue

10
Q

_________ crawler scans data in S3 and creates a schema.

Sometimes you need to give the crawler hints. It can run periodically or on demand.

_____________ crawler populates the Glue Data Catalog, which stores only table definitions.

Once catalogued, you can treat your unstructured data as if it were structured. This allows services such as Redshift and Athena, or systems running on EMR such as Hive, to query your unstructured data in S3.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone
A

AWS Glue
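
A minimal sketch of how such a crawler might be defined, assuming boto3's Glue `create_crawler` call. The helper `build_crawler_request`, the role ARN, the database name, and the S3 path are all made-up placeholders.

```python
from typing import Optional


def build_crawler_request(name: str, role: str, database: str, s3_path: str,
                          schedule: Optional[str] = None) -> dict:
    """Build keyword arguments for a crawler that scans one S3 path.

    The crawler writes the table definitions it infers into `database`
    in the Glue Data Catalog. Without a schedule it runs only on demand.
    """
    request = {
        "Name": name,
        "Role": role,                 # IAM role the crawler assumes
        "DatabaseName": database,     # catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }
    if schedule:
        request["Schedule"] = schedule  # cron expression for periodic runs
    return request


# Placeholder values, for illustration only.
crawler = build_crawler_request(
    name="sensor-crawler",
    role="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
    database="sensor_lake",
    s3_path="s3://example-bucket/sensors/",
    schedule="cron(0 3 * * ? *)",
)

# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("glue").create_crawler(**crawler)
```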

11
Q

________________ crawler extracts partitions of your data based on how your S3 data is organised.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone
A

AWS Glue

Think about how you are going to query your data lake in S3 before you lay it out.

For example, if you query by time range, organise the prefixes in the bucket by year, month, device, and so on.
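
To make the layout idea concrete, here is a small pure-Python sketch that reads Hive-style `name=value` folders out of an S3 object key. It only illustrates the naming convention; it is not how the crawler itself is implemented, and the key and the function `partitions_from_key` are invented for the example.

```python
def partitions_from_key(key: str) -> dict:
    """Return the Hive-style partition columns found in an S3 object key."""
    partitions = {}
    for segment in key.split("/")[:-1]:  # the last segment is the file name
        name, sep, value = segment.partition("=")
        if sep and name and value:       # keep only well-formed name=value folders
            partitions[name] = value
    return partitions


key = "sensors/year=2024/month=03/device=thermostat-7/readings.json"
print(partitions_from_key(key))
# {'year': '2024', 'month': '03', 'device': 'thermostat-7'}
```

A query that filters on `year` and `month` then only needs to read the objects under the matching prefixes.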
