Analytics - AWS Services for Veracity Flashcards

Question 1

Q

Organisations need to ensure the integrity of their data at every phase of the data lifecycle. Data must be accurate as it enters their system, for example by passing it through a data cleansing process.

What type of challenge is this?

a) Volume
b) Velocity
c) Veracity

Answer

A

c) Veracity
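To make the veracity idea concrete, here is a minimal Python sketch of a cleansing step that runs as records enter a system: it rejects records with missing fields, unparseable values, implausible readings or duplicates, and keeps the rejects for auditing. The field names (device_id, timestamp, temperature_c), the temperature range and the rejection reasons are all hypothetical; a real pipeline would define its own rules.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical schema for an incoming sensor reading.
REQUIRED_FIELDS = ("device_id", "timestamp", "temperature_c")
TEMP_RANGE_C = (-50.0, 60.0)  # assumed plausible range, for illustration only


@dataclass
class CleansingResult:
    clean: list = field(default_factory=list)
    rejected: list = field(default_factory=list)  # (record, reason) pairs kept for auditing


def cleanse(records):
    """Validate, type-convert and de-duplicate records as they enter the system."""
    result = CleansingResult()
    seen = set()
    for rec in records:
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) in (None, "")]
        if missing:
            result.rejected.append((rec, "missing " + ", ".join(missing)))
            continue
        try:
            ts = datetime.fromisoformat(rec["timestamp"])
            temp = float(rec["temperature_c"])
        except (TypeError, ValueError):
            result.rejected.append((rec, "unparseable value"))
            continue
        if not TEMP_RANGE_C[0] <= temp <= TEMP_RANGE_C[1]:
            result.rejected.append((rec, "temperature out of range"))
            continue
        key = (rec["device_id"], ts)  # same device + same time = duplicate reading
        if key in seen:
            result.rejected.append((rec, "duplicate"))
            continue
        seen.add(key)
        result.clean.append({"device_id": rec["device_id"], "timestamp": ts, "temperature_c": temp})
    return result


records = [
    {"device_id": "d1", "timestamp": "2024-03-01T10:00:00", "temperature_c": "21.5"},
    {"device_id": "d1", "timestamp": "2024-03-01T10:00:00", "temperature_c": "21.5"},
    {"device_id": "d2", "timestamp": "not a date", "temperature_c": "20"},
    {"device_id": "d3", "timestamp": "2024-03-01T10:05:00", "temperature_c": "999"},
    {"device_id": "", "timestamp": "2024-03-01T10:06:00", "temperature_c": "19"},
]
result = cleanse(records)
print(len(result.clean), [reason for _, reason in result.rejected])
```

Keeping the rejected records (rather than silently dropping them) means data quality problems can be traced back to their source.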

Question 2

Q

_________________ streamlines the collection and processing of data for big data workloads.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone

Answer

A

Amazon EMR

Question 3

Q

_____________ prepares and integrates all your data at any scale.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone

Answer

A

AWS Glue

Question 4

Q

______________ cleans and normalizes data faster and more efficiently.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone

Answer

A

AWS Glue DataBrew
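DataBrew applies an ordered list of transformation steps (a recipe) to a dataset. The sketch below only imitates that idea in plain Python to show what "clean and normalize" means in practice; the step functions, column names and country aliases are hypothetical and are not DataBrew's API.

```python
import re


# Each step transforms a single string value; a recipe maps columns to ordered steps.
def trim(value):
    return value.strip()


def collapse_spaces(value):
    return re.sub(r"\s+", " ", value)


def lower(value):
    return value.lower()


def normalize_country(value):
    # Hypothetical alias table mapping free-text country names to ISO codes.
    aliases = {"uk": "GB", "united kingdom": "GB", "gb": "GB",
               "usa": "US", "united states": "US", "us": "US"}
    return aliases.get(value, value.upper())


RECIPE = {
    "email": [trim, lower],
    "country": [trim, collapse_spaces, lower, normalize_country],
}


def apply_recipe(rows, recipe):
    """Return new rows with each column's steps applied in order (inputs are not mutated)."""
    out = []
    for row in rows:
        new_row = dict(row)
        for column, steps in recipe.items():
            if isinstance(new_row.get(column), str):
                for step in steps:
                    new_row[column] = step(new_row[column])
        out.append(new_row)
    return out


rows = [
    {"email": "  Alice@Example.COM ", "country": " United   Kingdom"},
    {"email": "bob@example.com", "country": "usa"},
]
print(apply_recipe(rows, RECIPE))
```

Keeping the steps as an ordered, reusable list is what makes the same cleaning repeatable on every new batch of data.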

Question 5

Q

______________ shares data across your organisation with built-in governance.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone

Answer

A

Amazon DataZone

Question 6

Q

Serverless data discovery and creation of table definitions and schemas.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone

Answer

A

AWS Glue

Question 7

Q

Central metadata repository for your data lake. It discovers schemas from unstructured data sitting in S3 (and other sources) and publishes table definitions for use with analysis tools such as EMR, Athena and Redshift.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone

Answer

A

AWS Glue
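As a rough illustration of what "discover the schema and publish a table definition" means, the sketch below infers column types from a few JSON Lines records and returns a dictionary shaped loosely like the TableInput structure used by the Glue Data Catalog API (Name, TableType, StorageDescriptor with Location and Columns). It is a toy, not how Glue crawlers are implemented: the bucket path and records are hypothetical, and nested values are simply typed as string.

```python
import json


def hive_type(value):
    # bool must be checked before int, because bool is a subclass of int in Python
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "string"  # simplification: strings, objects and arrays all become string


def merge_types(a, b):
    """Widen two observed types into one column type."""
    if a == b:
        return a
    if {a, b} == {"bigint", "double"}:
        return "double"
    return "string"  # conflicting types fall back to string


def infer_table(name, location, json_lines):
    columns = {}  # insertion order = order in which columns were first seen
    for line in json_lines:
        for key, value in json.loads(line).items():
            if value is None:
                columns.setdefault(key, None)  # seen, but type still unknown
                continue
            t = hive_type(value)
            columns[key] = t if columns.get(key) is None else merge_types(columns[key], t)
    return {
        "Name": name,
        "TableType": "EXTERNAL_TABLE",
        "StorageDescriptor": {
            "Location": location,
            "Columns": [{"Name": k, "Type": t or "string"} for k, t in columns.items()],
        },
    }


lines = [
    '{"device_id": "d1", "temp": 21, "ok": true}',
    '{"device_id": "d2", "temp": 20.5, "note": null}',
]
table = infer_table("sensors", "s3://example-bucket/sensors/", lines)
print(table["StorageDescriptor"]["Columns"])
```

Only the definition is produced; the records themselves stay where they are, which is why analysis tools can then query the raw files through the catalog.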

Question 8

Q

Service that runs custom ETL jobs and can discover the schema for you. Jobs can be triggered when data is received, on a schedule, or on demand.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone

Answer

A

AWS Glue
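Glue jobs are started by triggers. This sketch only builds the keyword arguments for a SCHEDULED or ON_DEMAND trigger; with boto3 installed and AWS credentials configured, the dictionary could be passed as boto3.client("glue").create_trigger(**request). The trigger and job names are hypothetical. Starting a job when new data arrives is usually wired up through Glue workflows with event-based triggers (for example via Amazon EventBridge), which this sketch leaves out.

```python
def build_trigger_request(name, job_name, schedule=None):
    """Build create_trigger keyword arguments that start job_name.

    schedule: an AWS cron expression such as "cron(0 2 * * ? *)" gives a
    SCHEDULED trigger; None gives an ON_DEMAND trigger.
    """
    request = {
        "Name": name,
        "Actions": [{"JobName": job_name}],
    }
    if schedule is None:
        request["Type"] = "ON_DEMAND"  # StartOnCreation is not supported for ON_DEMAND
    else:
        request["Type"] = "SCHEDULED"
        request["Schedule"] = schedule
        request["StartOnCreation"] = True  # activate the schedule as soon as it is created
    return request


# Hypothetical names: a nightly run at 02:00 UTC and a manual run of the same job.
nightly = build_trigger_request("nightly-clean", "clean-sensor-data", "cron(0 2 * * ? *)")
manual = build_trigger_request("manual-clean", "clean-sensor-data")
print(nightly)
print(manual)
```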

Question 9

Q

AWS service that uses Apache Spark for distributed data processing. With ________ ETL, you don't need to worry about managing the Spark cluster.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone

Answer

A

AWS Glue

Question 10

Q

The _________ crawler scans data in S3 and creates a schema.

However, sometimes you need to give it hints. The crawler can run periodically or on demand.

The _____________ crawler populates the Glue Data Catalog, which stores only the table definitions; the data itself stays in S3.

Once catalogued, you can treat your unstructured data as if it were structured. This allows services such as Redshift, Athena, or Hive running on EMR to query your unstructured data in S3.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone

Answer

A

AWS Glue

Question 11

Q

________________ crawler will extract partitions of your data based on how your S3 data is organised.

Amazon EMR
AWS Glue
AWS Glue DataBrew
Amazon DataZone

Answer

A

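AWS Glue

Think about how you are going to query your data lake in S3.

For example, organise your S3 data into prefixes (folders) such as year, month and device, so that queries filtering on a time range or a device only read the matching partitions.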
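The sketch below shows that layout: S3 keys organised into Hive-style year=/month=/device= folders, the partition values that can be read back from each key, and how a query filtered on those values only needs the matching keys (partition pruning). The prefix, device IDs and file names are hypothetical, and the partition_0, partition_1 fallback for folders not in key=value form mirrors how Glue crawlers name such partitions; this is not Glue's actual implementation.

```python
from datetime import datetime


def partitioned_key(table_prefix, ts, device, filename):
    # Hive-style key=value folders let a crawler name the partition columns.
    return (f"{table_prefix}/year={ts.year:04d}/month={ts.month:02d}/"
            f"device={device}/{filename}")


def extract_partitions(table_prefix, key):
    """Return partition values encoded in the folders between the prefix and the file name."""
    folders = key[len(table_prefix) + 1:].split("/")[:-1]
    partitions = {}
    for i, folder in enumerate(folders):
        if "=" in folder:
            name, value = folder.split("=", 1)
        else:
            name, value = f"partition_{i}", folder  # unnamed folders get positional names
        partitions[name] = value
    return partitions


def prune(table_prefix, keys, **filters):
    """Keep only keys whose partition values match every filter."""
    return [k for k in keys
            if all(extract_partitions(table_prefix, k).get(col) == val
                   for col, val in filters.items())]


prefix = "sensors"
keys = [
    partitioned_key(prefix, datetime(2024, 1, 15), "d1", "part-0.json"),
    partitioned_key(prefix, datetime(2024, 2, 3), "d1", "part-0.json"),
    partitioned_key(prefix, datetime(2024, 2, 9), "d2", "part-0.json"),
]
print(extract_partitions(prefix, keys[0]))
print(prune(prefix, keys, year="2024", month="02"))
```

A query for February 2024 only has to read the two February keys; with a flat layout, every object would have to be scanned.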

(11 cards)