Data and Analytics Flashcards

1
Q

What is Amazon Athena?

A

A serverless query service used to analyse data stored in Amazon S3 with SQL queries

2
Q

What are federated queries?

A

Queries that Athena can run against data sources other than S3, such as relational, non-relational, object and custom data sources

3
Q

What is one method that can be used to increase the performance of Athena?

A

Partitioning the data
Using a columnar data format
Using fewer, larger files, which reduces the per-file overhead when Athena scans and retrieves them
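
As a rough illustration of why partitioning helps, the sketch below lays out data under Hive-style `key=value` prefixes and shows that a query filtered on year and month only needs to read the matching prefixes. The bucket name, table path and helper functions are hypothetical; this only mimics the idea of partition pruning and does not call Athena.

```python
from datetime import date


def partition_prefix(table_root: str, day: date) -> str:
    """Build a Hive-style partition prefix, e.g. .../year=2024/month=01/day=15/."""
    return f"{table_root}/year={day.year}/month={day.month:02d}/day={day.day:02d}/"


def prune(prefixes: list[str], year: int, month: int) -> list[str]:
    """Keep only the prefixes a query filtered on year and month would read."""
    wanted = f"/year={year}/month={month:02d}/"
    return [p for p in prefixes if wanted in p]


# Hypothetical bucket and table path, for illustration only.
root = "s3://example-bucket/logs"
days = [date(2024, 1, 15), date(2024, 1, 16), date(2024, 2, 1)]
prefixes = [partition_prefix(root, d) for d in days]

# A filter on January 2024 skips the February prefix entirely.
january = prune(prefixes, 2024, 1)
print(january)
```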

4
Q

What is Amazon Redshift used for?

A

Data warehousing and analytics

5
Q

Is Redshift columnar or row-based?

A

Columnar
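
To make the distinction concrete, here is a toy sketch of the same records held row by row and column by column. It is not how Redshift stores data internally; the sample records and the `to_columns` helper are made up for illustration. The point is that an aggregate over one column only has to touch that column's values in the columnar layout.

```python
# Row-based layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "eu", "sales": 10},
    {"id": 2, "region": "us", "sales": 25},
    {"id": 3, "region": "eu", "sales": 5},
]


def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Columnar layout: each column's values are stored together.
columns = to_columns(rows)

# Row-based: every record is visited to pick out one field.
total_row_based = sum(record["sales"] for record in rows)

# Columnar: only the "sales" column is read.
total_columnar = sum(columns["sales"])
```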

6
Q

What engine is Redshift based on?

A

PostgreSQL

7
Q

What are the two snapshot modes of Redshift and what are the differences?

A

Automated and manual.
Automated snapshots are retained for a period that the user sets, whereas manual snapshots are kept until the user deletes them.

8
Q

What are the two node types within a Redshift cluster?

A

Leader and compute

9
Q

What is Redshift Spectrum?

A

A Redshift feature that allows the user to query data that is already in S3 without having to load it into Redshift

10
Q

What is the principal benefit of Redshift Spectrum?

A

It lets the user apply far more computing power to a query than they have provisioned in their cluster, and it avoids having to load the S3 data into Redshift

11
Q

What is OpenSearch?

A

A service that allows the user to search on any field of their data, including partial matches

12
Q

What is EMR?

A

Elastic MapReduce - a service that allows the user to create Hadoop clusters for big data analytics

13
Q

How does EMR scale?

A

Automatically, by adding or removing instances (nodes) in the cluster

14
Q

What are the node types within an EMR cluster?

A

Master, core and task.
The master node manages the cluster and coordinates the other nodes. A cluster has one by default.
Core nodes run tasks and store data.
Task nodes are optional; they run tasks but don’t store data.

15
Q

What service would be used to make ML-powered interactive dashboards?

A

QuickSight

16
Q

When can QuickSight not use SPICE in-memory computation?

A

When it queries another database directly instead of having the data imported into QuickSight

17
Q

What granularity of security can you set in QuickSight?

A

Column-level security

18
Q

What does ETL stand for?

A

Extract, transform and load
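
The three stages can be sketched in a few lines of plain Python. The sample CSV text, the cleaning rule and the in-memory `warehouse` list below are all made up for illustration; a real pipeline would read from and write to actual data stores.

```python
import csv
import io

# Hypothetical raw input with one bad row, for illustration only.
RAW = "name,amount\nalice,10\nbob,not-a-number\ncarol,7\n"


def extract(text):
    """Extract: read the raw CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: drop rows with a non-numeric amount and tidy the names."""
    cleaned = []
    for record in records:
        try:
            amount = int(record["amount"])
        except ValueError:
            continue  # skip rows that cannot be parsed
        cleaned.append({"name": record["name"].title(), "amount": amount})
    return cleaned


def load(records, target):
    """Load: append the cleaned rows to the target store."""
    target.extend(records)
    return len(target)


warehouse = []  # stands in for the destination data store
load(transform(extract(RAW)), warehouse)
print(warehouse)
```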

19
Q

What is Glue?

A

A managed, serverless ETL service for analytics

20
Q

What are Glue Crawlers?

A

Programs that scan databases or other data stores and write metadata, e.g. the type of data and its format, to the Glue Data Catalog

21
Q

What are Glue Job Bookmarks?

A

Saved state that records how far a job got on previous runs, preventing the re-processing of old data
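
The underlying idea can be sketched as a simple high-water mark. This is a toy model and not Glue's API: Glue keeps its bookmark state itself, and the `run_job` function and `seq` field here are hypothetical names used only to show how remembering the last processed position avoids handling the same records twice.

```python
def run_job(records, bookmark):
    """Process only records newer than the bookmark.

    Returns the records processed on this run and the updated bookmark.
    """
    new_records = [r for r in records if r["seq"] > bookmark]
    new_bookmark = max((r["seq"] for r in new_records), default=bookmark)
    return new_records, new_bookmark


# First run: nothing has been processed yet, so everything is new.
source = [{"seq": 1}, {"seq": 2}]
first, bookmark = run_job(source, bookmark=0)

# More data arrives; the second run picks up only what came after the bookmark.
source.append({"seq": 3})
second, bookmark = run_job(source, bookmark)
```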

22
Q

What is Glue DataBrew?

A

A service that cleans and normalises data for analytics and ML without the user having to write code, using many pre-written transformations

23
Q

What are data lakes and Lake Formation?

A

A data lake is a central place to keep all data of different types for analytics purposes.
Lake Formation is an AWS service that simplifies the process of creating a data lake through the automation of many complex processes.

24
Q

What level of granularity does Lake Formation have in terms of security?

A

Row/column level

25
Q

What service would be used for real-time analytics using SQL?

A

Kinesis Data Analytics for SQL

26
Q

Where can Kinesis Data Analytics for SQL read from?

A

Kinesis Data Streams and Kinesis Data Firehose

27
Q

What is a benefit of using Kinesis Data Analytics for Apache Flink rather than Kinesis Data Analytics for SQL?

A

Flink is more powerful, supporting more advanced querying than SQL alone

28
Q

What is Amazon MSK?

A

Managed Streaming for Apache Kafka (a data streaming alternative to Kinesis) - fully managed Kafka on AWS

29
Q

Is data in Kinesis Data Streams encrypted?

A

Yes, in-flight using TLS
