Big Data Flashcards

1
Q

Volume
Variety
Velocity

A

The 3 Vs of Big Data

2
Q

Fully managed petabyte-scale data warehouse service in the cloud

A

Redshift

3
Q

Very large relational database traditionally used in big data applications

A

Redshift

4
Q

Based on the PostgreSQL database engine but not used for OLTP workloads

A

Redshift

5
Q

Column-based data storage instead of row-based

A

Redshift
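
A toy sketch of why column-based storage helps analytic queries: summing one column in a row layout touches every field of every row, while a column layout touches only that column. The table, column names, and the "values read" counter are hypothetical illustrations, not a description of how Redshift's storage engine is actually implemented.

```python
# Toy illustration (not Redshift's real storage engine): the same table stored
# row-wise and column-wise, counting how many values an aggregate over a single
# column has to read in each layout.

rows = [
    {"order_id": 1, "region": "us-east", "amount": 120.0},
    {"order_id": 2, "region": "eu-west", "amount": 75.5},
    {"order_id": 3, "region": "us-east", "amount": 42.0},
]

# Row-based layout: all fields of one record are stored together.
row_store = [tuple(r.values()) for r in rows]

# Column-based layout: all values of one column are stored together.
column_store = {col: [r[col] for r in rows] for col in rows[0]}


def sum_amount_row_store(store):
    """Reads every field of every row to reach the 'amount' values."""
    values_read = 0
    total = 0.0
    for record in store:
        values_read += len(record)  # the whole row is read
        total += record[2]          # 'amount' is the third field
    return total, values_read


def sum_amount_column_store(store):
    """Reads only the 'amount' column."""
    amounts = store["amount"]
    return sum(amounts), len(amounts)


if __name__ == "__main__":
    print(sum_amount_row_store(row_store))        # (237.5, 9)
    print(sum_amount_column_store(column_store))  # (237.5, 3)
```

Both layouts give the same answer; the column layout simply reads a third as many values here, and the gap grows with the number of columns.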

6
Q

Does Redshift support Multi-AZ?

A

Yes (RA3 node types only); a Multi-AZ deployment spans only 2 AZs

7
Q

Does Redshift use Snapshots?

A

Yes; snapshots are stored in Amazon S3

8
Q

Can you control the S3 bucket containing snapshots?

A

No; the bucket is managed by AWS

9
Q

Query and retrieve data from S3 without having to load the data into Redshift tables

A

Redshift Spectrum

10
Q

All COPY and UNLOAD traffic between your cluster and your data repositories is forced through your VPC

A

Enhanced VPC Routing

11
Q

Extract
Transform
Load

A

ETL
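
The three ETL stages can be sketched in plain Python, assuming an in-memory CSV string as the source and an in-memory SQLite database as the target. The data, the `orders` table, and the cleaning rules are hypothetical; on AWS this work would typically be handed to a managed service rather than a single script.

```python
# Minimal ETL sketch using only the standard library.
# Source data, table name, and transformation rules are hypothetical.
import csv
import io
import sqlite3

RAW_CSV = """order_id,region,amount
1,us-east,120.00
2,EU-WEST,75.50
3,us-east,
"""


def extract(text):
    """Extract: read raw records from the source (an in-memory CSV here)."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: normalize fields and drop rows that are missing an amount."""
    cleaned = []
    for rec in records:
        if not rec["amount"]:
            continue  # skip incomplete rows
        cleaned.append(
            (int(rec["order_id"]), rec["region"].lower(), float(rec["amount"]))
        )
    return cleaned


def load(rows, conn):
    """Load: write the transformed rows into the target database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    query = "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
    print(conn.execute(query).fetchall())  # [('eu-west', 75.5), ('us-east', 120.0)]
```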

12
Q

AWS service used to help with ETL processing

A

Amazon EMR (Elastic MapReduce)

13
Q

Scalable file system for Hadoop that distributes stored data across instances

A

Hadoop Distributed File System (HDFS)
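
As a rough mental model of "distributes stored data across instances", the toy sketch below splits a file into fixed-size blocks and places each block's replicas on distinct nodes. The tiny block size, the replication factor, the node names (`core-1` and so on), and the round-robin placement are hypothetical simplifications; real HDFS uses far larger blocks and a rack-aware placement policy.

```python
# Toy model of block splitting and replica placement in a distributed file
# system. Block size, replication factor, node names, and round-robin placement
# are illustrative assumptions, not HDFS's actual defaults or policy.


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Split file contents into consecutive fixed-size blocks."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes: list[str], replication: int) -> dict[int, list[str]]:
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor cannot exceed the number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"x" * 10, block_size=4)  # blocks of 4, 4, 2 bytes
    layout = place_replicas(len(blocks), ["core-1", "core-2", "core-3"], replication=2)
    print(layout)  # {0: ['core-1', 'core-2'], 1: ['core-2', 'core-3'], 2: ['core-3', 'core-1']}
```

Because every block has copies on two different nodes in this sketch, losing any single node still leaves a readable copy of each block.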

14
Q

Used for caching intermediate results during processing

A

HDFS

15
Q

Extends Hadoop to add the ability to directly access data stored in Amazon S3

A

EMR File System (EMRFS)

16
Q

Locally connected disk (instance store) created with each EC2 instance; data remains only during the lifecycle of that EC2 instance

A

Local File System

17
Q

Collection of EC2 instances (nodes) within Amazon EMR

A

Cluster

18
Q

Manages the cluster and coordinates the distribution of data and tasks among the other nodes

A

Primary Node (formerly called the master node)

19
Q

Node that runs tasks and stores data in the Hadoop Distributed File System

A

Core Node

20
Q

Optional node that only runs tasks and does not store data in HDFS

A

Task Node

21
Q

Interactive query service that makes it easy to analyze data in S3 using SQL.
Serverless SQL solution

A

Athena

22
Q

Service that directly queries data in an S3 bucket without loading it into a database

A

Athena

23
Q

Serverless data integration service that makes it easy to discover, prepare, and combine data.

A

Glue

24
Q

Service that allows you to perform ETL workloads without managing the underlying servers

A

Glue

25
Q

Fully managed serverless business intelligence (BI) data visualization service

A

Amazon QuickSight

26
Q

Useful for business data visualizations, ad-hoc data analytics, and obtaining important data-based business insights

A

QuickSight

27
Q

In-memory engine used to perform advanced calculations within QuickSight

A

SPICE (Super-fast, Parallel, In-memory Calculation Engine)

28
Q

Managed ETL service for automating movement and transformation of your data

A

Data Pipeline

29
Q

Data-driven workflows. Steps are dependent on previous tasks completing successfully

A

Data Driven
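
The "steps depend on previous tasks completing successfully" idea can be sketched as a dependency-ordered runner. This is a hypothetical illustration built on Python's standard `graphlib` module, not the AWS Data Pipeline API; the step names and the copy, transform, load chain are invented for the example.

```python
# Sketch of a data-driven workflow (hypothetical; not the AWS Data Pipeline API).
# Each step runs only after every step it depends on has succeeded; if a step
# fails, all steps downstream of it are skipped instead of run.
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_workflow(steps, dependencies):
    """Run steps in dependency order.

    steps: step name -> callable that returns True on success.
    dependencies: step name -> set of step names that must succeed first.
    Returns step name -> "succeeded" | "failed" | "skipped".
    """
    # Include every step, even those with no dependencies.
    graph = {name: dependencies.get(name, set()) for name in steps}
    status = {}
    for name in TopologicalSorter(graph).static_order():
        if any(status.get(dep) != "succeeded" for dep in graph[name]):
            status[name] = "skipped"  # an upstream step did not succeed
            continue
        status[name] = "succeeded" if steps[name]() else "failed"
    return status


if __name__ == "__main__":
    # Hypothetical steps mirroring a copy -> transform -> load chain.
    steps = {
        "copy_to_s3": lambda: True,
        "transform_on_emr": lambda: False,  # simulate a failed step
        "load_to_redshift": lambda: True,
    }
    dependencies = {
        "transform_on_emr": {"copy_to_s3"},
        "load_to_redshift": {"transform_on_emr"},
    }
    print(run_workflow(steps, dependencies))
```

Here the failed transform step causes the load step to be skipped rather than run on missing data.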

30
Q

Define parameters for data transformations. AWS Data Pipeline enforces the chosen logic

A

Parameters

31
Q

Specify the business logic of your data management needs

A

Pipeline Definition

32
Q

Service will create EC2 instances to perform your activities

A

Managed Compute

33
Q

Poll for different tasks and perform them when found

A

Task Runners
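
A task runner's poll-and-perform loop can be sketched as below, assuming an in-memory `queue.Queue` stands in for the remote service that a real Data Pipeline task runner polls. The task dictionary shape, the handler names, and the stop-after-idle-polls rule are hypothetical choices for the example.

```python
# Sketch of a task runner: repeatedly poll a work queue, perform any task found,
# and record the result. The in-memory queue and task format are hypothetical
# stand-ins for the remote service a real task runner polls.
import queue


def task_runner(work_queue, handlers, max_idle_polls=3, poll_timeout=0.01):
    """Poll for tasks; stop after `max_idle_polls` consecutive empty polls."""
    results = []
    idle = 0
    while idle < max_idle_polls:
        try:
            task = work_queue.get(timeout=poll_timeout)  # one poll
        except queue.Empty:
            idle += 1  # nothing found this time
            continue
        idle = 0
        handler = handlers[task["type"]]
        results.append((task["id"], handler(task["payload"])))
        work_queue.task_done()
    return results


if __name__ == "__main__":
    q = queue.Queue()
    q.put({"id": 1, "type": "upper", "payload": "abc"})
    q.put({"id": 2, "type": "double", "payload": 21})
    print(task_runner(q, {"upper": str.upper, "double": lambda x: x * 2}))
    # [(1, 'ABC'), (2, 42)]
```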

34
Q

Define the locations and types of data that will be input and output

A

Data Nodes

35
Q

Pipeline components that define the work to perform

A

Activities

36
Q

Processing data in EMR using Hadoop streaming
Importing or exporting DynamoDB data
Copying CSV files or data between S3 buckets
Exporting RDS data to S3
Copying data to Redshift

A

Data Pipeline

37
Q

Fully managed service for running data streaming applications that leverage Apache Kafka

A

Amazon MSK (Managed Streaming for Apache Kafka)

38
Q

Used as a managed analytics and visualization service. It is suitable for creating a logging solution involving visualization of log file analytics or BI reports.

A

Amazon OpenSearch Service

39
Q

Managed service allowing you to run search and analytics engines for various use cases.

A

Amazon OpenSearch Service