Big Data Flashcards

1
Q

Volume
Variety
Velocity

A

The 3 Vs of Big Data

2
Q

Fully managed petabyte-scale data warehouse service in the cloud

A

Redshift

3
Q

Very large relational database traditionally used in big data applications

A

Redshift

4
Q

Based on the PostgreSQL database engine but not used for OLTP workloads

A

Redshift

5
Q

Column-based data storage instead of row-based

A

Redshift
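
To make the row-based versus column-based distinction concrete, here is a toy sketch. It is illustrative only and does not reflect how Redshift stores data internally; the table and field names are made up.

```python
# Toy comparison of row-based and column-based layouts (illustrative only;
# this is not how Redshift stores data internally).

# Row-based: all fields of one record are stored together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 80.0},
    {"order_id": 3, "region": "EU", "amount": 50.0},
]


def to_columns(records):
    """Pivot a list of row dicts into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


# Column-based: all values of one field are stored together.
columns = to_columns(rows)

# An aggregate over one column reads a single list instead of
# touching every field of every row.
total_amount = sum(columns["amount"])
print(total_amount)
```

Analytic queries tend to aggregate a few columns over many rows, which is why the columnar layout suits data warehouse workloads.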

6
Q

Does Redshift support Multi-AZ?

A

Yes, but it spans only 2 AZs

7
Q

Does Redshift use Snapshots?

A

Yes, contained in S3

8
Q

Can you control the S3 bucket containing snapshots?

A

No

9
Q

Query and retrieve data from S3 without having to load the data into Redshift tables

A

Redshift Spectrum

10
Q

All COPY and UNLOAD traffic between your cluster and your data repositories is forced through your VPC

A

Enhanced VPC Routing

11
Q

Extract
Transform
Load

A

ETL
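
The three ETL steps can be sketched with the Python standard library alone. This is a minimal example under assumptions: the CSV content, the `sales` table, and the cleaning rules are invented for illustration, and an in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; the second data row has an invalid amount.
RAW_CSV = "name,amount\nalice, 10 \nbob,not-a-number\ncarol,5\n"


def extract(text):
    """Extract: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: trim whitespace, normalize names, drop invalid amounts."""
    cleaned = []
    for record in records:
        try:
            amount = int(record["amount"].strip())
        except ValueError:
            continue  # skip rows whose amount is not an integer
        cleaned.append((record["name"].strip().title(), amount))
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (name TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute("SELECT name, amount FROM sales ORDER BY name").fetchall()
print(loaded)
```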

12
Q

AWS service used to help with ETL processing

A

Amazon EMR (Elastic MapReduce)

13
Q

Scalable file system for Hadoop that distributes stored data across instances.

A

Hadoop Distributed File System (HDFS)

14
Q

Used for caching results during processing

A

HDFS

15
Q

Extends Hadoop to add the ability to directly access data stored in Amazon S3

A

EMR File System (EMRFS)

16
Q

Locally connected disk created with each EC2 instance; the volume persists only for the lifecycle of that EC2 instance.

A

Local File System

17
Q

Groups of EC2 instances (nodes) within Amazon EMR

A

Clusters

18
Q

Manages the cluster, coordinates the distribution of data and tasks

A

Primary Node

19
Q

Node that runs tasks and stores data in the Hadoop Distributed File System

A

Core Node

20
Q

Optional node that only runs tasks with no storage of data within the HDFS

A

Task Node

21
Q

Interactive query service that makes it easy to analyze data in S3 using SQL.
Serverless SQL solution

A

Amazon Athena

22
Q

Service that directly queries data in an S3 bucket without loading it into a database

A

Amazon Athena
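
Amazon Athena is the service usually associated with this description. The sketch below shows roughly how a query might be submitted from Python. The `start_query_execution` call and its parameters mirror the boto3 Athena client, but the database name, table name, and results bucket are hypothetical, and a stand-in client is used so the example runs without AWS credentials.

```python
def start_athena_query(athena_client, sql, database, output_location):
    """Submit a SQL query and return its execution ID.

    athena_client is expected to behave like boto3.client("athena").
    """
    response = athena_client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]


class FakeAthenaClient:
    """Stand-in for the real client so the sketch runs offline."""

    def __init__(self):
        self.calls = []

    def start_query_execution(self, **kwargs):
        self.calls.append(kwargs)
        return {"QueryExecutionId": "fake-query-id"}


client = FakeAthenaClient()
query_id = start_athena_query(
    client,
    "SELECT status, COUNT(*) FROM access_logs GROUP BY status",  # hypothetical table
    "weblogs",  # hypothetical database
    "s3://example-bucket/athena-results/",  # hypothetical results location
)
print(query_id)
```

With real credentials, passing `boto3.client("athena")` in place of the fake client should submit the query; the data stays in S3 and only the results are written to the output location.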

23
Q

Serverless data integration service that makes it easy to discover, prepare, and combine data.

A

AWS Glue

24
Q

Service that allows you to perform ETL workloads without managing underlying servers.

A

AWS Glue

25
Q

Fully managed serverless business intelligence (BI) data visualization service

A

Amazon QuickSight

26
Q

Useful for business data visualizations, ad-hoc data analytics, and obtaining important data-based business insights

A

QuickSight

27
Q

In-memory engine used to perform advanced calculations within QuickSight

A

SPICE

28
Q

Managed ETL service for automating movement and transformation of your data

A

Data Pipeline

29
Q

Data-driven workflows. Steps are dependent on previous tasks completing successfully

A

Data Driven
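
The idea that each step depends on the previous one succeeding can be sketched in a few lines. This is a generic illustration, not AWS Data Pipeline code; the step names and the `run_workflow` helper are hypothetical.

```python
def run_workflow(steps):
    """Run (name, action) steps in order; stop at the first failure."""
    completed = []
    for name, action in steps:
        try:
            action()
        except Exception:
            # A failed step blocks every step that depends on it.
            break
        completed.append(name)
    return completed


def fail():
    raise RuntimeError("transform failed")


ok_run = run_workflow(
    [("extract", lambda: None), ("transform", lambda: None), ("load", lambda: None)]
)
failed_run = run_workflow(
    [("extract", lambda: None), ("transform", fail), ("load", lambda: None)]
)
print(ok_run)
print(failed_run)
```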

30
Q

Define parameters for data transformations. AWS Data Pipeline enforces chosen logic

A

Parameters

31
Q

Specify the business logic of your data management needs

A

Pipeline Definition

32
Q

Service will create EC2 instances to perform your activities

A

Managed Compute

33
Q

Poll for different tasks and perform them when found

A

Task Runners
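
The poll-and-perform pattern behind a task runner can be sketched with a local queue. This is a simplified stand-in, not the AWS Task Runner itself; the `task_runner` function, the idle-poll limit, and the sample tasks are assumptions made for illustration.

```python
import queue


def task_runner(task_queue, max_idle_polls=3):
    """Poll the queue for tasks and run each one as it is found.

    Stops after max_idle_polls consecutive polls that find nothing.
    """
    results = []
    idle_polls = 0
    while idle_polls < max_idle_polls:
        try:
            task = task_queue.get_nowait()
        except queue.Empty:
            idle_polls += 1
            continue
        idle_polls = 0
        results.append(task())
    return results


tasks = queue.Queue()
tasks.put(lambda: "copied file")
tasks.put(lambda: "ran query")
results = task_runner(tasks)
print(results)
```

A real runner would sleep between polls and report status back to the service; both are omitted here to keep the sketch short.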

34
Q

Define the locations and types of data that will be input and output

A

Data Nodes

35
Q

Pipeline components that define the work to perform

A

Activities

36
Q

Processing data in EMR using Hadoop streaming
Importing or exporting DynamoDB data
Copying CSV files or data between S3 buckets
Exporting RDS data to S3
Copying data to Redshift

A

Data Pipeline

37
Q

Fully managed service for running data streaming applications that leverage Apache Kafka

A

Amazon MSK

38
Q

Used as a managed analytics and visualization service. It is suitable for creating a logging solution involving visualization of log file analytics or BI reports.

A

Amazon OpenSearch Service

39
Q

Managed service allowing you to run search and analytics engines for various use cases.

A

Amazon OpenSearch Service