Big Data Flashcards

Question 1

Q

Volume
Variety
Velocity

Answer

A

The 3 Vs of Big Data

Question 2

Q

Fully managed petabyte-scale data warehouse service in the cloud

Answer

A

Amazon Redshift

Question 3

Q

Very large relational database traditionally used in big data applications

Answer

A

Amazon Redshift

Question 4

Q

Based on the PostgreSQL database engine but not used for OLTP workloads

Answer

A

Amazon Redshift

Question 5

Q

Column-based data storage instead of row-based

Answer

A

Amazon Redshift

Question 6

Q

Does Redshift support Multi-AZ?

Answer

A

Yes, but a deployment spans only 2 AZs

Question 7

Q

Does Redshift use Snapshots?

Answer

A

Yes, contained in S3

Question 8

Q

Can you control the S3 bucket containing snapshots?

Answer

A

No, the bucket is managed by AWS

Question 9

Q

Query and retrieve data from S3 without having to load the data into Redshift tables

Answer

A

Redshift Spectrum

Question 10

Q

All copy and unload traffic between your cluster and your data repositories is forced through your VPC

Answer

A

Enhanced VPC Routing

Question 11

Q

Extract
Transform
Load

Answer

A

ETL

Question 12

Q

AWS service used to help with ETL processing

Answer

A

Amazon EMR (Elastic MapReduce)
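
The extract, transform, and load steps named in the cards above can be illustrated with a minimal, local sketch. It does not use EMR or any AWS API; the sample CSV, the function names, and the list standing in for a warehouse are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical raw input; the second row is missing its amount.
RAW_CSV = """order_id,amount,currency
1,10.50,usd
2,,usd
3,7.25,eur
"""

def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with no amount, convert types, uppercase the currency."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue
        cleaned.append({
            "order_id": int(row["order_id"]),
            "amount": float(row["amount"]),
            "currency": row["currency"].upper(),
        })
    return cleaned

def load(rows, target):
    """Load: append cleaned rows to the target store (a plain list here)."""
    target.extend(rows)
    return len(rows)

warehouse = []  # stand-in for a real destination such as a warehouse table
loaded = load(transform(extract(RAW_CSV)), warehouse)
```

In a managed service the same three stages would typically run as distributed jobs; the sketch only shows the order of the stages and what each one is responsible for.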

Question 13

Q

Scalable file system for Hadoop that distributes stored data across instances.

Answer

A

Hadoop Distributed File System (HDFS)

Question 14

Q

Used for caching results during processing

Question 15

Q

Extends Hadoop to add the ability to directly access data stored in Amazon S3

Answer

A

EMR File System (EMRFS)

Question 16

Q

Locally connected disk created with each EC2 instance; the volume persists only for the lifecycle of that EC2 instance.

Answer

A

Local File System

Question 17

Q

Groups of EC2 instances (nodes) within Amazon EMR

Answer

A

Clusters

Question 18

Q

Manages the cluster, coordinates the distribution of data and tasks

Answer

A

Primary Node

Question 19

Q

Node that runs tasks and stores data in the Hadoop Distributed File System

Answer

A

Core Node

Question 20

Q

Optional node that only runs tasks and stores no data in HDFS

Answer

A

Task Node

Question 21

Q

Interactive Query service that makes it easy to analyze data in S3 using SQL.
Serverless SQL solution

Answer

A

Amazon Athena

Question 22

Q

Service that directly queries data in an S3 bucket without loading it into a database

Answer

A

Amazon Athena

Question 23

Q

Serverless data integration service that makes it easy to discover, prepare, and combine data.

Answer

A

AWS Glue

Question 24

Q

Service that allows you to perform ETL workloads without managing underlying servers.

Answer

A

AWS Glue

Question 25

Q

Fully managed serverless business intelligence (BI) data visualization service

Answer

A

Amazon QuickSight

Question 26

Q

Useful for business data visualizations, ad-hoc data analytics, and obtaining important data-based business insights

Answer

A

QuickSight

Question 27

Q

In-memory engine used to perform advanced calculations within QuickSight

Answer

A

SPICE (Super-fast, Parallel, In-memory Calculation Engine)

Question 28

Q

Managed ETL service for automating movement and transformation of your data

Answer

A

Data Pipeline

Question 29

Q

Data-driven workflows. Steps are dependent on previous tasks completing successfully

Answer

A

Data Driven
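
The idea that a step runs only when the steps before it completed successfully can be sketched in a few lines. This is not the AWS Data Pipeline API; `run_workflow`, the step names, and the status strings `ok`, `failed`, and `skipped` are hypothetical and chosen only to illustrate the dependency rule.

```python
def run_workflow(steps):
    """Run steps in order; a step runs only if every step it depends on succeeded."""
    status = {}
    for name, depends_on, action in steps:
        if all(status.get(dep) == "ok" for dep in depends_on):
            try:
                action()
                status[name] = "ok"
            except Exception:
                status[name] = "failed"
        else:
            # At least one dependency did not succeed, so this step never runs.
            status[name] = "skipped"
    return status

def fail():
    """Hypothetical action that always fails, used to show the cascade."""
    raise RuntimeError("copy failed")

steps = [
    ("extract", [], lambda: None),
    ("copy", ["extract"], fail),
    ("report", ["copy"], lambda: None),
]
result = run_workflow(steps)
```

Here `extract` succeeds, `copy` fails, and `report` is skipped because it depends on `copy`.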

Question 30

Q

Define parameters for data transformations. AWS Data Pipeline enforces chosen logic

Answer

A

Parameters

Question 31

Q

Specify the business logic of your data management needs

Answer

A

Pipeline Definition

Question 32

Q

The service creates EC2 instances to perform your activities

Answer

A

Managed Compute

Question 33

Q

Poll for different tasks and perform them when found

Answer

A

Task Runners
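
The poll-then-perform behavior described on this card can be sketched with an in-memory queue. This is only an illustration under stated assumptions: `task_runner`, the `deque` standing in for the service's task list, and the `max_polls` limit are hypothetical and do not reflect the actual Task Runner implementation.

```python
from collections import deque

def task_runner(queue, max_polls):
    """Poll the queue up to max_polls times and perform each task that is found."""
    results = []
    for _ in range(max_polls):
        if not queue:
            # Nothing found on this poll; a real runner would wait and poll again.
            continue
        task = queue.popleft()
        results.append(task())
    return results

# Two hypothetical tasks waiting to be picked up.
pending = deque([lambda: "copied", lambda: "exported"])
done = task_runner(pending, max_polls=5)
```

The runner finds and performs both tasks on its first two polls and finds nothing on the remaining three.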

Question 34

Q

Define the locations and types of data that will be input and output

Answer

A

Data Nodes

Question 35

Q

Pipeline components that define the work to perform

Answer

A

Activities

Question 36

Q

Processing data in EMR using Hadoop streaming
Importing or exporting DynamoDB data
Copying CSV files or data between S3 buckets
Exporting RDS data to S3
Copying data to Redshift

Answer

A

Data Pipeline

Question 37

Q

Fully managed service for running data streaming applications that leverage Apache Kafka

Answer

A

Amazon MSK

Question 38

Q

Used as a managed analytics and visualization service. It is suitable for creating a logging solution involving visualization of log file analytics or BI reports.

Answer

A

Amazon OpenSearch Service

Question 39

Q

Managed service allowing you to run search and analytics engines for various use cases.

Answer

A

Amazon OpenSearch Service