Big Data & Serverless Flashcards

1
Q

What does ETL stand for?

A

Extract, Transform, Load

2
Q

Big data ETL tool that can run open-source software (such as Spark, HBase, etc.) natively on AWS

A

Amazon EMR

3
Q

What Amazon services can you run an EMR cluster on?

A

EC2
EKS
Outposts

EMR processes the data and writes the results to an S3 bucket

4
Q

Open source tools that can be run on EMR

A

Spark
HBase
Hadoop
Presto

5
Q

How to reduce overall cost of EMR EC2 Clusters?

A

Use Spot Instances or Reserved Instances (RIs)

6
Q

Real-time streaming service for AWS

A

Kinesis

7
Q

Kinesis service that provides real-time streaming where you manage the producers and consumers yourself (you must scale the shards)

A

Kinesis Data Streams
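
To illustrate why shard scaling matters: Kinesis Data Streams hashes each record's partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash-key range contains that value. The sketch below assumes the hash space is split evenly across shards; `even_shard_ranges` and `shard_for_key` are hypothetical helper names, not part of any AWS SDK.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def even_shard_ranges(shard_count):
    """Split the 128-bit hash space into contiguous, evenly sized ranges (assumption)."""
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * step - 1
        ranges.append((start, end))
    return ranges


def shard_for_key(partition_key, ranges):
    """Return the index of the shard whose range contains MD5(partition_key)."""
    hashed = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hashed <= end:
            return index
    raise RuntimeError("hash ranges do not cover the hash space")
```

Records with the same partition key always map to the same shard, so a single hot key is not spread out by adding shards.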

8
Q

Kinesis service that provides near-real-time delivery where you don't have to worry as much about scaling, because AWS manages it

A

Kinesis Data Firehose

9
Q

Endpoints available for Kinesis Data Firehose

A

Elasticsearch
S3
Redshift

10
Q

Analyze Kinesis data using standard SQL

A

Kinesis Data Analytics

11
Q

An application requires real-time message delivery. Which service should you use?

A

Kinesis

12
Q

TRUE or FALSE, Kinesis Data Analytics is serverless

A

TRUE

13
Q

Interactive query service that makes it easy to analyze data in S3 using SQL

A

Athena

14
Q

Serverless data integration service to perform ETL without having to manage servers

A

AWS Glue

15
Q

How to use Athena with Glue?

A

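Set up an S3 bucket containing the data

Set up a Glue crawler to analyze the data in the bucket

The crawler records the table schema (metadata) in the Glue Data Catalog

Amazon Athena runs SQL queries on the S3 data using the table definitions in the Catalog

Optionally, use Amazon QuickSight to visualize the results in a dashboard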
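
The steps above can be sketched with boto3. This is a minimal illustration, not a production setup: the crawler name, IAM role ARN, database, table, and bucket paths are all hypothetical placeholders, and `run_flow` needs boto3 plus valid AWS credentials to do anything.

```python
def crawler_request(name, role_arn, database, s3_path):
    """Keyword arguments for Glue's create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def athena_query_request(sql, database, output_location):
    """Keyword arguments for Athena's start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_location},
    }


def run_flow(crawler_kwargs, query_kwargs):
    """Create and start the crawler, then submit the query."""
    import boto3  # imported lazily so the request builders work without the SDK

    glue = boto3.client("glue")
    glue.create_crawler(**crawler_kwargs)
    glue.start_crawler(Name=crawler_kwargs["Name"])
    # In practice, wait for the crawler to finish before querying.
    athena = boto3.client("athena")
    response = athena.start_query_execution(**query_kwargs)
    return response["QueryExecutionId"]


# Hypothetical names used only for illustration.
CRAWLER = crawler_request(
    name="sales-crawler",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    database="sales_db",
    s3_path="s3://example-bucket/sales/",
)
QUERY = athena_query_request(
    sql="SELECT * FROM sales LIMIT 10",
    database="sales_db",
    output_location="s3://example-bucket/athena-results/",
)
```

Athena writes query results to the S3 output location, which is also where a QuickSight dataset or another tool could pick them up.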

16
Q

TRUE or FALSE, Athena is serverless

A

TRUE

17
Q

Fully managed data visualization service for BI similar to Tableau

A

Amazon QuickSight

18
Q

Managed ETL service for automating the movement and transformation of your data. It lets you create data-driven workflows and enforces the logic you define.

A

AWS Data Pipeline

19
Q

How do you configure notifications for events such as failures in AWS Data Pipeline?

A

Via Amazon SNS

20
Q

AWS storage and compute services that AWS Data Pipeline can integrate with

A

Storage:
DynamoDB
RDS
Redshift
S3

Compute:
EC2
EMR

21
Q

TRUE or FALSE, with AWS Data Pipeline I cannot use Reserved Instances (RIs)

A

FALSE, you can use existing instances, including Reserved Instances

22
Q

What are Data Pipeline Task Runners?

A

Agents, typically running on EC2 instances, that poll AWS Data Pipeline for tasks and perform them when found

23
Q

What are Data Pipeline Data Nodes?

A

Define the locations and types of data for inputs and outputs

24
Q

Popular use cases for Data Pipeline

A

Processing data on EMR with Hadoop Streaming

Importing and Exporting DynamoDB data

Copying CSV files or data between S3 buckets

Exporting RDS data to S3

Copying Data to Redshift

Exporting MySQL data to S3 to generate reports

25
Q

Fully managed streaming service built on Apache Kafka. Easy to use, with strong fault tolerance, and well suited to integrating existing applications.

A

Amazon MSK

26
Q

Fully managed streaming service for running Apache Kafka. Easy to use and a good fit for existing applications.

A

Amazon MSK

27
Q

Which operations can you manage with Amazon MSK

A

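Control-plane operations (creating, updating, and deleting clusters) are provided by MSK
Data-plane operations (producing and consuming data) use the Apache Kafka APIs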

28
Q

TRUE or FALSE, Amazon MSK uses KMS keys for server-side encryption (SSE) by default

A

TRUE

29
Q

With Amazon MSK where can you send broker logs

A

CloudWatch Logs
Kinesis Data Firehose
S3

30
Q

Analytics and visualization service that is used to analyze and search logs

A

Amazon OpenSearch Service
(successor to Amazon Elasticsearch Service)

31
Q

Min and Max Memory Size of Lambda functions

A

Min: 128 MB
Max: 10,240 MB (10 GB)

32
Q

TRUE or FALSE a Lambda function can run inside a VPC

A

TRUE, a Lambda function can run inside or outside a VPC

33
Q

Service that allows you to easily find, deploy, and publish your own serverless applications using SAM templates

A

AWS Serverless Application Repository

34
Q

2 options to choose from in Serverless Application Repository

A

PUBLISH
DEPLOY

35
Q

Default Visibility of templates that you publish in Serverless Application Repository

A

Private,

but you can share them with specific AWS accounts or make them public

36
Q

TRUE or FALSE, you must have an AWS account to deploy a Serverless Application Repository template

A

TRUE, deploying requires an AWS account; you can browse public applications without one

37
Q

Container flow (steps from start to finish of creating a container)

A

Create a Dockerfile

Build an image from the Dockerfile

Push the image to a container registry

Launch Container based on Image
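
The flow above maps onto four `docker` CLI commands. The sketch below only assembles them; it assumes a Dockerfile already exists in the build context and that you are already authenticated to the registry. The image name, tag, and registry host are hypothetical.

```python
import subprocess


def container_flow_commands(image, tag, registry, context_dir="."):
    """Return the docker CLI invocations for build, tag, push, and run, in order."""
    local_ref = f"{image}:{tag}"
    remote_ref = f"{registry}/{image}:{tag}"
    return [
        ["docker", "build", "-t", local_ref, context_dir],  # image from the Dockerfile
        ["docker", "tag", local_ref, remote_ref],           # name it for the registry
        ["docker", "push", remote_ref],                     # store it in the registry
        ["docker", "run", "--rm", remote_ref],              # launch a container from it
    ]


def run_all(commands):
    """Execute each command in order, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

For example, `run_all(container_flow_commands("web", "1.0", "registry.example.com"))` would build, tag, push, and start the container on a machine with Docker installed.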

38
Q

What container service should I use if I need to run my containers on-prem

A

Kubernetes

39
Q

Which container service should I use if I need to easily integrate with other AWS services

A

ECS

40
Q

If I want to run a container and cost is a concern, should I pick EC2 or Fargate?

A

For long-running containers that run 24/7, choose EC2

41
Q

2 types of triggers allowed for EventBridge rules

A

Event Pattern
Schedule
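
As a rough illustration of the two options, the sketch below builds the arguments for EventBridge's `put_rule` call: one rule matches events by pattern, the other fires on a schedule. The rule names and the helper functions are hypothetical; the EC2 state-change pattern is just one example of an event to match.

```python
import json


def event_pattern_rule(name, pattern):
    """Arguments for a rule that fires when an incoming event matches the pattern."""
    return {"Name": name, "EventPattern": json.dumps(pattern)}


def schedule_rule(name, expression):
    """Arguments for a rule that fires on a rate(...) or cron(...) schedule."""
    if not (expression.startswith("rate(") or expression.startswith("cron(")):
        raise ValueError("schedule must be a rate(...) or cron(...) expression")
    return {"Name": name, "ScheduleExpression": expression}


# Hypothetical rule names used only for illustration.
EC2_STOPPED = event_pattern_rule(
    "ec2-stopped",
    {
        "source": ["aws.ec2"],
        "detail-type": ["EC2 Instance State-change Notification"],
        "detail": {"state": ["stopped"]},
    },
)
EVERY_FIVE_MINUTES = schedule_rule("every-five-minutes", "rate(5 minutes)")
```

With boto3 installed and credentials configured, either dictionary could be passed as `boto3.client("events").put_rule(**EC2_STOPPED)`.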

42
Q

What types of images or artifacts are allowed in Amazon ECR?

A

Docker Images
OCI Images
OCI Artifacts

43
Q

Is there a place to get public images in AWS?

A

Yes, ECR Public

44
Q

How can I stop old ECR images that I no longer use from piling up in my repository?

A

Use a Lifecycle Policy
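
A minimal sketch of such a policy, assuming you want to expire every image pushed more than a given number of days ago. The helper name `expire_old_images_policy` is hypothetical; the JSON keys follow the ECR lifecycle policy format.

```python
import json


def expire_old_images_policy(days):
    """Return lifecycle policy JSON that expires images older than `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    policy = {
        "rules": [
            {
                "rulePriority": 1,
                "description": f"Expire images pushed more than {days} days ago",
                "selection": {
                    "tagStatus": "any",               # tagged and untagged images
                    "countType": "sinceImagePushed",  # age-based selection
                    "countUnit": "days",
                    "countNumber": days,
                },
                "action": {"type": "expire"},
            }
        ]
    }
    return json.dumps(policy, indent=2)
```

The resulting text could be applied with boto3's `ecr.put_lifecycle_policy(repositoryName=..., lifecyclePolicyText=...)`, where the repository name is yours to fill in.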

45
Q

Security feature in ECR that helps identify software vulnerabilities

A

Image Scanning

46
Q

Are ECR repositories global or regional?

A

Regional

47
Q

TRUE or FALSE, ECR images are only shareable within the region they are in but can be shared across multiple accounts

A

FALSE, images can be shared both cross-regionally and cross-account

48
Q

TRUE or FALSE, tags can be overwritten in ECR

A

It depends. Tags are mutable by default; if tag immutability is turned on for the ECR repository, tags cannot be overwritten

49
Q

Services ECR integrates with

A

ECS, EKS, Amazon Linux containers, and on-premises hosts running your own containers

50
Q

I want to use EKS but don't want it to be managed by AWS. What can I use?

A

EKS Distro (EKS-D), which can be run anywhere; the user is fully responsible for managing it

51
Q

I want to run EKS on-prem, managed by the customer but with Amazon EKS efficiencies. What do I use?

A

EKS Anywhere

52
Q

Can I run ECS on-prem?

A

Yes, using ECS Anywhere

53
Q

Requirements for running ECS Anywhere

A

Must have SSM agent, ECS Agent, and Docker installed on local server

Must register instances as SSM managed instances

Create installation script in ECS console (must contain SSM activation keys and commands for required software)

Execute the script on the on-prem servers or VMs

Deploy containers using EXTERNAL launch type
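
The last step can be sketched as the arguments you would pass to boto3's ECS client. This is an illustration under assumptions: the family, container, image, and cluster names are hypothetical, and the helper functions only build the request dictionaries.

```python
def external_task_definition(family, container_name, image, memory_mib=512):
    """Arguments for register_task_definition targeting on-prem (EXTERNAL) instances."""
    return {
        "family": family,
        "requiresCompatibilities": ["EXTERNAL"],
        "containerDefinitions": [
            {
                "name": container_name,
                "image": image,
                "memory": memory_mib,  # hard memory limit in MiB
                "essential": True,
            }
        ],
    }


def external_run_task(cluster, task_definition):
    """Arguments for run_task using the EXTERNAL launch type."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "EXTERNAL",
    }


# Hypothetical names used only for illustration.
TASK_DEF = external_task_definition("onprem-web", "web", "registry.example.com/web:1.0")
RUN = external_run_task("onprem-cluster", "onprem-web")
```

With the on-prem servers registered in the cluster, `boto3.client("ecs").register_task_definition(**TASK_DEF)` followed by `run_task(**RUN)` would place the container on one of them.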