Big Data & Serverless Flashcards
What does ETL stand for?
Extract, Transform, Load
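As a rough illustration of the three stages, the sketch below runs a toy ETL over in-memory CSV text. The sample data, the function names, and the list standing in for a data store are all hypothetical and not tied to any AWS service.

```python
import csv
import io

RAW_CSV = "name,amount\nalice,10\nbob,abc\ncarol,25\n"  # hypothetical sample input


def extract(text):
    """Extract: parse raw CSV text into a list of row dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows whose amount is not numeric and normalize types."""
    cleaned = []
    for row in rows:
        if row["amount"].isdigit():
            cleaned.append({"name": row["name"].title(), "amount": int(row["amount"])})
    return cleaned


def load(rows, target):
    """Load: append cleaned rows to the target (a list standing in for a data store)."""
    target.extend(rows)
    return len(rows)


warehouse = []
loaded = load(transform(extract(RAW_CSV)), warehouse)
```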
Big data ETL tool that can run open-source software (such as Spark, HBase, etc.) natively on AWS
Amazon EMR
What Amazon services can you run an EMR cluster on?
EC2
EKS
Outposts
(EMR processes the data and puts the results in an S3 bucket)
Open source tools that can be run on EMR
Spark
HBase
Hadoop
Presto
How to reduce the overall cost of EMR clusters running on EC2?
Use Spot Instances or Reserved Instances (RIs)
Real-time streaming service for AWS
Kinesis
Kinesis service that provides real-time speed, where you manage the producers and consumers (and must scale the shards yourself)
Kinesis Data Streams
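Scaling shards is mostly arithmetic. The sketch below estimates a shard count, assuming the provisioned-mode per-shard limits of 1 MB/s or 1,000 records/s for writes and 2 MB/s for reads. The function name and the sample workload figures are hypothetical.

```python
import math

# Assumed per-shard limits for provisioned-mode Kinesis Data Streams.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def required_shards(write_mb_per_s, records_per_s, read_mb_per_s):
    """Return the smallest shard count that satisfies all three limits."""
    by_write_size = math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD)
    by_write_count = math.ceil(records_per_s / WRITE_RECORDS_PER_SHARD)
    by_read_size = math.ceil(read_mb_per_s / READ_MB_PER_SHARD)
    # A stream always needs at least one shard.
    return max(1, by_write_size, by_write_count, by_read_size)


busy_stream = required_shards(write_mb_per_s=3.5, records_per_s=2500, read_mb_per_s=10)
quiet_stream = required_shards(write_mb_per_s=0.2, records_per_s=100, read_mb_per_s=0.2)
```

In this example the read throughput is the binding limit for the busy stream, so adding consumers can force a reshard even when producers are unchanged.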
Kinesis service that provides near-real-time speed, where you don’t have to worry as much about scaling because AWS manages it
Kinesis Data Firehose
Endpoints available for Kinesis Data Firehose
Elasticsearch
S3
Redshift
Analyze Kinesis data using standard SQL
Kinesis Data Analytics
An application requires real-time message delivery. Which service should you use?
Kinesis
TRUE or FALSE, Kinesis Data Analytics is serverless
TRUE
Interactive query service that makes it easy to analyze data in S3 using SQL
Athena
Serverless data integration service to perform ETL without having to manage servers?
AWS Glue
How to use Athena with Glue?
Set up S3 bucket with data
Set up a Glue crawler to analyze data in bucket
Data is put in Glue Catalog
Amazon Athena can run queries on restructured data in the Catalog
Use Amazon QuickSight to visualize the data in a dashboard
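Once the crawler has populated the Catalog, the query step can be sketched as below. This only builds the request parameters (QueryString, QueryExecutionContext, ResultConfiguration) that Athena's StartQueryExecution API accepts; it does not call AWS. The database, table, and bucket names are hypothetical.

```python
def build_athena_query_request(database, table, output_location, limit=10):
    """Build StartQueryExecution parameters for a simple SELECT.

    The returned dict could be passed to boto3 as
    client("athena").start_query_execution(**request).
    """
    return {
        "QueryString": f'SELECT * FROM "{table}" LIMIT {int(limit)}',
        # The database is the one the Glue crawler wrote its tables into.
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to this S3 location.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


# Hypothetical names, used only for illustration.
request = build_athena_query_request(
    database="sales_db",
    table="orders",
    output_location="s3://example-athena-results/",
)
```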
TRUE or FALSE, Athena is serverless
TRUE
Fully managed data visualization service for BI similar to Tableau
Amazon QuickSight
Managed ETL service for automating the movement and transformation of your data. Lets you create data-driven workflows and enforces the logic you define.
AWS Data Pipeline
How to configure failure notifications in AWS Data Pipeline?
Via Amazon SNS
AWS storage and compute services that AWS Data Pipeline can be integrated with
Storage:
DynamoDB
RDS
Redshift
S3
Compute:
EC2
EMR
TRUE or FALSE, with AWS Data Pipeline I cannot use Reserved Instances (RIs)
FALSE, you can use previously existing instances
What are Data Pipeline Task Runners?
Agents, typically running on EC2 instances, that poll AWS Data Pipeline for tasks and run them when found
What are Data Pipeline Data Nodes?
Define the locations and types of data for inputs and outputs
Popular use cases for Data Pipeline
Processing data on EMR with Hadoop Streaming
Importing and Exporting DynamoDB data
Copying CSV files or data between S3 buckets
Exporting RDS data to S3
Copying Data to Redshift
Exporting MySQL data to S3 to generate reports
Fully managed streaming service leveraging Apache Kafka. Easy to use and has great fault tolerance for integrating previously existing apps.
Amazon MSK
Fully managed streaming service for leveraging Apache Kafka. Easy to use and great for previously existing applications.
Amazon MSK
Which operations can you manage with Amazon MSK?
Data Plane operations
For producing and consuming data
TRUE or FALSE, Amazon MSK uses KMS keys for server-side encryption (SSE) by default
TRUE
With Amazon MSK, where can you send broker logs?
CloudWatch Logs
Kinesis Data Firehose
S3
Analytics and visualization service that is used to analyze and search logs
Amazon OpenSearch Service
(successor to Amazon Elasticsearch Service)
Min and Max Memory Size of Lambda functions
128 MB
10,240 MB (10 GB)
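A minimal sketch of checking a requested memory setting against these limits, using 128 MB and 10,240 MB (10 GB) as the bounds. The function and constant names are hypothetical and not part of any AWS SDK.

```python
LAMBDA_MIN_MEMORY_MB = 128
LAMBDA_MAX_MEMORY_MB = 10240  # 10 GB


def validate_memory_size(memory_mb):
    """Return memory_mb if it is within Lambda's allowed range, else raise ValueError."""
    if not LAMBDA_MIN_MEMORY_MB <= memory_mb <= LAMBDA_MAX_MEMORY_MB:
        raise ValueError(
            f"memory must be between {LAMBDA_MIN_MEMORY_MB} and "
            f"{LAMBDA_MAX_MEMORY_MB} MB, got {memory_mb}"
        )
    return memory_mb


smallest = validate_memory_size(128)
largest = validate_memory_size(10240)
```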
TRUE or FALSE, a Lambda function can run inside a VPC
TRUE, a Lambda function can run inside or outside a VPC
Service that allows you to easily find, deploy, and publish your own serverless applications using SAM templates
AWS Serverless Application Repository
2 options to choose from in Serverless Application Repository
PUBLISH
DEPLOY
Default Visibility of templates that you publish in Serverless Application Repository
Private, but can be shared with specific AWS accounts or made public to all
TRUE or FALSE, you must have an AWS account to deploy a Serverless Application Repository template
TRUE, deploying requires an AWS account; only browsing public applications does not
Container flow (steps from start to finish of creating a container)
Create a Dockerfile
Create Image from Dockerfile
Put Image in Container registry service
Launch Container based on Image
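The four steps above map onto standard Docker CLI commands. The sketch below only assembles the command lines as argument lists (suitable for `subprocess.run`) and does not execute them. The Dockerfile contents, registry URI, image name, and tag are hypothetical.

```python
# Hypothetical names, used only for illustration.
REGISTRY = "123456789012.dkr.ecr.us-east-1.amazonaws.com"
IMAGE = "demo-app"
TAG = "v1"

# Step 1: a minimal Dockerfile (contents are an assumption for the example).
DOCKERFILE = """FROM python:3.12-slim
CMD ["python", "-c", "print('hello')"]
"""


def container_flow_commands(registry, image, tag):
    """Return the Docker CLI commands for steps 2 to 4 of the container flow."""
    local_ref = f"{image}:{tag}"
    remote_ref = f"{registry}/{image}:{tag}"
    return [
        ["docker", "build", "-t", local_ref, "."],  # step 2: build image from Dockerfile
        ["docker", "tag", local_ref, remote_ref],   # step 3a: name it for the registry
        ["docker", "push", remote_ref],             # step 3b: put image in the registry
        ["docker", "run", "--rm", remote_ref],      # step 4: launch a container from it
    ]


commands = container_flow_commands(REGISTRY, IMAGE, TAG)
```

Pushing to a private registry also needs a prior `docker login`, which is omitted here.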
What container service should I use if I need to run my containers on-prem?
Kubernetes
Which container service should I use if I need to easily integrate with other AWS services?
ECS
If I want to run a container and cost is a concern, should I pick EC2 or Fargate?
For long-running containers that run 24/7, choose EC2
2 types of rules you can create in EventBridge
Event Pattern
Schedule
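To make the difference concrete, the sketch below builds one rule of each type as the parameters EventBridge's PutRule API accepts (Name plus either EventPattern or ScheduleExpression). Nothing is sent to AWS, and the rule names are hypothetical.

```python
import json

# Event pattern rule: matches events whose fields equal the listed values.
event_pattern_rule = {
    "Name": "ec2-instance-terminated",  # hypothetical rule name
    "EventPattern": json.dumps(
        {
            "source": ["aws.ec2"],
            "detail-type": ["EC2 Instance State-change Notification"],
            "detail": {"state": ["terminated"]},
        }
    ),
}

# Schedule rule: fires on a timer instead of in response to an event.
schedule_rule = {
    "Name": "every-five-minutes",  # hypothetical rule name
    "ScheduleExpression": "rate(5 minutes)",
}
```

Either dict could be passed to boto3 as `client("events").put_rule(**rule)`; a schedule can also use a `cron(...)` expression instead of `rate(...)`.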
What types of images or artifacts are allowed in Amazon ECR?
Docker Images
OCI Images
OCI Artifacts
Is there a place to get public images in AWS?
Yes, ECR Public
How can I prevent my repository from filling up with old ECR images that I no longer use?
Use a lifecycle policy
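A minimal sketch of such a policy, assuming the goal is to expire untagged images a fixed number of days after they were pushed. The helper name and the 14-day threshold are hypothetical; the JSON keys follow the ECR lifecycle policy format.

```python
import json


def expire_untagged_after(days, priority=1):
    """Build an ECR lifecycle policy that expires untagged images after `days` days."""
    return {
        "rules": [
            {
                "rulePriority": priority,
                "description": f"Expire untagged images older than {days} days",
                "selection": {
                    "tagStatus": "untagged",
                    "countType": "sinceImagePushed",
                    "countUnit": "days",
                    "countNumber": days,
                },
                "action": {"type": "expire"},
            }
        ]
    }


# JSON text that could be supplied as lifecyclePolicyText to ECR's PutLifecyclePolicy API.
policy_text = json.dumps(expire_untagged_after(14), indent=2)
```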
Security feature in ECR that helps identify software vulnerabilities
Image Scanning
Are ECR repositories global or regional?
Regional
TRUE or FALSE, ECR images are only shareable within the region they are in but can be shared across multiple accounts
FALSE, images can be shared both cross-regionally and cross-account
TRUE or FALSE, tags can be overwritten in ECR
It depends. Tags are mutable by default and can be overwritten; if tag immutability is turned on for the ECR repository, tags cannot be overwritten
Services ECR integrates with
ECS, EKS, Amazon Linux Containers, on-prem for own containers
I want to use EKS but don’t want it to be managed by AWS. What service can I use?
EKS-D (EKS Distro), which can be run anywhere; the user is fully responsible for managing it
I want to run EKS on-prem, managed by the customer but with Amazon EKS efficiencies. What do I use?
EKS Anywhere
Can I run ECS on-prem?
Yes, using ECS Anywhere
Requirements for running ECS Anywhere
Must have SSM agent, ECS Agent, and Docker installed on local server
Must register instances as SSM managed instances
Create installation script in ECS console (must contain SSM activation keys and commands for required software)
Execute scripts on on-prem servers or VMs
Deploy containers using EXTERNAL launch type