AWS Services Flashcards
Amazon Athena
works with what service?
use case?
Serverless query service for S3 using SQL
Point it at your S3 buckets, define a schema, and query using SQL.
Pay per query (billed by the amount of data scanned).
No need to ETL the data first.
Used for analytics, not as a transactional database.
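Since Athena bills by the data each query scans, reading fewer bytes (partitioning, columnar formats) means a smaller bill. A rough cost sketch in Python. Assumptions: a placeholder price of 5 USD per TB (check current pricing for your Region), a TB of 2**40 bytes, and billing rounded up to the nearest MB with a 10 MB minimum per query.

```python
import math

MB = 2 ** 20
TB = 2 ** 40  # assumption: binary terabyte

def estimate_athena_cost(bytes_scanned: int, usd_per_tb: float = 5.0) -> float:
    """Rough cost of one query. usd_per_tb is a placeholder, not a quoted price."""
    # Assumed billing rule: round up to whole MB, with a 10 MB minimum per query.
    billed_mb = max(10, math.ceil(bytes_scanned / MB))
    return billed_mb * MB / TB * usd_per_tb

full_scan = estimate_athena_cost(200 * 2 ** 30)    # query reads a whole 200 GiB table
pruned_scan = estimate_athena_cost(2 * 2 ** 30)    # partitioned query reads only 2 GiB
print(f"full scan: ${full_scan:.4f}, partition-pruned: ${pruned_scan:.4f}")
```

The same query over well-partitioned data costs about 1/100th here, purely because it scans 1/100th of the bytes.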
Amazon CloudSearch
Search solution for websites and apps
Think of adding a search bar to your website so users can search whatever you have there.
Features:
Free text, Boolean, and faceted search
Autocomplete suggestions
Customizable relevance ranking and query-time rank expressions
Field weighting
Geospatial search
Highlighting
Support for 34 languages
Amazon Elasticsearch Service (now Amazon OpenSearch Service)
Deploy and run Elasticsearch as a fully managed service, rather than buying all the infrastructure and licensing yourself.
Includes storing data, running searches, and hosting dashboards.
Supports Multi-AZ deployments
Amazon EMR
EMR = Elastic MapReduce.
Big data platform for processing and analysis.
Basically, this is managed Hadoop.
Good for unstructured data!
Amazon Kinesis
Shard, Sequence Number, Partition Key, Firehose
Real-time streaming data capture and analysis
Mnemonic: Kinesis = movement, like a stream: streaming data
Has shards and Firehose. You can't attach EBS volumes to it, because it doesn't run on EC2 instances you manage.
Records flow into a stream (say, a stream from a physical security system). Each record carries a partition key (for example, one key for the heat sensor and another for the camera). Kinesis hashes the partition key to decide which shard the record goes to, so all records with the same key land in the same shard. Each record also gets a sequence number, assigned by Kinesis, that is unique per partition key within its shard and increases over time.
Firehose delivers streaming data to AWS storage and analytics destinations such as S3 and Redshift.
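Kinesis Data Streams maps each partition key to a 128-bit integer with MD5 and sends the record to the shard whose hash key range contains that value. A toy Python model of that routing, assuming the shards split the key space into equal ranges. The ToyStream class is hypothetical (not the Kinesis API), and its sequence numbers are simplified per-shard counters; real ones are large numbers assigned by Kinesis.

```python
import hashlib
from collections import defaultdict

HASH_KEY_SPACE = 2 ** 128  # MD5 digests are 128-bit integers

def shard_for_key(partition_key: str, shard_count: int) -> int:
    """Index of the shard whose (assumed equal-sized) hash key range holds the key."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * shard_count // HASH_KEY_SPACE

class ToyStream:
    """Hypothetical in-memory stream; not the Kinesis API."""

    def __init__(self, shard_count: int) -> None:
        self.shard_count = shard_count
        self.records = defaultdict(list)   # shard index -> [(seq, key, data)]
        self.last_seq = defaultdict(int)   # simplified per-shard counter

    def put_record(self, partition_key: str, data: bytes) -> tuple:
        shard = shard_for_key(partition_key, self.shard_count)
        self.last_seq[shard] += 1
        seq = self.last_seq[shard]
        self.records[shard].append((seq, partition_key, data))
        return shard, seq

stream = ToyStream(shard_count=4)
for key, data in [("heat-sensor", b"71F"), ("camera", b"frame-001"), ("heat-sensor", b"72F")]:
    shard, seq = stream.put_record(key, data)
    print(f"{key} -> shard {shard}, sequence number {seq}")
```

Both heat-sensor records land in the same shard, and the second one gets a higher sequence number.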
Amazon Redshift
use case?
How to query?
Used for analytics, not transactions. Managed data warehouse service.
Uses SQL for queries.
Good for structured data.
Caches results of repeat queries.
AQUA = Advanced Query Accelerator
Spectrum = query data directly in S3 without loading it into Redshift
Amazon QuickSight
Serverless, ML-powered BI dashboards
Is fully managed, serverless, and flexible.
Offers a pay-per-use model.
Can ask BI questions in natural language.
AWS Data Exchange
Subscribe to 3rd-party data sets, such as:
Square (the company) transaction location data
Weather data
Stock data
COVID data
IMDb movie data
AWS Data Pipeline
Move and process data within AWS by defining, scheduling, and automating each of the tasks.
Is a managed ETL (Extract-Transform-Load) service
Is NOT serverless; it manages the creation of EC2 and EMR instances to do work, and you pay for those.
AWS Glue
Data discovery, enrichment, and transfer
Is an ETL (Extract-Transform-Load) tool like Data Pipeline.
Is serverless
Supports S3, RDS, Redshift, other JDBC-accessible SQL databases, and DynamoDB
AWS Lake Formation
Set up Data Lakes quickly
Organizes data in S3, and sizes chunks for efficiency
Does some deduplication and normalization automatically
AWS Step Functions
Serverless Function Orchestration
Low-code, visual workflow service for orchestrating AWS services
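Step Functions workflows are written in the Amazon States Language (JSON). A minimal two-step sketch built as a Python dict; the Lambda ARNs and function names are hypothetical, and check_transitions is a tiny illustrative check, not the service's own validator.

```python
import json

# Hypothetical Lambda ARNs, for illustration only.
VALIDATE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:validate-order"
CHARGE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:charge-card"

state_machine = {
    "Comment": "Validate an order, then charge the card",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {"Type": "Task", "Resource": VALIDATE_ARN, "Next": "ChargeCard"},
        "ChargeCard": {
            "Type": "Task",
            "Resource": CHARGE_ARN,
            # Retry a failed task up to 2 more times before failing the execution.
            "Retry": [{"ErrorEquals": ["States.TaskFailed"], "MaxAttempts": 2}],
            "End": True,
        },
    },
}

def check_transitions(definition: dict) -> None:
    """Illustrative check only: StartAt and each Next must name a defined state,
    and at least one state must set End (Succeed/Fail states are ignored here)."""
    states = definition["States"]
    if definition["StartAt"] not in states:
        raise ValueError("StartAt names an undefined state")
    for name, state in states.items():
        if "Next" in state and state["Next"] not in states:
            raise ValueError(f"{name}.Next names an undefined state")
    if not any(state.get("End") for state in states.values()):
        raise ValueError("no state ends the workflow")

check_transitions(state_machine)
print(json.dumps(state_machine, indent=2))
```

The orchestration (order, retries) lives in the state machine, so the Lambda functions themselves stay small and unaware of each other.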
Amazon AppFlow
Integrate 3rd party app data
Fully managed integration service to securely transfer data between Software-as-a-Service (SaaS) applications, like Salesforce, Zendesk, Slack, and ServiceNow, and AWS services, like Amazon S3 and Amazon Redshift.
Amazon EventBridge
Serverless Event Bus
Takes in events from AWS services, your own apps, and 3rd-party SaaS sources, and uses rules to route them to targets such as Lambda
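EventBridge rules use JSON event patterns to decide which events go to which targets. A simplified Python matcher showing the core idea; it handles only exact-value lists and nested fields, not EventBridge's other filters (prefix, numeric, anything-but, and so on) or array-valued event fields.

```python
def matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge-style matching (exact values and nested fields only)."""
    for field, allowed in pattern.items():
        if field not in event:
            return False                      # every pattern field must be present
        value = event[field]
        if isinstance(allowed, dict):         # nested pattern: match recursively
            if not isinstance(value, dict) or not matches(allowed, value):
                return False
        elif value not in allowed:            # list = "any of these exact values"
            return False
    return True                               # fields not in the pattern are ignored

rule_pattern = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["running", "stopped"]},
}
event = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-0123456789abcdef0", "state": "running"},
}
print(matches(rule_pattern, event))  # True
```

If the rule's target were a Lambda function, this event would be delivered to it; an instance moving to "pending" would not.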
Amazon MQ
Diff from SQS?
Managed message broker service for Apache ActiveMQ and RabbitMQ
Best for migrating existing apps that already use a message broker (e.g., over JMS, AMQP, or MQTT) to AWS. New apps are usually better served by SQS.
Amazon SNS
Push- or pull-based?
Simple Notification Service: pub/sub messaging
Push-based: it pushes messages out to subscribers, which means it can do things like invoke a Lambda function.
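A toy Python model of the push side: publishing calls every subscriber right away, and subscribers never ask for messages. ToyTopic is hypothetical and is not the SNS API; in real SNS, subscribers are endpoints such as Lambda functions, SQS queues, HTTP endpoints, or email addresses.

```python
from typing import Callable

class ToyTopic:
    """Hypothetical in-memory topic illustrating the push model; not the SNS API."""

    def __init__(self) -> None:
        self.subscribers: list[Callable[[str], None]] = []

    def subscribe(self, handler: Callable[[str], None]) -> None:
        self.subscribers.append(handler)

    def publish(self, message: str) -> None:
        # Push model: the topic calls every subscriber as soon as a message arrives.
        for handler in self.subscribers:
            handler(message)

received: list[str] = []
topic = ToyTopic()
topic.subscribe(lambda m: received.append(f"lambda-like handler got {m}"))
topic.subscribe(lambda m: received.append(f"email-like handler got {m}"))
topic.publish("order-42 shipped")
print(received)
```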
Amazon SQS
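Is it pull-based or push-based?
max retention?
Simple Queue Service: inter-component messaging
Enables loose coupling of applications (fewer interdependencies)
It is pull-based. It is passive: producers put messages into it, and consumers must poll to pull them out. This means it can't push to (invoke) a Lambda function on its own; Lambda can still process SQS messages through an event source mapping, in which the Lambda service polls the queue.
Max retention = 14 days (default is 4 days)
Max message size = 256 KB
Two queue types: Standard and FIFO. Related features include delay queues, dead-letter queues, and temporary queues.
Standard queues deliver at least once, so duplicates are possible; FIFO queues provide exactly-once processing.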
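The pull side, as a toy Python model: messages sit in the queue until a consumer asks for one. ToyQueue is hypothetical, not the SQS API. It mirrors the 256 KB size limit and the 1-minute to 14-day retention range (default 4 days), but simplifies receiving: here a received message is removed at once, whereas real SQS hides it for a visibility timeout until the consumer deletes it.

```python
from collections import deque
from typing import Optional

MAX_MESSAGE_BYTES = 256 * 1024             # 256 KB message size limit
MIN_RETENTION_SECONDS = 60                 # 1 minute
MAX_RETENTION_SECONDS = 14 * 24 * 3600     # 14 days
DEFAULT_RETENTION_SECONDS = 4 * 24 * 3600  # 4 days

class ToyQueue:
    """Hypothetical in-memory queue illustrating the pull model; not the SQS API."""

    def __init__(self, retention_seconds: int = DEFAULT_RETENTION_SECONDS) -> None:
        if not MIN_RETENTION_SECONDS <= retention_seconds <= MAX_RETENTION_SECONDS:
            raise ValueError("retention must be between 1 minute and 14 days")
        self.retention_seconds = retention_seconds
        self.messages = deque()  # (sent_at, body) pairs, oldest first

    def send(self, body: str, now: float) -> None:
        if len(body.encode("utf-8")) > MAX_MESSAGE_BYTES:
            raise ValueError("message larger than 256 KB")
        self.messages.append((now, body))

    def receive(self, now: float) -> Optional[str]:
        # Nothing is delivered until a consumer calls this (pull model).
        while self.messages:
            sent_at, body = self.messages.popleft()
            if now - sent_at <= self.retention_seconds:
                return body  # simplification: removed immediately, no visibility timeout
            # Older than the retention period: discarded, never delivered.
        return None

queue = ToyQueue()
queue.send("resize image 123", now=0)
queue.send("resize image 456", now=0)
print(queue.receive(now=60))               # resize image 123
print(queue.receive(now=15 * 24 * 3600))   # None (message expired after 4 days)
```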
Amazon AppSync
What does it do?
How is it managed?
GraphQL API service
GraphQL is an API query language that lets applications use a single endpoint to serve API requests for many kinds of data. Example: one API endpoint to get data from your RDS database, Lambda functions, and S3 storage.
https://graphql.org/
Is a fully managed service.
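What a client sends to that single endpoint: one HTTP POST whose JSON body carries the GraphQL query and its variables. A Python sketch that builds (but does not send) such a request with API-key auth; the endpoint, key, and the getOrder schema are hypothetical.

```python
import json
import urllib.request

# Hypothetical endpoint and API key. Real AppSync endpoints have the form
# https://<api-id>.appsync-api.<region>.amazonaws.com/graphql
ENDPOINT = "https://example123.appsync-api.us-east-1.amazonaws.com/graphql"
API_KEY = "da2-exampleapikey"

# Hypothetical schema: resolvers for getOrder and its fields could be backed by
# different data sources (say, RDS for status and a Lambda function for shippingEstimate).
QUERY = """
query GetOrder($id: ID!) {
  getOrder(id: $id) {
    id
    status
    shippingEstimate
  }
}
"""

def build_request(order_id: str) -> urllib.request.Request:
    """Build (but do not send) a GraphQL POST request to the single endpoint."""
    body = json.dumps({"query": QUERY, "variables": {"id": order_id}}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": API_KEY},
        method="POST",
    )

req = build_request("order-42")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; skipped because the endpoint is fictional.
```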
AWS Cost Explorer
Visualize and manage AWS costs
Allows costs to be grouped by all kinds of attributes, including custom “tags”
AWS Budgets
Service to set and monitor both cost and usage budgets
AWS Cost and Usage Report
Detailed reporting to analyze AWS cost and usage
Amazon Managed Blockchain
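Managed Hyperledger Fabric & Ethereum service
Amazon Quantum Ledger Database (QLDB)
Fully managed ledger database with an immutable, cryptographically verifiable transaction log (e.g., for financial records)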
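QLDB keeps an append-only journal whose history can be verified cryptographically through hash chaining. The generic Python sketch below shows why a chain of hashes makes tampering detectable; it is an illustration of the idea, not QLDB's journal format or API, and the record fields are made up.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

def entry_hash(prev_hash: str, record: dict) -> str:
    # Each hash covers the previous hash, chaining every entry to all earlier ones.
    payload = prev_hash + json.dumps(record, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(journal: list, record: dict) -> None:
    prev = journal[-1]["hash"] if journal else GENESIS
    journal.append({"record": record, "hash": entry_hash(prev, record)})

def verify(journal: list) -> bool:
    prev = GENESIS
    for entry in journal:
        if entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True

journal = []
append(journal, {"account": "A", "delta": 100})
append(journal, {"account": "A", "delta": -30})
print(verify(journal))                 # True
journal[0]["record"]["delta"] = 1000   # quietly rewrite history
print(verify(journal))                 # False
```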
Amazon EC2
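Purchasing options: Spot Instances, Scheduled Reserved Instances, Reserved Instances, On-Demand Instances
Spread/Partition/Cluster placement groups?
Secure, resizable compute instances (400+ options)
Spot: Uses spare EC2 capacity at a steep discount. AWS can reclaim Spot Instances with a two-minute warning, so workloads must be flexible about when they run and tolerate interruption.
Scheduled Reserved Instances: reserve capacity for recurring time windows you pick (no longer offered for new purchases)
Reserved Instances: commit to a 1- or 3-year term for a discount; billed 24/7 whether used or not
On-Demand Instances: Pay-as-you-go. Flexible.
Placement groups:
Cluster - packs instances as close together as possible within one AZ for low-latency, high-throughput networking
Partition - splits instances into logical partitions; instances in one partition don't share racks (hardware) with instances in another
Spread - places each instance on distinct hardware (at most 7 running instances per AZ per group)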
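EC2 Auto Scaling
Types?
What’s a target tracking policy?
Automated compute capacity scaling
There is a cooldown period between scaling activities
Can be triggered by CloudWatch metric alarms, schedules, or manual changes
Types = Simple, Step, Target tracking, Scheduled
Target tracking policy autoscales to keep a chosen metric (e.g., average CPU utilization) at a target value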
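A back-of-envelope Python sketch of the target tracking idea: if load spreads evenly across instances, the capacity needed scales in proportion to how far the metric is from its target. This proportional formula is an assumption for illustration, not the exact EC2 Auto Scaling algorithm, which also accounts for instance warm-up and other safeguards.

```python
import math

def target_tracking_capacity(current_capacity: int, metric_value: float,
                             target_value: float, min_size: int, max_size: int) -> int:
    """Simplified estimate of the instance count that brings a per-instance
    average metric (e.g. CPU %) back to the target, clamped to the group limits.
    Assumes total load = current_capacity * metric_value stays constant."""
    needed = math.ceil(current_capacity * metric_value / target_value)
    return max(min_size, min(max_size, needed))

print(target_tracking_capacity(4, 80.0, 50.0, min_size=2, max_size=10))  # 7 (scale out)
print(target_tracking_capacity(4, 20.0, 50.0, min_size=2, max_size=10))  # 2 (scale in)
```

With 4 instances at 80% CPU and a 50% target, the same load needs 6.4 instances, rounded up to 7.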
Amazon Lightsail
Easy virtual private server instances
Elastic Beanstalk
Deploy & scale web apps (Java/Ruby/etc)
AWS Lambda
Max execution time?
Serverless Compute Functions
Max runtime = 900 seconds (15 min)
Works well with API Gateway
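A minimal Python handler for a REST API in API Gateway using Lambda proxy integration: the HTTP request arrives as the event dict, and the response must include statusCode and a string body. The greeting logic and the FakeContext class are hypothetical, for local testing only; get_remaining_time_in_millis is a real method of the Lambda context object.

```python
import json

def lambda_handler(event, context):
    """Entry point for a REST API (API Gateway) using Lambda proxy integration."""
    # queryStringParameters is None when the request has no query string.
    params = event.get("queryStringParameters") or {}
    name = params.get("name", "world")
    # Milliseconds left before the function's timeout (at most 900 s) stops it.
    remaining_ms = context.get_remaining_time_in_millis()
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"message": f"hello, {name}", "remaining_ms": remaining_ms}),
    }

class FakeContext:
    """Hypothetical stand-in for the Lambda context object, for local runs only."""

    def get_remaining_time_in_millis(self) -> int:
        return 900_000

response = lambda_handler({"queryStringParameters": {"name": "Ada"}}, FakeContext())
print(response["statusCode"], json.loads(response["body"])["message"])  # 200 hello, Ada
```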
ECR
Elastic Container Registry
ECS
How do roles and task definitions work?
Elastic Container Service to deploy/manage clusters & tasks
A task definition specifies which Docker image to use and a number of other parameters, including the task role.
The task role is an IAM role that gives the task's containers permissions to call AWS services.
Only one task role can be applied per task definition.
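A sketch of a task definition as a Python dict, printed as JSON. Field names follow the ECS task definition format; the account ID, role name, family name, and image tag are hypothetical. Note that taskRoleArn takes a single ARN string, which is why only one task role fits per definition.

```python
import json

# Hypothetical account ID and role name, for illustration only.
ACCOUNT_ID = "123456789012"

task_definition = {
    "family": "web-app",
    # The task role: a single IAM role ARN (a string, not a list).
    # Containers in the task use its credentials to call AWS APIs.
    # (executionRoleArn, not shown, is a separate role that ECS itself uses
    # to pull images and send logs.)
    "taskRoleArn": f"arn:aws:iam::{ACCOUNT_ID}:role/webAppTaskRole",
    "containerDefinitions": [
        {
            "name": "web",
            "image": "nginx:1.25",  # which Docker image to run (example tag)
            "memory": 512,          # hard memory limit in MiB
            "essential": True,
            "portMappings": [{"containerPort": 80}],
        }
    ],
}

# JSON in the shape accepted by:
#   aws ecs register-task-definition --cli-input-json file://task-def.json
print(json.dumps(task_definition, indent=2))
```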
EKS
Elastic Kubernetes Service
AWS Copilot
CLI to launch and manage containers
AWS Fargate
Serverless Compute Engine for ECS/EKS Containers
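Runs containerized applications for you without your needing to provision or manage servers; you pay for the vCPU and memory your tasks use.
Supports container images from Amazon ECR and Docker Hub (other registries, including private ones with registry authentication, also work)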