What is ...? Flashcards
AWS Glue
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL. AWS Glue generates the code to execute your data transformations and data loading processes.
AWS Fargate
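serverless compute engine for containers that works with both ECS and EKS; you don't provision or manage EC2 instances. sorta like Elastic Beanstalk for containers, but you supply the container images; it doesn't build them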
ECS and its equivalent
highly scalable, high performance container management service that supports Docker containers and allows you to easily run applications on a managed cluster of Amazon EC2 instances. Amazon ECS makes it easy to use containers as a building block for your applications by eliminating the need for you to install, operate, and scale your own cluster management infrastructure. Amazon ECS lets you schedule long-running applications, services, and batch processes using Docker containers. Amazon ECS maintains application availability and allows you to scale your containers up or down to meet your application’s capacity requirements.
EKS and its equivalent
Kubernetes
Athena
an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. serverless. commonly used to analyze log data in S3.
FSx for Lustre
high-performance, compute-intensive workloads (e.g. HPC, machine learning). Linux only: for Windows-based applications use FSx for Windows File Server instead. can link to an S3 bucket and work on the data stored there
FSx for Windows File Server
- centralized storage for Windows-based applications; SMB, SharePoint, SQL Server, WorkSpaces, IIS web server, etc.
- access from on-premises needs VPN or Direct Connect
DataSync
provides a fast way to move large amounts of data online between on-premises storage and Amazon S3 or Amazon Elastic File System (Amazon EFS).
choose it when the on-prem storage will no longer be used (a one-time migration).
SQS long polling
Long polling helps reduce your cost of using Amazon SQS by reducing the number of empty responses when there are no messages available to return in reply to a ReceiveMessage request sent to an Amazon SQS queue and eliminating false empty responses when messages are available in the queue but aren’t included in the response.
- Long polling reduces the number of empty responses by allowing Amazon SQS to wait until a message is available in the queue before sending a response. Unless the connection times out, the response to the ReceiveMessage request contains at least one of the available messages, up to the maximum number of messages specified in the ReceiveMessage action.
- Long polling eliminates false empty responses by querying all (rather than a limited number) of the servers. Long polling returns messages as soon as any message becomes available.
TL;DR: short polling returns a response immediately; long polling doesn't return a response until a message arrives in the queue or the long poll times out.
SQS short polling
ReceiveMessageWaitTimeSeconds is the queue attribute that determines whether you are using short or long polling. Its default value is 0, which means short polling; any value from 1 to 20 seconds enables long polling.
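The behavioral difference between short and long polling can be sketched with Python's standard `queue.Queue` standing in for an SQS queue. This is an analogy, not the SQS API: the hypothetical `short_poll` returns immediately, even when empty (like a wait time of 0), while the hypothetical `long_poll` blocks until a message arrives or the wait expires (like a wait time of 1 to 20 seconds).

```python
import queue
import threading


def short_poll(q: queue.Queue, max_messages: int = 10) -> list:
    """Return whatever is available right now, possibly nothing (hypothetical helper)."""
    messages = []
    while len(messages) < max_messages:
        try:
            messages.append(q.get_nowait())
        except queue.Empty:
            break
    return messages


def long_poll(q: queue.Queue, wait_seconds: float, max_messages: int = 10) -> list:
    """Block until a message arrives or wait_seconds elapses (hypothetical helper)."""
    try:
        first = q.get(timeout=wait_seconds)
    except queue.Empty:
        return []  # the wait expired with nothing to deliver: an empty response
    # Once one message is in hand, also return anything else already queued.
    return [first] + short_poll(q, max_messages - 1)


if __name__ == "__main__":
    q = queue.Queue()
    # Simulated producer: a message arrives 0.5 seconds from now.
    threading.Timer(0.5, q.put, args=["hello"]).start()
    print(short_poll(q))                 # [] : nothing has arrived yet
    print(long_poll(q, wait_seconds=5))  # ['hello'] once the message lands
```

Short polling in a loop would make many empty calls during that half second; the long poll makes one call that returns as soon as the message shows up.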
ParallelCluster
an AWS-supported open-source cluster management tool that makes it easy for you to deploy and manage High Performance Computing (HPC) clusters on AWS. Unlike ENA or EFA, it does not itself provide higher bandwidth, higher packets per second (PPS), or lower inter-instance latency.
Elastic Fabric Adapter (EFA)
simply an Elastic Network Adapter (ENA) with added capabilities. It provides all of the functionality of an ENA, with additional OS-bypass functionality. OS-bypass is an access model that allows HPC and machine learning applications to communicate directly with the network interface hardware to provide low-latency, reliable transport functionality.
The OS-bypass capabilities of EFAs are not supported on Windows instances. If you attach an EFA to a Windows instance, the instance functions as an Elastic Network Adapter, without the added EFA capabilities.
Elastic Network Adapter (ENA)
supports network speeds from 10Gbps up to 100Gbps for supported instance types. Elastic Network Adapters (ENAs) provide traditional IP networking features that are required to support VPC networking.
step scaling
Increase or decrease the current capacity of the group based on a set of scaling adjustments, known as step adjustments, that vary based on the size of the alarm breach.
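A simplified model of how a step policy picks its adjustment: bounds are measured relative to the alarm threshold, and the step whose interval contains the breach size wins. The fields loosely mirror a real step adjustment's `MetricIntervalLowerBound`, `MetricIntervalUpperBound`, and `ScalingAdjustment`, but the `StepAdjustment` class, the `step_scale_out` function, and the 60% CPU policy are hypothetical. Only scale-out is modeled; cooldowns, warm-up, and min/max group size are ignored.

```python
import math
from dataclasses import dataclass


@dataclass
class StepAdjustment:
    lower: float     # bound relative to the alarm threshold, inclusive
    upper: float     # bound relative to the alarm threshold, exclusive (math.inf = no upper bound)
    adjustment: int  # instances to add when the breach falls in [lower, upper)


def step_scale_out(metric: float, threshold: float,
                   steps: list[StepAdjustment], current: int) -> int:
    """Return the new desired capacity under a simplified scale-out step policy."""
    breach = metric - threshold
    if breach < 0:
        return current  # alarm not in breach: no change
    for step in steps:
        if step.lower <= breach < step.upper:
            return current + step.adjustment
    return current  # breach not covered by any step


# Hypothetical policy on a CPU alarm with a 60% threshold.
steps = [
    StepAdjustment(0, 10, 1),         # 60% to <70%: add 1 instance
    StepAdjustment(10, 20, 2),        # 70% to <80%: add 2 instances
    StepAdjustment(20, math.inf, 3),  # 80% and up: add 3 instances
]
print(step_scale_out(65, 60, steps, current=4))  # 5
print(step_scale_out(85, 60, steps, current=4))  # 7
```

A bigger breach produces a bigger jump in capacity, which is the point of step scaling compared with simple scaling.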
cheapest S3 tier
S3 Glacier Deep Archive
Glacier Deep Archive retrieval time
within 12 hours (standard retrieval); up to 48 hours for bulk retrieval
s3 encryption in transit
SSL/TLS
S3 encryption at rest
- S3-managed keys: SSE-S3
- AWS Key Management Service (KMS) keys: SSE-KMS
- customer-provided keys: SSE-C
Glacier retrieval time
minutes to hours: expedited 1-5 minutes, standard 3-5 hours, bulk 5-12 hours
least durable S3
One Zone-IA: data is stored in a single AZ, so it is lost if that AZ is destroyed
Service control policies (SCP)
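restrict which AWS services and actions are available to an OU or to individual accounts. SCPs set the maximum permissions; they never grant permissions on their own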
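A simplified model of why an SCP limits but never grants: an action is usable only if the account's IAM policies allow it and the SCP allows it too, so effective permissions are roughly the intersection of the two sets. This sketch ignores explicit denies, resource-based policies, permission boundaries, and wildcards; the `effective_actions` function and the action lists are illustrative assumptions.

```python
def effective_actions(iam_allowed: set[str], scp_allowed: set[str]) -> set[str]:
    """Simplified: an action must be allowed by BOTH the IAM policy and the SCP."""
    return iam_allowed & scp_allowed


iam = {"s3:GetObject", "s3:PutObject", "ec2:RunInstances"}
scp = {"s3:GetObject", "s3:PutObject", "dynamodb:GetItem"}  # SCP on a hypothetical OU
print(sorted(effective_actions(iam, scp)))  # ['s3:GetObject', 's3:PutObject']
```

`ec2:RunInstances` is blocked because the SCP doesn't allow it, and `dynamodb:GetItem` is still unusable because no IAM policy grants it, even though the SCP permits it.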
ways to share s3 buckets
- bucket policies & IAM (entire bucket): programmatic access only
- bucket ACLs & IAM (individual objects): programmatic access only
- cross-account IAM roles: programmatic and console access
cloudfront origin
origin of all files the CDN will distribute. can be an S3 bucket, EC2 instance, ELB, or a custom origin (any HTTP server)
cloudfront distribution
name given to the CDN, which consists of a collection of edge locations
cloudfront edge locations
not read-only: you can also write (PUT) to them. objects are cached for the life of the TTL (time to live)
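TTL-based caching at an edge location can be sketched as a small cache that serves hits until an object's TTL expires and then goes back to the origin. The `EdgeCache` class, its `fetch_from_origin` callback, and the injected clock are hypothetical; real CloudFront also honors Cache-Control headers and supports invalidations, and its default TTL is 24 hours.

```python
import time
from typing import Callable, Dict, Tuple


class EdgeCache:
    """Toy model of an edge location: cached objects expire after ttl seconds."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.fetch_from_origin = fetch_from_origin
        self.ttl = ttl
        self.clock = clock
        self.store: Dict[str, Tuple[bytes, float]] = {}  # path -> (body, expires_at)
        self.origin_fetches = 0

    def get(self, path: str) -> bytes:
        now = self.clock()
        entry = self.store.get(path)
        if entry is not None and now < entry[1]:
            return entry[0]  # hit: served from the edge, origin untouched
        # Miss or expired: fetch from the origin and cache for another TTL.
        body = self.fetch_from_origin(path)
        self.origin_fetches += 1
        self.store[path] = (body, now + self.ttl)
        return body


# Usage with a fake clock so the example runs instantly.
fake_now = [0.0]
cache = EdgeCache(lambda p: f"contents of {p}".encode(), ttl=60, clock=lambda: fake_now[0])
cache.get("/index.html")  # miss: goes to the origin
fake_now[0] = 30
cache.get("/index.html")  # hit: still within the TTL
fake_now[0] = 61
cache.get("/index.html")  # expired: back to the origin
print(cache.origin_fetches)  # 2
```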
volume gateway - stored volumes
entire dataset stored on site and asynchronously backed up to S3
volume gateway - cached volumes
entire dataset stored on S3 and the most frequently accessed data is cached on site.
Macie
uses machine learning to find PII and other sensitive data in S3. can analyze CloudTrail logs for suspicious API activity. good for PCI-DSS compliance and preventing ID theft
blocking specific IPs
you can't block IPs with security groups (they only support allow rules); use network ACLs
move an ec2 volume
AZ move: take snapshot, create ami from snapshot, use ami to launch ec2 in new AZ
region move: take snapshot, create ami, copy ami from one region to another, use copied ami to launch new ec2 instance.
ebs encryption specifics
- snapshots of encrypted volumes auto encrypted
- restored volumes of encrypted snapshots are encrypted
ways to encrypt ebs volumes
take a snapshot of the volume, copy the snapshot and select the encrypt option, create an AMI from the encrypted snapshot, launch an instance from that AMI
cloudwatch default and lowest monitoring intervals
5 min default and 1 minute with detailed monitoring.
ec2 meta data
get the public IP, etc. metadata traffic is not captured by VPC flow logs
Elastic File System (EFS)
- linux and linux-based
- supports network file system NFSv4, only pay for storage used.
- scales up to petabytes.
- thousands concurrent nfs connections.
- multi-az
- 1 vpc at a time
- simple, scalable file storage for use with your Amazon ECS tasks.
block malicious IP addresses or ranges of IPs
network ACLs, layer 4
cross-site scripting and SQL injections
use WAF, layer 7
RDS (OLTP)
SQL Server, MySQL, PostgreSQL, Oracle, Aurora, MariaDB. not serverless, except for Aurora Serverless
noSQL
dynamoDB
OLAP
redshift
elasticache
in-memory database caching; Memcached and Redis. choose Redis for Multi-AZ, backups, and restores.
RDS backups
automated backups and DB snapshots (manual)
RDS read replicas
increase read performance. can be multi-az and multi-region. backups must be turned on.
RDS Multi-AZ
only used for high availability and disaster recovery, not for improving read performance
RDS encryption
Uses AWS KMS. all components of RDS instance are encrypted including backups, read replicas, snapshots
redshift availability
1 AZ. automated backups with 1-day retention by default, max 35 days. always attempts to keep 3 copies of data (the original, a replica on the compute nodes, and a backup in S3). snapshots can be replicated to another region for DR
Aurora availability
2 copies of data in each AZ, in a minimum of 3 AZs: 6 copies total. snapshots can be shared. backups turned on by default.
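Why 6 copies across 3 AZs matters can be shown with a little counting. AWS documents that Aurora keeps writing after losing up to 2 copies and keeps reading after losing up to 3; this sketch expresses that as quorum sizes of 4 (writes) and 3 (reads). The constants and the `still_available` function are illustrative, not an Aurora API.

```python
COPIES_PER_AZ = 2
AZ_COUNT = 3
TOTAL_COPIES = COPIES_PER_AZ * AZ_COUNT  # 6 copies of every piece of data

# Documented tolerance expressed as quorum sizes (an illustrative model):
WRITE_QUORUM = 4  # writes survive losing up to 2 copies
READ_QUORUM = 3   # reads survive losing up to 3 copies


def still_available(lost_copies: int) -> tuple[bool, bool]:
    """Return (can_read, can_write) after losing the given number of copies."""
    remaining = TOTAL_COPIES - lost_copies
    return remaining >= READ_QUORUM, remaining >= WRITE_QUORUM


print(still_available(COPIES_PER_AZ))      # (True, True): a whole AZ is down
print(still_available(COPIES_PER_AZ + 1))  # (True, False): an AZ plus one more copy
```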
Elastic Load Balancers (ELB)
have DNS name
Alias record
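always choose over CNAME when possible. a Route 53 alias maps a record to an AWS resource (ELB, CloudFront, S3 website, etc.) and, unlike a CNAME, works at the zone apex (e.g. example.com)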
CNAME
maps one domain name to another (e.g. m.example.com to www.example.com); can't be used at the zone apex
simple routing policy
one record with multiple IPs, returned in random order; no health checks
creating a custom VPC comes with
a route table, a network ACL, and a security group. no subnets and no internet gateway. 5 IPs in every subnet are reserved by AWS. only 1 internet gateway per VPC
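The 5 reserved addresses are per subnet: the network address, the next three (VPC router, DNS, and one reserved for future use), and the last address. Python's standard `ipaddress` module makes the arithmetic easy to check; the helper names are hypothetical.

```python
import ipaddress


def reserved_ips(cidr: str) -> list[str]:
    """First four addresses plus the last one are reserved by AWS in every subnet."""
    net = ipaddress.ip_network(cidr)
    first_four = [str(net.network_address + i) for i in range(4)]
    return first_four + [str(net.broadcast_address)]


def usable_ips(cidr: str) -> int:
    return ipaddress.ip_network(cidr).num_addresses - 5


print(usable_ips("10.0.0.0/24"))    # 251
print(reserved_ips("10.0.0.0/24"))  # ['10.0.0.0', '10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.255']
```

For the smallest allowed subnet, a /28, only 11 of 16 addresses are usable.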
NAT gateways
- 5 Gbps, scaling up to 100 Gbps
- not associated with security groups
- uses an Elastic IP address as its public IP
- needs routing: private subnets' route tables must point to it
default network ACL
stateless. allows all inbound and outbound traffic by default; any subnet you don't explicitly associate with another network ACL is automatically associated with it. can block IPs; security groups can't
custom network ACL
stateless. denies all inbound and outbound traffic by default until you add rules. can block IPs; security groups can't
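Network ACL rules are checked in ascending rule-number order and the first match decides; if nothing matches, the final `*` rule denies. The sketch below models that for one direction only (protocol ignored). Because NACLs are stateless, a real setup also needs a matching outbound rule for the return traffic. The `NaclRule` class, `evaluate` function, and CIDRs are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class NaclRule:
    number: int
    cidr: str
    port_range: tuple[int, int]  # inclusive
    allow: bool


def evaluate(rules: list[NaclRule], src_ip: str, port: int) -> bool:
    """Lowest rule number first; first match wins; no match means the '*' deny."""
    addr = ipaddress.ip_address(src_ip)
    for rule in sorted(rules, key=lambda r: r.number):
        lo, hi = rule.port_range
        if addr in ipaddress.ip_network(rule.cidr) and lo <= port <= hi:
            return rule.allow
    return False


inbound = [
    NaclRule(100, "203.0.113.0/24", (0, 65535), allow=False),  # block a bad range
    NaclRule(200, "0.0.0.0/0", (443, 443), allow=True),        # allow HTTPS from anywhere
]
print(evaluate(inbound, "203.0.113.7", 443))   # False: rule 100 matches first
print(evaluate(inbound, "198.51.100.9", 443))  # True: rule 200
print(evaluate(inbound, "198.51.100.9", 22))   # False: falls through to '*'
print(evaluate([], "198.51.100.9", 443))       # False: a custom NACL with no rules denies all
```

The deny has to carry a lower number than the broad allow; swap the numbers and the bad range gets in.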
traffic not captured by VPC flow logs
- instances contacting the Amazon DNS server
- Windows license activation traffic from Windows instances
- traffic to and from the instance metadata address (169.254.169.254)
- DHCP traffic
- traffic to the reserved IP address for the default VPC router
global accelerator
improves availability and performance of applications for local and global users. traffic traverses the AWS backbone network.
vpc endpoints
privately connect a VPC to supported AWS services and VPC endpoint services powered by PrivateLink without an internet gateway, NAT device, VPN, or Direct Connect. no public IP required. traffic doesn't leave the Amazon network. horizontally scaled, highly available, redundant.
vpc gateway endpoints
S3, dynamodb
private link
- exposes a service VPC to 10s to 1000s of customer VPCs without VPC peering; no route tables, NAT gateways, IGWs, etc. needed
- needs network load balancer on service/owner vpc and ENI on customer vpc
transit gateway
transitive peering between thousands of VPCs and on-premises data centers in a hub-and-spoke model. regional; multi-account access with AWS Resource Access Manager (RAM). works with route tables, Direct Connect, and VPN. supports IP multicast.
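Why hub-and-spoke scales better can be shown with simple counting: VPC peering isn't transitive, so full connectivity needs a peering connection for every pair of VPCs, n(n-1)/2 in total, while a transit gateway needs just one attachment per VPC. The function names are hypothetical.

```python
def full_mesh_peering_connections(n_vpcs: int) -> int:
    """Peering isn't transitive, so every pair of VPCs needs its own connection."""
    return n_vpcs * (n_vpcs - 1) // 2


def transit_gateway_attachments(n_vpcs: int) -> int:
    """Hub and spoke: each VPC attaches once to the transit gateway."""
    return n_vpcs


for n in (3, 10, 100):
    print(n, full_mesh_peering_connections(n), transit_gateway_attachments(n))
# 3 3 3
# 10 45 10
# 100 4950 100
```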
cloudhub
link multiple real world locations with vpc and other real world locations
SQS message queue times
retention from 1 minute to 14 days (default 4 days). max message size 256 KB.
SQS visibility timeout
time a message is hidden from other consumers after it is picked up. if the consumer processes and deletes the message within the timeout, it's gone; otherwise the message becomes visible again. default 30 seconds, max 12 hours.
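The visibility timeout lifecycle can be sketched with a toy in-memory queue driven by explicit timestamps. This is not the SQS API: the `ToyQueue` class and its methods are hypothetical, and the 30-second default simply mirrors SQS's default.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ToyQueue:
    """Toy model of SQS visibility timeout; not the SQS API."""
    visibility_timeout: float = 30.0  # SQS default is 30 seconds
    # message id -> time (in seconds) at which it becomes visible again
    visible_at: Dict[str, float] = field(default_factory=dict)

    def send(self, msg_id: str, now: float = 0.0) -> None:
        self.visible_at[msg_id] = now  # visible as soon as it is sent

    def receive(self, now: float) -> Optional[str]:
        for msg_id, when in self.visible_at.items():
            if when <= now:
                # Hide the message from other consumers for the visibility timeout.
                self.visible_at[msg_id] = now + self.visibility_timeout
                return msg_id
        return None  # every message is in flight, or the queue is empty

    def delete(self, msg_id: str) -> None:
        # The consumer must delete the message explicitly once processing succeeds.
        self.visible_at.pop(msg_id, None)


q = ToyQueue(visibility_timeout=30)
q.send("job-1")
print(q.receive(now=0))    # job-1: consumer A picks it up
print(q.receive(now=10))   # None: still hidden from consumer B
print(q.receive(now=31))   # job-1: A never deleted it, so it reappeared
q.delete("job-1")          # processing finished; remove it for good
print(q.receive(now=100))  # None
```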
SWF
- task-oriented API, workflow executions can last up to 1 year.
- Actors: workflow starters, deciders, activity workers.
kinesis data firehose
loads streaming data into destinations such as S3, Redshift, OpenSearch, and Splunk in near real time. no shards to manage and no data persistence.
kinesis streams
data is stored in shards and persisted: 24 hours by default, up to 365 days
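How a record ends up on a particular shard: Kinesis Data Streams MD5-hashes the record's partition key into a 128-bit integer, and each shard owns a contiguous range of that hash space. This sketch assumes the shards split the space evenly and that the digest is read as a big-endian integer; `shard_for` is a hypothetical helper, not part of any SDK.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_for(partition_key: str, shard_count: int) -> int:
    """Map a partition key to a shard index, assuming evenly split hash ranges."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    # Scale the 128-bit hash into [0, shard_count).
    return hash_key * shard_count // HASH_KEY_SPACE


print(shard_for("user-42", 4))  # the same key always maps to the same shard
```

Because the mapping is deterministic, all records with the same partition key land on the same shard, which is what preserves their ordering.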
kinesis data analytics
analyze data from Kinesis Data Streams and Kinesis Data Firehose in real time using SQL
Cognito
identity broker that handles the interaction between your applications and web identity providers. the user authenticates with a web ID provider, receives an authentication token, and exchanges it for temporary AWS credentials that assume an IAM role.
X-Ray
trace and debug distributed applications, including serverless ones
lambda
- regional (functions are deployed per region), though they can interact with resources anywhere
- maximum execution time (timeout) of 15 minutes
Storage Gateway
hybrid storage connecting on-prem to AWS storage; choose it when on-prem will still be used
Management Events
provide visibility into management operations that are performed on resources in your AWS account. These are also known as control plane operations. Management events can also include non-API events that occur in your account.
Data Events
provide visibility into the resource operations performed on or within a resource. These are also known as data plane operations. It allows granular control of data event logging with advanced event selectors. You can currently log data events on different resource types such as Amazon S3 object-level API activity (e.g. GetObject, DeleteObject, and PutObject API operations), AWS Lambda function execution activity (the Invoke API), DynamoDB Item actions, and many more.
is iam global?
yes, but not every resource it's attached to is global
traffic difference between ALB NLB
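ALB terminates the client connection at the load balancer and opens a new one to the target; NLB passes the traffic through to the target, preserving the client's source IP.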
cloudfront firewall
WAF
geo match
feature in cloudfront to block traffic from specific geo location
kms
- manages customer master keys(CMKs)
- regional
- ideal for s3 objects, database passwords, api keys stored in systems manager parameter store.
- encrypts data up to 4 KB directly; use envelope encryption (data keys) for anything larger
- audit using cloudtrail.
- fips 140-2 level 2
moving encrypted objects between regions
decrypt, move, re-encrypt using key from new region
CloudHSM
- managed service
- validated control for regulatory requirements of keys
- fips 140-2 level 3
- PKCS#11, Java Cryptography Extensions (JCE), Microsoft CryptoNG (CNG)
- lost keys are irretrievable
- no AWS APIs for cryptographic operations
- operates in its own vpc
systems manager parameter store
- securely manages configuration and secrets, caching and distributing secrets
- component of AWS Systems Manager(SSM)
- serverless
- good for: passwords, db connection strings, license codes, api keys
- encrypted(KMS) or plaintext
- store in hierarchies
- track versions
- can set TTL
secrets manager
manages configuration and secrets; more expensive at scale than Systems Manager Parameter Store but has features like:
- automatically rotates secrets and applies them in RDS
- can generate random secrets
- can be shared across accounts
- charged per secret and per 10k api calls
AWS Shield
protects against DDoS
Shield Standard: included automatically for all AWS customers at no extra cost
- L3 and L4 attacks: SYN/UDP floods, reflection attacks
Shield Advanced: $3,000 per month, enhanced protection for EC2, ELB, CloudFront, Global Accelerator, Route 53
- 24x7 access to the DDoS Response Team (DRT) for Business and Enterprise support customers
- DDoS cost protection
WAF
monitors HTTP(S) requests to CloudFront, ALB, or API Gateway using filtering rules. filter by:
- IP
- query string parameters
- SQL injection
request options:
- allow all
- block all
- count
properties:
- originating ip
- originating country
- request size
- values in headers
- strings in requests matching regex
- cross-site scripting
can you boot from ebs hdd? ssd?
no, yes
throughput hdd
500 iops
data warehousing
log processing
sequential
provisioned ssd
up to 64,000 IOPS (32,000 on non-Nitro instances)
large database workloads
random access
general ssd
16,000 IOPS
general workloads
random access
glacier automatically encrypts data?
yes
Elastic IP
- static IP that can be moved, allowing decoupling
- An Elastic IP address doesn’t incur charges as long as the following conditions are true:
- The Elastic IP address is associated with an Amazon EC2 instance.
- The instance associated with the Elastic IP address is running.
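- The instance has only one Elastic IP address attached to it.
(Since February 1, 2024, AWS charges for every public IPv4 address, including an Elastic IP attached to a running instance, so these conditions no longer make an Elastic IP free.)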
RTO
recovery time objective: the maximum acceptable time for the system to recover after a failure
RPO
recovery point objective: how much data you can afford to lose if the system fails, measured as time since the last recovery point
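A quick way to sanity-check a backup plan against RTO and RPO: in the worst case, a failure strikes just before the next backup, so you lose up to one full backup interval of data. The helper names, dates, and targets are hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Everything written after the last backup is lost."""
    return failure - last_backup


def meets_objectives(backup_interval: timedelta, restore_time: timedelta,
                     rpo: timedelta, rto: timedelta) -> bool:
    """Worst-case loss is up to one backup interval; the restore must fit within the RTO."""
    return backup_interval <= rpo and restore_time <= rto


print(data_loss_window(datetime(2024, 1, 1, 0, 0), datetime(2024, 1, 1, 3, 30)))  # 3:30:00
print(meets_objectives(timedelta(hours=1), timedelta(minutes=45),
                       rpo=timedelta(hours=4), rto=timedelta(hours=1)))  # True
```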
fault tolerance
0% interruption; the failure is concealed from users. a higher requirement than high availability (think overkill): if 4 servers are needed at all times, run 8, with 4 in each of 2 AZs
high availability
the application keeps working, but may be slower: if 4 servers are needed, run 2 in each of 2 AZs
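The server counts in the two cards above follow a simple rule: high availability spreads the required capacity across AZs, while fault tolerance sizes each AZ so that the surviving AZs still deliver full capacity when any one AZ fails. The function names are hypothetical; each returns (servers per AZ, total servers).

```python
import math


def high_availability_servers(required: int, azs: int) -> tuple[int, int]:
    """Spread the required servers across AZs; losing an AZ degrades capacity."""
    per_az = math.ceil(required / azs)
    return per_az, per_az * azs


def fault_tolerant_servers(required: int, azs: int) -> tuple[int, int]:
    """Size each AZ so the remaining AZs still cover full capacity if one fails."""
    if azs < 2:
        raise ValueError("fault tolerance across AZs needs at least 2 AZs")
    per_az = math.ceil(required / (azs - 1))
    return per_az, per_az * azs


print(high_availability_servers(4, 2))  # (2, 4): lose an AZ and 2 servers remain
print(fault_tolerant_servers(4, 2))     # (4, 8): lose an AZ and 4 servers remain
print(fault_tolerant_servers(4, 3))     # (2, 6): a third AZ cuts the spare capacity
```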
where to store all static content?
S3
when not to use RDS
- massive read/writes
- sharding
- simple get/put requests and queries
- RDBMS customization
AWS Config
tracks resources and verifies that new resources comply with configuration rules
VPC flow logs
log network traffic
Inspector
checks ec2 instances for security vulnerabilities
Trusted advisor
checks accounts for cost optimization, performance, security, fault tolerance, and service limits
Which of the following is a custom metric in CloudWatch which you have to manually set up?
memory utilization
enhanced monitoring?
RDS not EC2
aws data pipeline
?
appstream
?
Posix
?
symmetric
?
asymetric
?
codecommit
a fully managed source-control service that hosts private Git repositories. You can store anything from code to binaries and work seamlessly with your existing Git-based tools. CodeCommit integrates with CodePipeline and CodeDeploy to streamline your development and release process.
codedeploy
deployment service that automates application deployments to Amazon EC2 instances, on-premises instances, or serverless Lambda functions. It allows you to rapidly release new features, update Lambda function versions, avoid downtime during application deployment, and handle the complexity of updating your applications, without many of the risks associated with error-prone manual deployments.
opsworks
?
cloudmap
cloud resource discovery service. With Cloud Map, you can define custom names for your application resources, and it maintains the updated location of these dynamically changing resources. This increases your application availability because your web service always discovers the most up-to-date locations of its resources.
automatic ebs encryption?
enable EBS encryption by default (an account setting, per region)
Aurora read replicas
- Aurora, MySQL, and PostgreSQL replicas
- cross-region read replicas
- asynchronous (milliseconds of lag)
- automated failover
- up to 15 replicas
bucket-owner-full-control
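canned ACL that gives the bucket owner full control of an object uploaded by another account; a bucket policy can require it on cross-account uploads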
egress-only Internet gateway
horizontally scaled, redundant, and highly available VPC component that allows outbound communication over IPv6 from instances in your VPC to the Internet, and prevents the Internet from initiating an IPv6 connection with your instances.
Elastic Beanstalk
deploys your code and automatically provisions the infrastructure to run it; supports Docker too
cloudformation template
- Format Version
- Description
- Metadata
- Parameters
- Mappings
- Conditions
- Transform
- Resources (required)
- Outputs
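A minimal template showing that only `Resources` is required; `AWSTemplateFormatVersion`, `Description`, and `Outputs` are optional sections added for illustration. Built here as a Python dict and printed as JSON, which CloudFormation accepts. The logical ID `LogsBucket` is a hypothetical name; `Ref` on an S3 bucket returns the bucket name.

```python
import json

template = {
    "AWSTemplateFormatVersion": "2010-09-09",  # optional
    "Description": "Minimal example: one S3 bucket",  # optional
    "Resources": {  # the only required section
        "LogsBucket": {"Type": "AWS::S3::Bucket"},
    },
    "Outputs": {  # optional
        "BucketName": {"Value": {"Ref": "LogsBucket"}},
    },
}

print(json.dumps(template, indent=2))
```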
cross-zone load balancing
distribute incoming requests evenly to all EC2 instances across multiple Availability Zones
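The effect of cross-zone load balancing is easiest to see when AZs have unequal instance counts. This sketch assumes clients spread requests evenly across the load balancer's AZ nodes and that every AZ has at least one instance; the function name and AZ layout are hypothetical.

```python
def per_instance_share(instances_per_az: dict[str, int], cross_zone: bool) -> dict[str, float]:
    """Fraction of total traffic each instance in a given AZ receives."""
    total = sum(instances_per_az.values())
    if cross_zone:
        # Every node spreads across all registered instances in all AZs.
        return {az: 1 / total for az in instances_per_az}
    # Each AZ's node gets an equal slice and only uses its own AZ's instances.
    az_share = 1 / len(instances_per_az)
    return {az: az_share / n for az, n in instances_per_az.items()}


azs = {"us-east-1a": 2, "us-east-1b": 8}
print(per_instance_share(azs, cross_zone=False))  # {'us-east-1a': 0.25, 'us-east-1b': 0.0625}
print(per_instance_share(azs, cross_zone=True))   # {'us-east-1a': 0.1, 'us-east-1b': 0.1}
```

Without cross-zone load balancing, the two instances in us-east-1a each take four times the load of an instance in us-east-1b. Cross-zone load balancing is on by default for ALBs and off by default for NLBs.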
Inter-Region VPC Peering
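peering between VPCs in different regions. traffic stays on the AWS global network and is encrypted; like all VPC peering, it isn't transitive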