All Flashcards
IAM Policy Structure
Version: 2012-10-17
ID (optional): identifier for the policy
Statement: one or more individual statements
Statement consists of:
- Sid: identifier for the statement (optional)
- Effect: whether the statement allows or denies access
- Principal: account/user/role to which applied
- Action: list of actions the statement allows or denies
- Resource: list of resources to which the actions apply
- Condition: condition for when the policy is in effect (optional)
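The structure above can be sketched as a JSON document. This is a minimal illustrative policy; the Sid, account ID, and bucket name are made-up placeholders, not values from the source.

```python
import json

# A minimal IAM policy following the structure above.
# Sid, Principal ARN, and bucket name are made-up placeholders.
policy = {
    "Version": "2012-10-17",
    "Id": "ExamplePolicyId",
    "Statement": [
        {
            "Sid": "AllowBucketRead",
            "Effect": "Allow",
            "Principal": {"AWS": "arn:aws:iam::123456789012:root"},
            "Action": ["s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::example-bucket",
                "arn:aws:s3:::example-bucket/*",
            ],
        }
    ],
}

print(json.dumps(policy, indent=2))
```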
What is the issue for the following errors:
- Your application is not accessible and you get a “timed out” when trying to access it.
- Your application gives a “connection refused” error.
1. “Timed out” - Security Group issue (traffic is being blocked)
2. “Connection refused” - Application error, or it’s not launched (the security group worked)
What are these Ports? 21 22 80 443 3389
21 - FTP (File Transfer Protocol) - upload files into a file share
22 - SSH (Secure Shell) - log into Linux instance, but also SFTP (Secure File Transfer Protocol) - upload files using SSH
80 - HTTP access to unsecured website
443 - HTTPS access to a secured website
3389 - RDP (Remote Desktop Protocol) - log into a Windows Instance
How to connect to Linux EC2 using PowerShell?
ssh -i PathTo.pemFile ec2-user@PublicIpAddressOfInstance
Pros and Cons of Cluster Placement Group - All instances on the same server rack in same AZ
Pros: Great Network - 10Gbps bandwidth between instances
Cons: If the rack fails, all instances fail at the same time
Use Case: Big data job that needs to complete fast or app that needs extremely low latency and high network throughput
Pros and Cons of Spread Placement Group - All instances are located on different racks/hardware, and across AZs
Pros: Can spread across AZs for reduced risk of simultaneous failure.
Cons: Limited to 7 instances per AZ per placement group
Use Case: App that needs to maximize high availability and critical apps where each instance must be isolated from failure from each other
Pros and Cons of Partition Placement Group - Each partition corresponds to a rack; you can have multiple partitions per AZ, spread across multiple AZs.
Pros: Up to 7 partitions per AZ and can spread across multiple AZs (within the same region) for up to 100s of instances. Don’t share the same hardware so a failure isn’t catastrophic
Use Cases: Big Data apps
What is an ENI?
Elastic Network Interface - acts as virtual network card and is a component of the VPC
They are bound to a specific AZ
What is EC2 Nitro?
New underlying platform for EC2 instances.
Will have higher speed EBS, better security, better networking
Why would you want to change the default vCPU options?
Sometimes licensing is charged based on the number of cores. The default of 2 threads per core on 8 cores would be 16 vCPUs, which could cost a lot. If you want to keep the same amount of RAM but don’t need all those vCPUs, you can disable multithreading (just 1 thread per core) and lower the number of cores to reduce the licensing charges.
EBS MultiAttach
Usually an EBS volume can only be attached to ONE EC2 instance at a time. However, with io1/io2 family EBS volumes, you can attach them to multiple instances within the same AZ.
Which can be mounted in multi AZ?
EFS or EBS
EFS
Can Windows instances have an EFS mounted?
No. Only for Linux
EFS Performance Modes
General Purpose - latency sensitive use cases (web servers)
Max I/O - higher latency, throughput, highly parallel (big data, media processing)
EFS Throughput Modes
Bursting - 1TB = 50MiB/s + burst of up to 100MiB/s
Provisioned - set your throughput regardless of storage size (ex: 1GiB/s for 1TB storage)
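The Bursting-mode numbers above scale with storage size (baseline of roughly 50 MiB/s per TiB stored). A quick sketch of that relationship, as an assumption-labeled approximation rather than an exact AWS formula:

```python
def efs_baseline_throughput_mib_s(storage_tib: float) -> float:
    """Approximate Bursting-mode baseline: ~50 MiB/s per TiB stored.

    This linear rule of thumb follows the card above; actual EFS
    bursting also involves burst credits, which are omitted here.
    """
    return 50.0 * storage_tib

# 1 TiB stored -> 50 MiB/s baseline (with bursts up to 100 MiB/s)
print(efs_baseline_throughput_mib_s(1))
```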
What layer is TCP? HTTP? HTTPS? Network?
Network = Layer 3
TCP = Layer 4
HTTP and HTTPS = Layer 7
What protocol and on which port does the Gateway Load Balancer use?
GENEVE protocol on port 6081
Sticky Sessions
This means that a client accessing an EC2 instance through a load balancer will be directed to the same EC2 instance every time. This is done via a “cookie”. With a duration-based cookie, the cookie eventually expires and the client is then directed to whichever instance the load balancer sees fit. You can also create your own application-based cookie without an expiry date.
Which load balancer always has Cross Zone Load balancing?
Application Load Balancer. There are no charges for inter-AZ data transfer
It can be enabled for NLB and CLB. Only NLB will charge you for inter-AZ data transfer.
What is SNI (Server Name Indication)?
SNI solves the problem of loading multiple SSL certificates onto one web server (to serve multiple websites)
It’s a newer protocol and requires the client to indicate the hostname of the target server in the initial SSL handshake
Only works for ALB & NLB or CloudFront (not CLB)
In other words, you can have an ALB or NLB balance traffic between 2 different websites at once. When a user wants to access one of the websites, it will use SNI to tell the load balancer which site they want, so the load balancer can select the right SSL certificate, and encrypt the traffic to the correct site
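The certificate-selection step described above can be sketched as a lookup table keyed by the hostname the client sends in the TLS handshake. The hostnames and certificate names below are made-up placeholders; real SNI happens inside the TLS library, not in application code.

```python
# Toy model of SNI: the load balancer holds one certificate per hostname
# and picks the matching one based on the hostname the client indicates
# in the initial handshake. All names here are illustrative.
certs = {
    "www.site-a.com": "cert-for-site-a",
    "www.site-b.com": "cert-for-site-b",
}

def select_certificate(sni_hostname: str) -> str:
    # Fall back to a default certificate when the hostname is unknown,
    # roughly what a load balancer does with its default cert.
    return certs.get(sni_hostname, "default-cert")
```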
Connection Draining/Deregistration Delay
If using a CLB, it is called Connection Draining. If using an ALB or NLB, it is called Deregistration Delay
This is a setting for EC2 instances behind a load balancer: once an instance becomes unhealthy (or is being deregistered), it doesn’t shut down right away. The load balancer stops routing new traffic to it, but requests already routed to it are given the draining time to finish before the instance shuts down. Default is 300 seconds. Can go up to 3600 seconds.
Which is becoming legacy and which is new between “Launch Configuration” and “Launch Template”? (used for auto scaling groups)
Configuration is legacy, template is newer
You are using an Application Load Balancer to distribute traffic to your website hosted on EC2 instances. It turns out that your website only sees traffic coming from private IPv4 addresses which are in fact your Application Load Balancer’s IP addresses. What should you do to get the IP address of clients connected to your website?
When using an Application Load Balancer to distribute traffic to your EC2 instances, the IP address you’ll receive requests from will be the ALB’s private IP addresses. To get the client’s IP address, ALB adds an additional header called “X-Forwarded-For” that contains the client’s IP address.
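X-Forwarded-For is a comma-separated list where the left-most entry is the original client and later entries are proxies the request passed through. A minimal parsing sketch (the IP addresses in the test are illustrative):

```python
def client_ip_from_xff(x_forwarded_for: str) -> str:
    """Return the original client IP from an X-Forwarded-For header.

    The header is a comma-separated chain; the left-most entry is the
    client, and any following entries are intermediate proxies.
    """
    return x_forwarded_for.split(",")[0].strip()
```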
Application Load Balancers can route traffic to different Target Groups based on the following, EXCEPT:
Client Location
Hostname
Request URL Path
Source IP Address
Client Location. ALBs can route traffic to different Target Groups based on URL Path, Hostname, HTTP Headers, and Query Strings.
For compliance purposes, you would like to expose a fixed static IP address to your end-users so that they can write firewall rules that will be stable and approved by regulators. What type of Elastic Load Balancer would you choose?
Network Load Balancer has one static IP address per AZ and you can attach an Elastic IP address to it. Application Load Balancers and Classic Load Balancers have a static DNS name.
What are the RDS database engines?
Postgres, MariaDB, MySQL, Oracle, Microsoft SQL Server, Aurora
Why is using RDS better than just launching your own database on an EC2 instance?
RDS is managed by AWS which means it has automated provisioning, OS patching, continuous backups, monitoring dashboard, read replicas, can be multi-AZ for disaster recovery, can set up maintenance windows, scaling ability, and backed by EBS.
Downside is that you can’t SSH into it since it is managed by AWS, not you.
Does RDS storage scale automatically?
Yes. You can set a maximum
Can RDS read replicas span across AZs?
Read Replicas can be within an AZ, cross AZ, or even cross region
Are RDS read replicas Asynchronous? (meaning that you can read them before they have a chance to match the main database exactly)
Yes, so reads are EVENTUALLY consistent.
What is RDS Multi AZ?
A feature used mainly for disaster recovery, in which a SYNCHRONOUS standby replica is created that is completely unused (you can’t even read from it) unless the master fails. The standby shares the DNS name with the master database, so if the master database fails, the standby automatically takes over as master.
How much downtime is average for going from a single AZ RDS database to a multi-AZ database?
There is no downtime
If you create an RDS Database, and elect to not encrypt it, how can you later encrypt the read replicas made from this database?
You can’t.
At what step do you encrypt an RDS database?
Must be defined at launch time
What are 2 ways to encrypt your RDS Database?
AWS KMS - AES-256
Transparent Data Encryption (TDE) (only available for Oracle or SQL Server)
How do you encrypt an unencrypted RDS database?
Create a snapshot of the unencrypted database, copy that snapshot, and when you create that copy, you’ll have the option to enable encryption. Then restore the database from the encrypted snapshot. This creates a new, encrypted database to which you can migrate everything over to. Then delete the unencrypted database
Which database engines support access management to the RDS database via IAM authentication?
MySQL and PostgreSQL
RDS Shared Responsibility
Your responsibility:
- Check the ports/IP/security group inbound rules
- In-database user creation and permissions or manage through IAM
- Creating a database with or without public access
- Ensure parameter groups or DB is configured to only allow SSL connections
AWS responsibility: Since RDS is a managed service, you will have:
- No SSH access
- No manual DB patching
- No manual OS patching
- No way to audit the underlying instance
Aurora FAQs
- Not open sourced
- 5x performance over MySQL and over 3x for Postgres
- Storage automatically grows from 10GB up to 128TB
- Can have 15 read replicas and the process is much faster
- Failover is basically instant
- Costs ~20% more than RDS, but much more efficient
- Makes 6 copies of your data across 3 AZs
- One master, but any of the replicas can become master for failover
- Supports cross region replication
Aurora DB Cluster
There is a master that does reads and writes. But instead of connecting to the master directly, you connect to a WRITER ENDPOINT that directs you to the master (good in case the master fails, you don’t have to find the new master). Same for the read replicas: you don’t connect to them directly, but instead connect to a READER ENDPOINT, which also acts as a load balancer across all the replicas. Replicas can also auto scale.
Aurora Security
Similar to RDS.
Aurora Custom Endpoints
Aurora automatically has a Reader Endpoint to guide you to read replicas and load balance. However, if you have a variety of instance sizes making up the read replicas, you may want to use larger instance sizes for more work intensive queries. To do this, you can create custom endpoints that override the default reader endpoint.
Elasticache FAQs
- Caches are in-memory databases with really high performance, low latency
- As RDS is to managed relational databases, ElastiCache is to managed Redis or Memcached
- Helps reduce load off of databases for read intensive workloads
- Helps make your app stateless
- AWS takes care of OS maintenance/patching, optimizations, setup, config, monitoring, failure recovery and backups
- Involves heavy app code changes to use
- Works kind of like CloudFront but for databases. App needs something from the database, but will check Elasticache first to see if it’s been cached.
Redis vs Memcached
Redis:
- Multi AZ with auto failover
- Read replicas to scale reads and have high availability
- data durability using AOF persistence
- backup and restore features
Memcached
- Multi-node partitioning of data (sharding)
- No high availability (replication)
- No persistence
- No backup and restore
- Multi-threaded architecture
ElastiCache Security
- Does not support IAM Authentication
- Redis AUTH lets you create a password (token) when you create a Redis user and supports SSL in flight encryption
- Memcached supports SASL-based authentication
Patterns for ElastiCache
Lazy Loading: all the read data is cached, data can become stale in cache
Write Through: Adds or updates data in the cache when written to DB (no stale data)
Session Store: store temp session data in a cache (using TTL features)
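The Lazy Loading and Write Through patterns above can be sketched with plain dicts standing in for RDS and ElastiCache. This is a toy model of the access pattern, not real ElastiCache client code:

```python
# db stands in for the database (e.g. RDS), cache for ElastiCache.
db = {"user:1": {"name": "Alice"}}
cache = {}

def get_user(key: str):
    """Lazy Loading (cache-aside): check the cache first, fall back to
    the database on a miss and populate the cache for next time."""
    if key in cache:          # cache hit: no database read
        return cache[key]
    value = db[key]           # cache miss: read from the database...
    cache[key] = value        # ...and cache it for subsequent reads
    return value

def update_user(key: str, value) -> None:
    """Write Through: update the DB and the cache together, so the
    cache never holds stale data for this key."""
    db[key] = value
    cache[key] = value
```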
ElastiCache - Redis Use Case
- Gaming leaderboards are computationally complex
- Redis Sorted Sets guarantee both uniqueness and element ordering
- Each time a new element is added, it’s ranked in real time, then added in correct order
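The uniqueness-plus-ordering guarantee above can be sketched with a plain dict: one score per member (re-adding updates, never duplicates) and rank derived from the scores. This is a toy stand-in for a Redis Sorted Set, not Redis client code:

```python
# scores maps player -> score, mimicking a Redis Sorted Set member/score pair.
scores = {}

def add_score(player: str, score: float) -> None:
    # Re-adding a player just updates their score (uniqueness guarantee).
    scores[player] = score

def top(n: int):
    # Highest score first, similar in spirit to Redis ZREVRANGE.
    return sorted(scores, key=scores.get, reverse=True)[:n]
```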
Break down http://api.www.example.com - FQDN (Fully Qualified Domain Name)? Protocol? Domain Name? Sub Domain? Second Level Domain? Top Level Domain? Root?
FQDN: api.www.example.com. Protocol: http. Domain Name: api.www.example.com. Sub Domain: www.example.com. Second Level Domain: example.com. Top Level Domain: .com. Root: .
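The nesting of those levels can be shown mechanically: each level is the suffix of labels to the right of a dot. A small sketch:

```python
def breakdown(fqdn: str) -> list:
    """List each domain level of an FQDN, from most to least specific.

    "api.www.example.com" -> ["api.www.example.com", "www.example.com",
                              "example.com", "com"]
    """
    labels = fqdn.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels))]

print(breakdown("api.www.example.com"))
```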
How DNS Works
When you type in your URL (example.com), you are asking your local DNS Server for example.com. If your local DNS server (assigned by your company or ISP) doesn’t know it, the local DNS server will go ask the Root DNS Server (managed by ICANN). If the Root DNS server doesn’t know it, it will at least tell you where to look and give you the info for the TLD DNS Server since you are looking for a .com website. If the TLD DNS Server doesn’t know, it can give you a bit more info and lead you to the SLD DNS Server, which is the Domain Registrar (Route 53, GoDaddy, etc), and they will know the IP Address for example.com. Your local DNS Server will now cache that info
Is Route 53 authoritative?
yes
Main Route 53 DNS record types?
A
AAAA
CNAME
NS
A:
AAAA:
CNAME:
NS:
A: maps a host name to IPv4
AAAA: maps a host name to IPv6
CNAME: maps a host name to another host name
- the target is a domain name that must have an A or AAAA record
- Can’t create a CNAME record for the top node of a DNS namespace (Zone Apex) (for example, you can’t create one for example.com, but you can for www.example.com)
NS: Name Servers for the hosted zone
- Control how traffic is routed for a domain
Route 53 Hosted Zones
A container for records that define how to route traffic to a domain and its subdomains
- Public Hosted Zones: contains records that specify how to route traffic on the internet (public domain names)
- Private Hosted Zones: contain records that specify how you route traffic within one or more VPCs (private domain names)
You pay $0.50 per month per hosted zone
CNAME vs Alias
AWS Resources (load balancers, CloudFront, etc) expose an AWS host name such as lb1-1234.us-east-2.elb.amazonaws.com. But if you want to use myapp.mydomain.com you can use:
CNAME - Point the ugly hostname to the pretty hostname. But ONLY for non-root domains. So it’ll work for myapp.mydomain.com, but not mydomain.com
Alias - Specific to Route 53 and allows you to point the hostname to an AWS resource. This works for root domains as well as non-root domains. They are also free.
Alias record targets
ELB, CloudFront, API Gateway, Elastic Beanstalk, S3 websites (not the bucket, but the website), VPC Endpoints, Global Accelerator, Route 53 records (in the same Hosted Zone)
CANNOT for EC2 DNS name
Route 53 Routing Policies: Simple, Weighted, Failover, Latency Based, Geolocation, Multi-Value Answer, Geoproximity (using Route 53 Traffic Flow feature)
Simple - Typically route traffic to a single source (ask for a.com, get back 11.2.55.213) but can be multiple sources, and one is picked at random (ask for x.com, get back 1.2.3.4 as well as 5.6.8.7) (no health checks)
Weighted - Route traffic to multiple sources, but assign values to say one source is more or less likely than others (dns records must have the same name and type and there can be health checks) Typically used for load balancing across regions or testing a new app by only sending a small percentage of traffic to the testing instance. If one instance is given 0 weight, no traffic will go there, but if ALL instances are given the weight of 0, they will all return traffic equally
Failover:
- Active/Passive: If an instance fails a health check, route 53 will direct traffic to the back up instance
Latency Based - Auto Directs traffic based on how quickly they can access the instance. (health checks available)
Geolocation: Based on where the user is located. Create a “default” record first in case there is no matching location. (health checks available) Does not auto direct, you set which locations go to which instance.
Multi-Value Answer - Up to 8 records will be available for a user to access. (health checks available)
Geoproximity (using Route 53 Traffic Flow feature) - route traffic based on both user location as well as resource location. Use bias values to give weight.
(example: there is a resource set up in N. Virginia and another set up in California, and there are 4 users accessing this resource. User locations are Virginia, California, Louisiana, New Mexico. If bias on both resources is set to 0, it will act like latency based routing. But if you give N. Virginia a bias of 50, and California a bias of 0, then that means N. Virginia has a wider reach, and will take the users in Virginia and Louisiana of course, but now will also take the New Mexico user)
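The Weighted policy above (including the zero-weight behavior) can be sketched as a proportional random pick. This is a toy simulation of the routing decision, not Route 53 itself; record names are placeholders:

```python
import random

def pick_record(records: dict) -> str:
    """Pick a DNS record proportionally to its weight.

    Records with weight 0 receive no traffic, unless ALL records have
    weight 0, in which case they are all treated equally (mirroring the
    Weighted-routing behavior described above).
    """
    names = list(records)
    weights = list(records.values())
    if all(w == 0 for w in weights):
        weights = [1] * len(names)   # all-zero weights -> equal split
    return random.choices(names, weights=weights, k=1)[0]
```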
Route 53 Health Checks
HTTP health checks are only for PUBLIC resources and will check if the resource is working before sending traffic.
- Monitor an Endpoint: about 15 different global health checkers will report on the endpoint’s health
- Calculated Health Checks: combine the results of multiple health checks into one
- Private Hosted Zone: Since health checks can’t access instances in a private subnet, you would create a CloudWatch Alarm to go off when the instance becomes unhealthy, and the health checker can watch for that alarm instead of trying to watch the instance
Route 53 Traffic Flow
This is a UI to more easily create and manage DNS routing records
Elastic Beanstalk Cost?
Free itself, only pay for the services provisioned.
S3 Keys
Files have a key. If the bucket URL is s3://mybucket, then a jpg named bob in that bucket will have the URL s3://mybucket/bob.jpg. If you place bob.jpg in a folder called dude, then the URL for the jpg will be s3://mybucket/dude/bob.jpg. The key for the jpg is everything after s3://mybucket/, so the key for bob.jpg in the dude folder is dude/bob.jpg.
To break down the key, dude/ would be the prefix and bob.jpg would be the object name
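The prefix/object-name split described above can be sketched in a couple of lines. Note that S3 has no real folders; the “folders” are just prefixes on the key:

```python
def split_key(key: str):
    """Split an S3 object key into (prefix, object name).

    S3 has no real directories: 'dude/' in 'dude/bob.jpg' is just a
    prefix on the key, rendered as a folder in the console.
    """
    prefix, _, name = key.rpartition("/")
    return (prefix + "/" if prefix else "", name)
```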
S3 Encryption Options
SSE-S3 - keys handled and managed by AWS
- AES-256
- We upload the object, S3 provides and applies the key
- Header will be: “x-amz-server-side-encryption”:”AES256”
SSE-KMS - AWS KMS to handle and manage keys
- Advantages are user control + audit trail
- We upload the object, S3 provides and applies the key
- Header will be: “x-amz-server-side-encryption”:”aws:kms”
SSE-C - You manage your own keys
- S3 does not store the key
- HTTPS must be used
- We upload the object and the key, but S3 will still apply the key to encrypt
Client Side Encryption
- You encrypt the object before uploading to S3 and decrypt it when retrieving it
S3 Security
User based: IAM Policies - Which API calls should be allowed for a specific user from IAM console
Resource based: Bucket Policies - bucket wide rules from S3 console, allows cross account
SDK
Must use when coding against AWS Services such as DynamoDB
S3 MFA-Delete
Must enable versioning
Only the root user can enable/disable MFA-Delete
Can only be enabled using the CLI, not the console
Once enabled, you’ll need to get an MFA code to permanently delete an object or suspend versioning
Amazon Glacier Retrieval Options
Expedited - 1 to 5 minutes
Standard - 3 to 5 hours
Bulk - 5 to 12 hours
(minimum storage duration is 90 days)
Amazon Glacier Deep Archive Retrieval options
Standard - 12 Hours
Bulk - 48 hours
Minimum storage duration is 180 days
S3 KMS Limitation
KMS does have a limit of requests per second (varies by region), so if you have this as your default encryption on your S3 bucket, then it’s possible to hit that limit
S3 multi-part upload
Recommended for files over 100MB
Required for files over 5GB
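Given those thresholds, the number of parts for a given file follows from the part size. A quick sketch, using a 100 MB part size as an illustrative default (the actual part size is configurable):

```python
import math

def part_count(file_size_bytes: int,
               part_size_bytes: int = 100 * 1024 * 1024) -> int:
    """How many parts a multi-part upload needs at a given part size.

    The 100 MiB default here is illustrative, matching the threshold at
    which multi-part upload is recommended above.
    """
    return max(1, math.ceil(file_size_bytes / part_size_bytes))
```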
S3 Transfer Acceleration
Increase transfer speed by transferring file to an AWS edge location which will forward the data to the S3 bucket in the target region. This means that we only use the public internet to get it to the edge location, then use the private AWS network to get it from the edge location to the bucket
S3 byte range fetches
Can be used to speed up downloads. Downloads in parts instead of all in a single file
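Each parallel part is requested with an HTTP `Range` header. A sketch of generating those header values for an object of a given size:

```python
def byte_ranges(size: int, chunk: int):
    """HTTP Range header values covering an object of `size` bytes.

    e.g. 'bytes=0-99' asks for the first 100 bytes; each range can then
    be fetched in parallel and the parts reassembled in order.
    """
    return [f"bytes={start}-{min(start + chunk, size) - 1}"
            for start in range(0, size, chunk)]

print(byte_ranges(250, 100))
```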
S3 select and Glacier Select
This allows you to use SQL to make requests from specific parts of a file. So if you only need a few rows and columns from a CSV file, you can request this, and S3 will filter out what you want, and deliver it
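The filtering S3 Select does server-side can be pictured as SQL over CSV rows. Below, the SQL in the comment is the kind of expression S3 Select accepts, and the function is a local stand-in for the same filtering; the names and cities are made-up sample data:

```python
import csv
import io

# The kind of expression S3 Select would run server-side:
#   SELECT s.name FROM s3object s WHERE s.city = 'Austin'
# Below is a local stand-in for that filtering over sample CSV data.
data = "name,city\nAlice,Austin\nBob,Boston\n"

def select_names_by_city(csv_text: str, city: str):
    rows = csv.DictReader(io.StringIO(csv_text))
    return [row["name"] for row in rows if row["city"] == city]
```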
S3 - Requester Pays
In general, bucket owner pays for all costs. This will allow for the requester to be billed for the request. (bucket owner still pays for the storage, just not the retrieval)
Athena
Serverless query service to perform analytics against S3 objects
uses SQL to query
Unicast vs Anycast IP (Global Accelerator uses Anycast IP)
Unicast IP: one server holds one IP address
Anycast IP: All servers hold the same IP address and the client is routed to the nearest one. So a user will want to connect to your ALB. Instead of connecting via the public internet, they will connect to the closest edge location, then travel the rest of the way over the AWS private network
Snowball Edge
FOR STORAGE ONLY:
40 vCPUs
80TB
Use for data transfers of less than a petabyte
FOR EDGE COMPUTING:
52 vCPUs, 208GB RAM
Optional GPU
42TB storage
Snowcone
FOR STORAGE ONLY:
8TB
Use for data transfers up to 24TB
FOR EDGE COMPUTING:
2 CPUs, 4GB memory
Wired or wireless access
USB-C power with optional battery
Snowmobile
100PB
AWS OpsHub
Software you install locally to provide a GUI for using the Snow devices (without it, a CLI is needed)
Amazon FSx
Allows you to launch 3rd party high-performance file systems on AWS, such as:
FSx for Lustre
FSx for Windows File Server
FSx for Windows File Server
EFS is a shared POSIX system for Linux systems (so can’t use for windows instances). So this is when you’d use FSx for Windows File Server
Supports SMB protocol & Windows NTFS
Scalable
Can be multi AZ
backed up to S3
FSx for Lustre
A type of parallel distributed file system for large-scale computing
The name Lustre is a combo of Linux and cluster
Used for things like Machine Learning and High Performance Computing (HPC)
Scalable
Seamless integration with S3 (read and write)
FSx File System Deployment Options
Scratch File System: temporary, and data is not replicated. Very fast (200 MBps per TB). Used for short-term processing
Persistent File System: Long term storage and data is replicated within same AZ. Replaces failed files within minutes. Use for long term processing
How do you expose S3 for on prem access?
Storage Gateway
Storage Gateway
bridge between on prem and cloud
3 TYPES
File Gateway:
- Configured S3 buckets are accessible using the NFS and SMB protocol
- Supports S3 standard, IA, One Zone IA
- Bucket access using IAM roles for each file gateway
- Most recently accessed data is cached in the file gateway
- Can be mounted on many servers
- Integrated with Active Directory (AD) for user authentication
Volume Gateway -
- Block Storage using iSCSI protocol backed by S3
- Backed by EBS snapshots which can help restore on prem volumes
- There are both Cached volumes for low latency access to most recent data, or Stored Volumes to have the entire dataset on prem with scheduled backups to S3
Tape Gateway - Some companies have backup processes using physical tapes
- Virtual Tape Library (VTL) backed by S3 and Glacier
Storage Gateway Hardware
The gateway types above run as software installed on your own servers. If you don’t have room for that, you can buy a hardware appliance from AWS instead.
FSx File Gateway
Native access to FSx for Windows File Server
Local cache for frequently accessed data
AWS Transfer Family
Fully managed service to transfer files into and out of S3 and EFS
Uses FTP, FTPS, or SFTP (FTP is the only one that isn’t encrypted)
Storage Comparison
S3: Object storage
Glacier: Object Archival
EFS: Network file system for LINUX
FSx for Windows: same as EFS but for windows
FSx for Lustre: HPC Linux file system
EBS: network storage for one EC2 instance at a time
Instance Store: Physical storage attached to the EC2 host. Faster than EBS, but data goes away when the EC2 instance shuts down
Storage Gateway: transfers on-prem data to and from the cloud
Snow family: same idea as Storage Gateway, but with physical devices instead of over the network
Database: indexing and querying
SQS
Default Retention: 4 days, maximum of 14 days
Less than 256KB per message
SQS Dead Letter Queue
If a message fails to be processed, it goes back to the queue. If this keeps happening, you can have the message leave the main queue and enter a “Dead Letter Queue” so you can review it later and debug the issue
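The redrive behavior above can be sketched with lists standing in for the two queues: after a message fails more than a threshold number of receives (SQS calls this maxReceiveCount), it is moved to the DLQ. A toy simulation, not real SQS client code:

```python
# Toy DLQ model: main_queue and dlq are lists standing in for SQS queues,
# and MAX_RECEIVE_COUNT plays the role of the redrive policy's maxReceiveCount.
MAX_RECEIVE_COUNT = 3
main_queue, dlq = [], []
receive_counts = {}

def receive_and_fail(message: str) -> None:
    """Simulate a consumer receiving a message and failing to process it."""
    receive_counts[message] = receive_counts.get(message, 0) + 1
    if receive_counts[message] >= MAX_RECEIVE_COUNT:
        if message in main_queue:
            main_queue.remove(message)
        dlq.append(message)   # give up: park it in the DLQ for debugging
```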
Kinesis
Data Streams: capture, process, and store data streams
Firehose: load data streams into AWS data stores
Data analytics: analyze data streams with SQL or Apache Flink
Video Streams: capture, process, and store video streams
Kinesis Data Streams
Streams are made up of shards
Billing is per shard provisioned
retention is between 1 and 365 days
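Records are assigned to shards by hashing their partition key, so the same key always lands in the same shard (preserving per-key ordering). Kinesis uses an MD5 hash mapped into per-shard hash-key ranges; the modulo below is a simplification of that mapping:

```python
import hashlib

def shard_for(partition_key: str, num_shards: int) -> int:
    """Map a partition key to a shard index.

    Kinesis hashes the partition key with MD5 and maps the result into a
    shard's hash-key range; taking the digest modulo the shard count is a
    simplified sketch of that, keeping the key->same-shard property.
    """
    digest = int(hashlib.md5(partition_key.encode()).hexdigest(), 16)
    return digest % num_shards
```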