Architect Certification Flashcards
Recovery Time Objective (RTO)
The maximum length of time a service can remain unavailable before the outage becomes damaging to the business
Recovery Point Objective (RPO)
The maximum acceptable amount of data loss for a service, expressed as a period of time (e.g. the time since the last recoverable backup)
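The RPO definition above can be illustrated with a minimal sketch (hypothetical numbers): with periodic backups, the worst-case data loss equals the backup interval, so the interval must not exceed the RPO.

```python
def meets_rpo(backup_interval_hours, rpo_hours):
    """Return True if a periodic backup schedule satisfies the RPO:
    worst-case data loss equals the time since the last backup."""
    return backup_interval_hours <= rpo_hours

# A 6-hour RPO is met by hourly backups but not by daily backups.
print(meets_rpo(1, 6))   # True
print(meets_rpo(24, 6))  # False
```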
Ways of getting data in/out of AWS from on-premise
Direct Connect, VPN Connection, Internet Connection
How much data does a Snowball appliance hold
50 TB or 80 TB, depending on the model
How much data does a Snowmobile hold
100 PB
Storage Gateway
Connects on-premises storage to AWS S3
S3 classes
Standard Class (Durability = 11 9’s, Availability = 4 9’s), Infrequent Access (IA) (Durability = 11 9’s, Availability = 3 9’s), Amazon Glacier (Durability = 11 9’s, Availability = N/A). IA is often used for backup data. Glacier is used for “cold storage”. Standard is the most expensive; Glacier is the least expensive.
AWS Artifact
Allows access to AWS Compliance Reports which are useful to auditors. Reports include the scope (AWS services, regions, etc.)
S3 capacity
Files from 1 byte to 5 TB (a later lesson says 0 bytes to 5 TB)
S3 Class: Standard
Automatically replicates data across AZs within a region. Can encrypt data in transit and at rest. Has data management capabilities so that data can be moved to other S3 classes or deleted for cost optimization.
S3 Class: Infrequent Access (IA)
The only differences from the Standard class are lower cost and lower availability.
S3 Class: Amazon Glacier
Stores data in archives instead of buckets. An archive can hold up to 40TB. Archives are stored within vaults.
AMI
A baseline image for an EC2 instance. Can be purchased through the Marketplace or selected from community versions.
Instance Type
The size of an instance based on several parameters. Key parameters are vCPUs, memory, instance storage and network performance. Instances are grouped into families.
Instance Families
Micro (low throughput services), General Purpose (small to medium databases, test servers and backend servers), Compute optimized (compute intensive, video processing, scientific apps), GPU (graphics intensive apps), FPGA (massively parallel such as genomics and financial computing), Memory Optimized (real-time in-memory apps), Storage Optimized (uses SSD to reduce latency for very high I/O like noSQL databases)
Instance Purchase Options
On-Demand, Reserved, Scheduled, Spot, On-Demand Capacity Reservations
On Demand Instances
Launch at any time, can be used for as long as you want, flat rate, typically used for short term uses, best fit for testing and development
Reserved Instances
Purchase is made for 1-3 year term in exchange for a discount. Instances are either paid for all upfront, partial upfront or no upfront.
Scheduled Instances
Used for daily, weekly or monthly tasks.
Spot Instances
Must bid on available EC2 resources. As long as the bid price is above the fluctuating price set by Amazon, you get to use the instance. If the bid falls below the price, a 2-minute warning is issued before termination. Only useful for processing that can be suddenly interrupted.
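The bid-versus-price rule on this card can be sketched as follows (a hedged illustration of the classic spot model described here; AWS has since moved away from explicit bidding):

```python
def spot_action(bid_price, spot_price):
    """The instance runs while the bid is at or above the fluctuating
    spot price; otherwise a 2-minute termination warning is issued."""
    if bid_price >= spot_price:
        return "running"
    return "2-minute termination warning"

print(spot_action(0.10, 0.08))  # running
print(spot_action(0.10, 0.12))  # 2-minute termination warning
```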
On Demand Capacity Reservations
Reserve capacity based on instance type, platform and tenancy within a particular AZ for any length of time. Can be used in conjunction with reserve instance discounts.
Shared Tenancy
EC2 will run on any available host regardless of who else is running on that same server.
Dedicated Instances
EC2 runs on dedicated hardware.
Dedicated Host
Similar to dedicated instances but allows the same host to be used by multiple instances. Also allows you to run software with host-bound licenses.
User Data
Allows you to run commands upon first boot to install software or apply software patches.
Persistent Storage
Attaching EBS volumes. Network attached. Snapshots and backups can be created. Can be encrypted.
Ephemeral Storage
Storage on the EC2 instance itself. Data is lost as soon as the EC2 instance is stopped or terminated, but will remain if rebooted.
Security Group
An instance level firewall to control ingress and egress.
Key Pair
Made up of a public key and a private key. Its function is to encrypt the login information for Linux and Windows EC2 instances, and then decrypt that same information, allowing you to authenticate onto the instance. Allows you to log on to Linux via SSH. The public key is held by AWS; the private key is your responsibility and must not be lost. The same key pair can be used for multiple instances. After initial login, you can set up less privileged access controls.
Security - OS patches
It is your responsibility to download and install OS patches.
System Status Checks
Checks AWS components outside of your control. If there is an issue, the best thing to do is stop and start the instance, which causes the instance to start on another host, resolving the problem. Don’t reboot, because a reboot leaves the instance running on the same physical server.
Instance Status Checks
If this check fails, your input will be required to resolve the issue. It looks at the EC2 instance itself rather than at the underlying host.
EC2 Container Service
Allows you to run Docker-enabled applications packaged as containers across a cluster of EC2 instances. With AWS Fargate, the burden of managing the cluster is the responsibility of AWS. There is no need to install management or monitoring software.
Ways of launching an ECS cluster
Fargate launch or EC2 launch
Fargate Launch
Requires you to specify CPU and memory and to define networking and IAM policies, in addition to packaging your application in containers.
ECS - EC2 Launch
You are responsible for patching and scaling your instances. You can specify instance type and how many instances in a cluster.
Monitoring containers
Done through Cloudwatch.
ECS Cluster
Collection of EC2 instances. Security Groups, Load Balancing and Auto Scaling can be used. An instance operates in much the same way as a single EC2 instance. Clusters act as a resource pool, aggregating resources such as memory and CPU. Dynamically scalable. Can only scale within a single region, but across multiple AZs. Containers can be scheduled to deploy across the cluster. Instances within the cluster also run a Docker daemon and an ECS agent.
ECR
Elastic Container Registry. Provides a secure location to store and manage Docker images.
ECR components
Registry, Authorization Token, Repository, Repository Policy, Image
ECR Registry
Allows you to host and store Docker images as well as create image repositories. Access can be controlled by IAM policies as well as repository policies. Before the Docker client can access the registry, it needs to be authenticated as an AWS user via an authorization token.
ECR authorization token
Run the CLI get-login command, which outputs a docker login command containing an authorization token. The token can be used with the registry for 12 hours.
ECR Repositories
Allow you to group together and secure different Docker images. You can create multiple repositories so Docker images can be organized into different categories. Using IAM policies and repository policies you can assign permissions to each repository.
ECR Repository Policy
Resource-based policies. You need to ensure you add a principal to the policy to determine who has access and what permissions they have. The AWS user will need access to the ecr:GetAuthorizationToken API call.
ECR images
Can be pushed and pulled using docker commands once the security has been configured.
ECS for Kubernetes (EKS)
Allows you to run Kubernetes in AWS without having to provision or manage the control plane. You just need to provision and maintain the worker nodes.
Kubernetes control plane
Schedules containers onto the nodes and tracks the state of all Kubernetes objects. AWS is responsible for provisioning, scaling and managing the control plane across different AZs.
Worker Node
Worker machine in Kubernetes. Runs as an on-demand EC2 instance and contains software to run containers. A specific API is used. Once provisioned, they can connect to EKS using an endpoint.
Steps to run EKS
(1) Create an IAM service role that allows EKS to provision and configure specific resources. It needs the following permission policies attached: AmazonEKSServicePolicy, AmazonEKSClusterPolicy. (2) Create and run a CloudFormation stack for use with EKS. (3) Install kubectl and AWS-IAM-Authenticator. (4) Use the EKS console to create the EKS cluster. (5) Configure kubectl: use the update-kubeconfig command via the AWS CLI to create a kubeconfig file. (6) Provision and configure worker nodes. (7) Configure the worker nodes to join the EKS cluster. — Your cluster and worker nodes are now ready for you to deploy your applications.
Elastic Beanstalk
Takes your web application code and automatically provisions and deploys the required resources to make it operational.
Application Version
A very specific reference to a section of deployable code, typically pointing to S3.
Environment
Refers to an application environment that has been deployed onto AWS resources, which are configured and provisioned by Elastic Beanstalk. The environment is comprised of all the resources created by Elastic Beanstalk, not just the EC2 instance.
Environment Configurations
Collection of parameters and settings that dictate how the environment will have its resources provisioned.
Environment Tier
If it handles HTTP requests, it will run in a web server environment. If it does not process HTTP requests but processes messages from SQS, it will run in a worker environment.
Configuration Template
Baseline for creating a new unique environment configuration.
Platform
The combination of components on which you build your application using Elastic Beanstalk: the OS of the instance, the programming language, the server type (web or application) and components of Elastic Beanstalk itself.
Applications
An application is a collection of different elements such as environments, environment configurations and application versions.
Web Server Environment
Uses Route 53, Elastic Load Balancer, Auto Scaling, EC2, Security Groups.
Worker Environment
Uses SQS Queue, IAM Service Role, Auto Scaling, EC2
Elastic Beanstalk Workflow
(1) Create Application (2) Upload application and configuration to Elastic Beanstalk which creates the environment configuration (3) Environment is launched by Elastic Beanstalk (4) The environment can then be managed. If the management of the environment changes the environment configuration, the environment will automatically be updated should additional resources be required.
AWS Lambda charges
You pay only for each 100ms of use while the code is running.
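The 100ms billing granularity can be sketched as a small cost calculation. The rates below are illustrative assumptions, not official AWS prices; the point is that duration is rounded up to the next 100ms unit.

```python
import math

# Assumed illustrative rates (real prices depend on memory size/region):
PRICE_PER_100MS = 0.00000208   # assumed duration rate per 100 ms
PRICE_PER_REQUEST = 0.0000002  # assumed per-invocation fee

def lambda_cost(duration_ms, invocations):
    """Cost = invocations x (duration rounded up to 100 ms units
    x unit price + per-request fee)."""
    billed_units = math.ceil(duration_ms / 100)  # round up to 100 ms
    return invocations * (billed_units * PRICE_PER_100MS + PRICE_PER_REQUEST)

# 1 million invocations of a 130 ms function bill as 2 x 100 ms each.
print(round(lambda_cost(130, 1_000_000), 2))  # 4.36
```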
Working with AWS Lambda
(1) Upload code to Lambda or write it in the editor provided. (2) Configure the code to execute upon a trigger from an event source (such as an object being uploaded to an S3 bucket). (3) Lambda runs your code. (4) Lambda records the run time in milliseconds, as well as the number of function invocations, to compute cost.
Components of AWS Lambda
Lambda Function (comprised of your own code), Event Source (AWS sources that can be used to trigger your Lambda functions), Trigger (an operation from an event source that causes the function to be invoked), Downstream Resources (resources required by the Lambda function), Log Streams (used to identify and troubleshoot issues; they come from the same function and are recorded in CloudWatch)
Creating Lambda Functions
Select a Blueprint (preconfigured lambda functions to be used as a template), Configure Trigger, Configure Function (upload code or edit in-line, define required resources, max execution timeout, IAM role, handler name)
AWS Lambda benefit
Highly scalable and saves cost
Jobs
Unit of work run by AWS Batch. Can be an executable file, an application or shell script. Run on EC2 instances as containerized app. Has states such as “submitted”, “pending”, “running”, “failed”, etc.
Job Definitions
Specific parameters for the jobs and define how the job will run with what configuration. (ex: how many vCPUs, which data volume, IAM role, mount points)
Job Queues
Jobs that are scheduled are placed into a queue until they run. There can be different queues with different priorities. On-demand and spot instances are supported. AWS Batch can bid on your behalf for spot instances.
Job Scheduling
Takes care of when a job should run and from which compute environment. Typically on FIFO basis. Ensures higher priority queues are run first.
Managed Environments
The service will handle provisioning, scaling and termination of compute instances. Environment is created as an ECS cluster.
Unmanaged Environments
Environments are provisioned, managed and maintained by you. Allows for greater customization, but requires greater administration and maintenance. Requires you to create the necessary ECS cluster.
Amazon Lightsail
VPS (Virtual Private Server). Designed to be quick, simple and easy to use at a low cost point for small businesses and individuals. Usually for simple websites, small applications and blogs. Multiple Lightsail instances can run together and communicate. Can connect to other AWS resources and to your existing VPC via a peering connection.
ELB
Evenly distributes requests across EC2 instances, lambda functions, a range of IP addresses or even containers. Targets can be across multiple AZs. ELBs consist of multiple instances so that they are not a single point of failure.
Application Load Balancer
For applications running HTTP or HTTPS. Operates at the request level. Provides advanced routing, TLS termination and visibility features targeted at application architectures, allowing you to route traffic to different ports on the same EC2 instance.
Network Load Balancer
Ultra high performance while maintaining low latency. Operates at the connection level. Handles millions of requests per second.
Classic Load Balancer
Meant for applications built in the EC2-Classic environment. Operates at both the request and connection level.
ELB Components
Listeners (define how requests are routed based on ports and protocols set as conditions), Target Groups (resources where requests are routed; can route to multiple target groups based on rules), Rules (define which request is routed to which target group). An ELB contains 1 or more listeners, listeners contain 1 or more rules, and rules contain 1 or more conditions; all conditions in a rule resolve to a single action. Health Checks (if an instance does not respond to a health check, the ELB stops sending traffic to it). Internet-Facing ELB (nodes of the ELB are accessible via the internet and so have a public DNS name). Internal ELB (can only serve requests from within your VPC). ELB Nodes (must be defined in every AZ you wish to route traffic to). Cross-Zone Load Balancing (ensures that all targets across all AZs receive an even distribution).
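The listener → rules → target group flow above can be sketched as a tiny routing simulation. The rule conditions, target group names and default group here are made-up placeholders, not real ELB API objects:

```python
# Hypothetical rules: first matching rule wins; a default catches the rest.
rules = [
    {"path_prefix": "/api", "target_group": "api-servers"},
    {"path_prefix": "/img", "target_group": "static-servers"},
]
DEFAULT_TARGET_GROUP = "web-servers"

def route(path):
    """Evaluate rules in order; route to the first whose condition
    matches, else fall through to the default target group."""
    for rule in rules:
        if path.startswith(rule["path_prefix"]):
            return rule["target_group"]
    return DEFAULT_TARGET_GROUP

print(route("/api/users"))   # api-servers
print(route("/index.html"))  # web-servers
```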
Using HTTPS as a ALB listener
For an ALB to encrypt traffic it needs a server certificate and an associated security policy. SSL is a cryptographic protocol much like TLS. SSL and TLS are used interchangeably when discussing certificates on your ALB.
ALB server certificate
The server certificate used by an ALB is an X.509 certificate, a digital ID provided by a Certificate Authority such as AWS Certificate Manager (ACM). It is used to terminate the encrypted connection received from the remote client; the request is then decrypted and forwarded to the resources in the target group. Can be created and provisioned by either ACM or IAM. ACM is preferred; IAM is used in regions not supported by ACM.
Load Balancer OSI Layers
The ALB operates at the application layer while the NLB operates at the transport layer. The NLB is a good choice for high-traffic applications or when a static IP is required.
Components of EC2 Auto Scaling
(1) Create a Launch Configuration or Launch Template, (2) Create an Auto Scaling Group
Block Storage
Data is stored in chunks known as blocks. Blocks are stored on a volume and attached to a single instance. Very low latency. Comparable to DAS (direct-attached storage).
File Storage
Data is stored as files within a series of directories. Data is stored within a file system. Shared access for multiple users. Comparable to NAS (network-attached storage).
Object Storage
Objects are stored across a flat address space. Each object is referenced by a unique key. Each object can have metadata to help categorize and identify it.
S3 Region
Region must be specified when uploading data to S3 but the data will be replicated across AZs.
S3 Bucket
Bucket names must be globally unique. Default limit of 100 buckets per account, which can be increased on request. Objects have a unique object key. Folders can be useful for categorizing, but S3 operates at the bucket level. It is not a file system.
S3 Storage Classes
Standard, Standard IA (Infrequent Access), Intelligent Tiering, One Zone IA (Infrequent Access), Reduced Redundancy Storage (RRS)
S3 - Frequent Access
Standard or Reduced Redundancy Storage (RRS). Standard is the default; RRS is old and not recommended.
S3 - Infrequent Access
Standard IA or One Zone IA. Same access speed as Standard. Additional cost to retrieve data. One Zone IA does not replicate data across AZs so should only be used for data that can be reproduced. One Zone IA is more cost effective than Standard IA.
S3 - Intelligent Tiering
Objects are moved back and forth between the frequent access and infrequent access tiers depending on access patterns. Great for unpredictable access patterns. Data is moved to the infrequent access tier if not accessed for 30 days or more, and moved back to the frequent access tier when accessed, which resets the 30-day timer. There are no retrieval costs as with Standard IA and One Zone IA, but there is a per-object cost. Each object must be larger than 128KB.
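The 30-day tiering rule on this card reduces to a simple threshold check, sketched here (an illustration of the rule as stated, not the S3 API):

```python
def tier_for(days_since_last_access):
    """Intelligent-Tiering rule as described: 30+ days without access
    moves the object to the infrequent tier; any access resets the
    clock and moves it back to the frequent tier."""
    return "infrequent" if days_since_last_access >= 30 else "frequent"

print(tier_for(10))  # frequent
print(tier_for(45))  # infrequent
print(tier_for(0))   # frequent (just accessed, timer reset)
```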
S3 - Bucket policy
Imposes a set of controls on a specific bucket. Written in JSON. Only controls access to the data in that bucket. Permissions can be very specific (e.g. by user, by time, by IP address), providing added granularity to bucket access.
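A bucket policy is a JSON document with a Version and a list of Statements. The sketch below shows the general shape; the bucket name, account ID and user are made-up placeholders:

```python
import json

# Illustrative policy granting one (hypothetical) IAM user read access.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:user/example-user"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
}
print(json.dumps(policy, indent=2))
```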
S3 - Access control lists
Controls access only for users outside of your AWS account. ACLs are not as granular as bucket policies. Permissions are broad such as “list objects” and “write objects”.
S3 - Data Encryption
Server-side (SSE) and client-side (CSE) encryption methods: SSE-S3 (S3 managed keys), SSE-KMS (KMS managed keys), SSE-C (customer managed keys), CSE-KMS (KMS managed keys), CSE-C (customer managed keys). SSL is used for data in transit.
S3 - Versioning
Allows multiple versions of an object to exist. Useful for recovering from accidental deletion or malicious activity. Only the latest version is shown by default, but it is possible to view all versions. Versioning is not enabled by default. Once enabled, it cannot be disabled, only suspended. Adds cost because multiple versions of objects are stored.
S3 Lifecycle Rules
Ability to move data between storage classes (including Glacier), or even delete the data, based on specific criteria. The time frame is configurable.
S3 - static content and websites
Any object can be made public and accessible via a URL, CloudFront works closely with S3, Entire static website can be hosted on S3 to make it scalable.
S3 - large data sets
Good for storing large amounts of data. Scalable. Can be accessed simultaneously by different users.
S3 - integrations with other AWS services
EBS uses S3 to back itself up (the backups are not visible to users), CloudTrail uses S3 to store logs (you can view these S3 objects), CloudFront (S3 can be used as an origin for CloudFront)
S3 pricing
Varies by region. RRS is more expensive than Standard. Infrequent Access is more cost effective. The cost per gigabyte drops when certain thresholds are reached. Additional charges per 1,000 PUT, COPY, POST and LIST requests. Charge per 10,000 GET requests (less expensive). Data transfer into S3 is free, but transfer out costs per GB.
S3 anti-patterns
Archiving data for long term use. Data that is dynamic and changes very fast. Data that requires file system. Structured data that needs to be queried.
Glacier vault
A container for Glacier archives. Region specific.
Glacier archive
Can be any object. A vault can hold an unlimited number of archives.
Glacier dashboard
Only allows you to create vaults. Operational processes must be done using code: the Glacier web service API or the AWS SDKs.
Moving data to Glacier
(1) Create vault (can use dashboard), (2) Move data into Glacier using API/SDK or by using Lifecycle rules.
Retrieving data from Glacier
Must use code. You must first create an archive retrieval job. Retrieval options are (1) Expedited: for urgent requests. Must be less than 250MB. Data available in 1-5 minutes. $0.03 per GB, $0.01 per request. (2) Standard: no size restriction. 3-5 hours to retrieve. $0.01 per GB, $0.05 per 1000 requests. (3) Bulk: used to retrieve petabytes of data. 5-12 hours. $0.0025 per GB, $0.025 per 1000 requests.
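The three retrieval tiers can be compared with the per-GB and per-request rates quoted on this card (real pricing varies by region; these are the card's figures):

```python
# Rates as stated on the card: (per GB, per request).
RATES = {
    "expedited": (0.03,   0.01),
    "standard":  (0.01,   0.05 / 1000),
    "bulk":      (0.0025, 0.025 / 1000),
}

def retrieval_cost(option, gb, requests=1):
    """Total cost = data volume x per-GB rate + request count x per-request rate."""
    per_gb, per_request = RATES[option]
    return gb * per_gb + requests * per_request

# Retrieving 100 GB: expedited costs roughly 12x more than bulk.
print(round(retrieval_cost("expedited", 100), 2))  # 3.01
print(round(retrieval_cost("bulk", 100), 2))       # 0.25
```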
Glacier Security
Data encrypted by default using AES-256. Also uses vault access policies and vault lock policies.
Vault access policy
Resource based. Applied to a specific vault; a vault can only have one vault access policy. JSON format. The policy contains a principal component (determining “who” has access). If the user also has an identity policy, the vault access policy and identity policy are both evaluated. If either contains an explicit deny, access is denied.
Vault lock policy
Once set, cannot be changed. Used to prevent deletion of archives for compliance reasons.
Glacier Pricing
A single flat storage cost regardless of how much storage is used. Varies by region. Transfer in is free. Transfer to another region is $0.02 per GB. There are also charges for retrieval requests.
Benefits of EC2 Instance Store
Included in the cost of the instance. Very high I/O speeds. Ideal as a cache or buffer for rapidly changing data. Often used within a load-balancing group where data is replicated or pooled between the fleet.
Instance store volumes
Not available for all instances. Capacity increases with the size of the EC2 instance. Have the same security mechanisms as the EC2 instance.
Instance storage anti-pattern
Not to be used for data that needs to persist, or that needs to be accessed or shared by multiple entities.
Elastic Block Storage
An EBS volume can only be attached to one EC2 instance, but an EC2 instance can be attached to multiple EBS volumes.
EBS Snapshot
Snapshots can be taken manually or by code. Snapshots are stored in S3. Snapshots are incremental, meaning only the data that has changed since the last snapshot is copied. New volumes can be created from a snapshot (in case the original EBS volume is lost). It is possible to copy a snapshot from one region to another.
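The incremental behaviour above can be sketched in a few lines: the first snapshot is a full copy, and every later snapshot stores only the changed blocks (an illustration of the concept, not the EBS API):

```python
def snapshot_sizes(volume_gb, changed_gb_per_day):
    """First snapshot stores the full volume; each subsequent
    snapshot stores only the data changed since the last one."""
    return [volume_gb] + list(changed_gb_per_day)

# 100 GB volume, then 2 GB and 5 GB of daily changes:
# stored data is 107 GB, not 300 GB of three full copies.
sizes = snapshot_sizes(100, [2, 5])
print(sizes)       # [100, 2, 5]
print(sum(sizes))  # 107
```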
EBS High Availability
Every write is replicated multiple times within an AZ. If the AZ fails, the EBS data will be lost, but you can restore from a snapshot.
EBS SSD
Suitable for smaller blocks, databases using transactional workloads, and boot volumes for EC2. Options are General Purpose SSD (GP2) and Provisioned IOPS (IO1).
EBS HDD
Designed for workloads requiring a high rate of throughput: big data processing, logging, large blocks of data. Options are Cold HDD (SC1) and Throughput Optimized HDD (ST1).
General Purpose SSD (GP2)
Single-digit millisecond latency. Can burst up to 3,000 IOPS. Baseline performance of 3 IOPS per GB, up to 10,000 IOPS. Throughput up to 128 MB/s on volumes up to 170GB; on larger volumes, throughput increases at 768 KB/s per GB up to a maximum of 160 MB/s.
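The GP2 baseline-IOPS rule works out to a simple formula. This sketch assumes the figures commonly documented for this generation of gp2 volumes (3 IOPS per GB, a 100-IOPS floor, a 10,000-IOPS cap; AWS has since raised the cap):

```python
def gp2_baseline_iops(volume_gb):
    """Baseline = 3 IOPS per GB, floored at 100 and capped at 10,000."""
    return min(max(3 * volume_gb, 100), 10_000)

print(gp2_baseline_iops(10))    # 100   (small volumes hit the floor)
print(gp2_baseline_iops(1000))  # 3000
print(gp2_baseline_iops(5000))  # 10000 (large volumes hit the cap)
```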
Provisioned IOPS (IO1)
predictable performance for I/O intensive workloads, specify IOPS rates during creation of new EBS volume, volumes attached to EBS-optimized instances will deliver the IOPS defined within 10%, 99.9% of the time, volumes range from 4 to 16 TB, max IOPS possible is 20,000
Cold HDD (SC1)
lowest cost, designed for large workloads accessed infrequently, high throughput capability, can burst to 80 MB/s per TB, delivers 99% of the expected throughput, can’t be used as a boot volume for instances
Throughput Optimized HDD (ST1)
designed for frequently accessed data, suited for work with large data sets requiring throughput intensive workloads, ability to burst to 250MB/s, maximum burst of 500MB/s per volume, delivers 99% of expected throughput, not possible to use as boot volume
EBS Encryption
EBS offers encryption at rest and in transit. Encryption is managed by EBS itself; you just specify whether you want encryption. Uses AES-256 by interacting with AWS KMS (Key Management Service). KMS uses Customer Master Keys (CMKs) to create Data Encryption Keys (DEKs). Snapshots are also encrypted, as well as any volume created from a snapshot. Only available on selected instance types.
Creating new EBS volume
Can be created when creating the EC2 instance, or as a stand-alone EBS volume. For stand-alone volumes you’ll be asked for the AZ, and the volume can only be attached to instances in that AZ.
Changing size of EBS volume
Can be done in the AWS console or the AWS CLI. After the increase, you’ll need to extend the filesystem. It is also possible to extend by creating a new, larger volume from a snapshot.
EBS pricing
Charged for storage provisioned (not based on usage). Cost varies by volume type and region. Charged on a per-second basis. Snapshots are stored in S3 and you will be charged for that storage.
EBS anti-patterns
not good for temporary storage or multi-instance storage. Not suited for high durability or availability (S3 or EFS is a better option for this)
EFS
Fully managed, highly available and durable. Ability to create shared file systems. Highly scalable, with concurrent access by thousands of instances. Limitless capacity. Regional.
Creating an EFS
You must select the VPC; AWS will then create mount targets across the AZs, allowing you to connect via the mount target IP address. Only compatible with NFS v4.0 and v4.1. Does not support Windows. A Linux instance must have an NFS client installed to mount the target. Select a performance mode (General Purpose or Max I/O); choose General Purpose if under 7,000 file system operations per second is sufficient (use the PercentIOLimit metric to see the percentage of the 7,000 limit used). Configure encryption: data is only encrypted at rest, not in transit. Can connect from on-premises as long as you are using Direct Connect or a 3rd-party VPN.
EFS - general purpose performance
used for most use cases, lowest latency, max of 7000 file system operations per second
EFS - max I/O performance
Used for huge-scale architectures. Concurrent access by thousands of instances. Can exceed 7,000 operations per second. Virtually unlimited throughput and IOPS, but additional latency per I/O.
Moving data into EFS
Can be done securely from on-premises or AWS using the File Sync agent. On-premises can use a VMware ESXi host; in AWS, a community AMI can be used with an EC2 instance. Migration progress can be monitored with CloudWatch.
EFS pricing
No charge for data transfer. No charge for requests. Charged for data consumption per GB-month.
EFS anti-patterns
Not for data archiving. Not for relational databases. Not recommended for temporary storage.
CloudFront
Content Delivery Network (CDN). Distributes web traffic closer to end users via edge locations. Data is cached (not durable); origin data is typically S3.
Edge Locations
Located in areas of high population. Cache data to reduce latency.
Web Distribution
distributes static and dynamic content, uses both HTTP and HTTPS, allows you to add, remove and update objects, provides live stream functionality, origin can be web server, EC2 or S3 bucket
RTMP Distribution
For distributing streaming media using Adobe Flash media server’s RTMP protocol. Allows end user to start viewing media before file has been downloaded from the edge location. Source data must be S3.
Distribution Configuration
Specify the origin location and caching behavior options, and define edge locations (options are US/Can/Europe, US/Can/Europe/Asia, or all edge locations). Select whether the distribution should be associated with a Web Application Firewall (WAF) for extra security. Can specify encryption via an SSL certificate.
CloudFront pricing
Primarily based on data transfer and HTTP requests. Costs also for field-level encryption, invalidation requests, dedicated IP custom SSL
File Gateway
Ability to mount or map drives to an S3 bucket as if it were a share held locally. A local cache is used for recently accessed data. Files are stored 1:1 in S3 as objects.
Storage Volume Gateways
Backs up your local storage volumes to S3. Local data remains on-premises. Mounted as iSCSI devices that applications can communicate with. Data is written to S3 as EBS snapshots. Volumes can be between 1GB and 16TB, up to 32 volumes per gateway, with max storage of 512TB per gateway. A storage buffer using on-premises storage is used to stage data. Data is uploaded using SSL and stored in encrypted form in S3. Easy to create snapshots at any time; snapshots are incremental to reduce storage costs. If there is an on-premises disaster, EBS volumes can be created from the snapshots and applications can be up and running in a VPC.
Cached Volume Gateways
Primary data storage is S3. Local data storage is used for buffering and a local cache for recently accessed data. Presented as iSCSI volumes. Local disks must be selected to be used for buffer/cache. Local disk used as staging point for data to be uploaded to S3. Each volume up to 32TB. Up to 32 volumes. Total storage of 1024TB per cached volume gateway. Possible to create snapshots of volumes as EBS snapshots on S3 which can be used to create EBS volumes in a disaster.
Gateway-Virtual Tape Library
Allows you to backup data to S3 but also use Glacier for data archiving.
VTL Components
Storage gateway: Configured as a tape-gateway acting as a VTL with a capacity of 1500 virtual tapes. Virtual Tapes: equivalent to physical tape cartridge with capacity of 100GB to 2.5TB. Data stored on VTs are backed by S3 and visible in Virtual Tape Library. Virtual Tape Library (VTL): equivalent to tape library containing virtual tapes. Tape Drives: Each VTL comes with 10 tape drives presented as iSCSI devices to your backup application. Media Changer: virtual device presented as iSCSI device to backup applications that manage tapes between your Tape Drive and VTL. Archive: equivalent to off-site storage facility giving you ability to archive tapes from VTL to Glacier.
Gateway pricing
Based on storage, requests and data transfer. Cost affected by region. Transfer in is free.
Snowball
Physical device for transferring on-premises data (petabyte scale) into S3, or vice versa. Comes as a 50TB or 80TB device. Dust, water and tamper resistant. Can withstand an 8.5 G jolt in its shipping container. High-speed data transfer using RJ45 (Cat6), SFP+ Copper or SFP+ Optical.
Snowball encryption and tracking
Data is automatically encrypted by default using AES-256, with encryption keys from KMS. End-to-end tracking uses an E Ink shipping label, which ensures the appliance is sent to the correct facility. Can be tracked via SNS messages or the AWS Management Console. Also HIPAA compliant, allowing shipping of health data into and out of S3. AWS removes data from the appliance according to NIST standards.
Snowball Data Aggregation
Data can be aggregated across multiple snowballs. As a general guideline, if it will take longer than a week to move data using existing connections, snowball should be considered.
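The one-week rule of thumb above can be sketched as a transfer-time calculation (an illustration of the guideline, with made-up link speeds):

```python
ONE_WEEK_SECONDS = 7 * 24 * 3600

def prefer_snowball(data_tb, link_mbps):
    """Prefer Snowball when the network transfer would take longer
    than about a week (1 TB ~= 8,000,000 megabits)."""
    transfer_seconds = (data_tb * 8_000_000) / link_mbps
    return transfer_seconds > ONE_WEEK_SECONDS

# 50 TB over a 100 Mbps link takes ~46 days -> use Snowball.
print(prefer_snowball(50, 100))  # True
# 1 TB over a 1 Gbps link takes ~2.2 hours -> use the network.
print(prefer_snowball(1, 1000))  # False
```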
Snowball process
Create an export job, receive delivery of snowball appliance, connect appliance to local network (connect while off, turn on device, configure), ready to transfer data, access required credentials, install snowball client, transfer data using the client, disconnect appliance when transfer is complete, return to AWS using specified shipping carrier. (Snowball appliance is property of AWS)
Snowball pricing
No charge to transfer data in, but standard S3 storage charges apply. There is a charge for each data transfer job plus shipping costs. 50 TB is $200; 80TB is $250 ($320 in Singapore). You are allowed 10 days; delays incur additional charges. Data transfer charges out of S3 vary by region.
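The job-fee figures on this card can be combined into a cost sketch. The per-day late fee below is an assumed placeholder for illustration only, since the card does not state it:

```python
JOB_FEE = {"50TB": 200, "80TB": 250}  # figures quoted on the card
INCLUDED_DAYS = 10                    # days allowed before extra charges
LATE_FEE_PER_DAY = 15                 # ASSUMPTION: illustrative late fee

def snowball_job_cost(model, days_kept):
    """Flat job fee plus an assumed daily charge for each day
    the appliance is kept beyond the included 10 days."""
    extra_days = max(0, days_kept - INCLUDED_DAYS)
    return JOB_FEE[model] + extra_days * LATE_FEE_PER_DAY

print(snowball_job_cost("50TB", 8))   # 200 (within the 10 included days)
print(snowball_job_cost("80TB", 12))  # 280 (2 extra days at the assumed rate)
```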
Relational vs Non Relational connections
Clients of a relational DB maintain a connection and use SQL. Clients of a non-relational DB use REST over HTTP(S), and the client must be authenticated and authorized.
Relational vs Non Relational
Relational: RDBMS/ACID engine, supports complex relationships between tables, uses structured query language, generally accessed using a persistent network connection, uses a schema to define tables, provides a processing engine within the database. Non-relational: simple document or key store, can store many different types of data, generally accessed using RESTful HTTP, no schema required, every table must have a primary key, scales fast, lighter in design.