Advanced EC2 Flashcards
Bootstrapping EC2 using User Data
Bootstrapping is a process where scripts or other bits of configuration are run when an instance is first launched, so that the instance is brought into service in a pre-configured state.
-Allows EC2 Build Automation
-User Data - Accessed via the meta-data IP
-http://169.254.169.254/latest/user-data (see the retrieval sketch below)
-User Data - Is a piece of data that you can pass into an EC2 instance
-Anything in User Data is executed by the instance OS (ONLY AT LAUNCH)
-EC2 doesn’t interpret it; the OS needs to understand the User Data
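A minimal sketch of reading User Data from inside an instance via the meta-data IP above, assuming IMDSv2 (token-based) access is enabled:

```python
import urllib.request

# Request an IMDSv2 session token, then use it to read this instance's User Data.
token_req = urllib.request.Request(
    "http://169.254.169.254/latest/api/token",
    method="PUT",
    headers={"X-aws-ec2-metadata-token-ttl-seconds": "21600"},
)
token = urllib.request.urlopen(token_req).read().decode()

user_data_req = urllib.request.Request(
    "http://169.254.169.254/latest/user-data",
    headers={"X-aws-ec2-metadata-token": token},
)
print(urllib.request.urlopen(user_data_req).read().decode())
```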
User Data - Key Points
-It’s opaque to EC2 - It’s just a block of data
-It’s NOT secure - don’t use it for passwords or long-term credentials (anyone with access to the OS can see the User Data)
-Limited to 16 KB in size
-Can be modified when the instance is stopped
-Only executed once, at launch
Boot-Time-To-Service-Time
How quickly you can bring an instance into service.
AMI > minutes > Instance
-Post-Launch Time = time required after launch for manual or automatic configuration before the instance is ready for service
-You can reduce this using Bootstrapping (see the launch sketch below)
-You can also Bake an AMI - pre-installing time-intensive components into the image itself
The optimal way is to combine both of these processes: AMI Baking and Bootstrapping
-You’d use AMI baking for any part of the process which is time-intensive - for example, an application which is 90% installation (AMI baking) and 10% configuration (Bootstrapping).
That way, you reduce the post-launch time and thus the boot-time-to-service-time.
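As a rough illustration of the bootstrapping half, a User Data script can be passed in at launch; the AMI ID, instance type and script contents below are placeholders, not values from these notes:

```python
import boto3

# Hypothetical bootstrapping script - light configuration suits User Data,
# while time-intensive installation is better baked into the AMI.
user_data = """#!/bin/bash
yum -y install httpd
systemctl enable --now httpd
"""

ec2 = boto3.client("ec2")
ec2.run_instances(
    ImageId="ami-0123456789abcdef0",  # placeholder AMI ID
    InstanceType="t3.micro",
    MinCount=1,
    MaxCount=1,
    UserData=user_data,               # boto3 base64-encodes this for you
)
```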
Enhanced Bootstrapping with CFN-INIT (AWS::CloudFormation::Init)
A way to pass complex bootstrapping instructions into an EC2 instance.
-cfn-init helper script - installed on EC2 OS
-Simple configuration management system
-Procedural (User Data) - run by the OS line by line
-Desired State (cfn-init)
You direct it how you want something to be, and it performs whatever is required to move the instance into that desired state. For example, you can tell cfn-init that you want a certain version of the Apache web server installed; if that’s already the case and it’s the same version, nothing is done. If Apache is not installed, cfn-init will install it, or it will update any older version to that version.
-It can make sure Packages are installed
-It can manipulate OS Groups and Users
-It can download Sources and extract them onto the local instance, even using authentication
-It can create files with certain contents, permissions and ownerships
-It can run Commands and test that certain conditions are true, after the commands are run
-It can control Services on an instance, ensuring that a particular service is started or enabled.
-Provided with directives via Metadata and AWS::CloudFormation::Init on a CFN resource (see the template sketch below)
-It can be configured to watch for updates to the metadata on an object in a template. If that metadata changes, cfn-init can be executed again and it will update the configuration of that instance to the desired state specified inside the template
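A sketch of what the AWS::CloudFormation::Init metadata on an EC2 resource can look like, shown here as a Python dict; the package, file and service names are illustrative only:

```python
import json

# Desired-state directives that cfn-init would read from the resource's Metadata.
instance_metadata = {
    "AWS::CloudFormation::Init": {
        "config": {
            "packages": {"yum": {"httpd": []}},  # ensure Apache is installed
            "files": {
                "/var/www/html/index.html": {
                    "content": "Hello from cfn-init",
                    "mode": "000644",
                    "owner": "root",
                    "group": "root",
                }
            },
            "services": {
                "sysvinit": {"httpd": {"enabled": "true", "ensureRunning": "true"}}
            },
        }
    }
}
print(json.dumps(instance_metadata, indent=2))
```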
CreationPolicy and Signals
A CreationPolicy is something which is added to a logical resource inside a CF template. You create it and you supply a timeout value.
This is used when a stack creates an instance: CF waits - it doesn’t move the resource into a CREATE_COMPLETE status just because EC2 signals that the instance has been created successfully. Instead, it waits for a signal from the resource itself.
-The cfn-signal command understands how to communicate with the specific CF stack that it’s running inside
-If the output of the “desired state configuration” (cfn-init) is an okay state, then that okay is sent as a signal by cfn-signal
-If cfn-init reports an error code, then this is sent using cfn-signal to the CF stack
-So cfn-signal reports the success or failure of the cfn-init bootstrapping to the CF stack
-If it’s a success code - if cfn-init worked as intended - the resource is moved to a CREATE_COMPLETE state
-If cfn-signal reports an error, the resource in CF shows an error and the stack doesn’t create successfully
-If nothing happens for the timeout period you specified in the CreationPolicy, CF assumes it errored and doesn’t let the stack create successfully (see the CreationPolicy sketch below)
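A sketch of the CreationPolicy shape and of the cfn-init / cfn-signal commands that would typically run from User Data; the stack name, logical resource name and region are illustrative:

```python
# CreationPolicy attached to the logical resource in the template
creation_policy = {
    "ResourceSignal": {
        "Count": 1,          # how many success signals CF waits for
        "Timeout": "PT15M",  # if no signal arrives within 15 minutes, creation fails
    }
}

# Commands the instance runs at boot: apply the desired state, then signal the result
user_data_script = """#!/bin/bash
/opt/aws/bin/cfn-init -v --stack my-stack --resource MyInstance --region us-east-1
/opt/aws/bin/cfn-signal -e $? --stack my-stack --resource MyInstance --region us-east-1
"""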
EC2 Instance Roles
Allowing a service to assume a role grants the service the permissions that the role has.
EC2 instance roles are roles that an instance can assume, and anything running in that instance has the permissions that the role grants.
Architecture
IAM Role with a Permissions Policy attached - whoever assumes the role gets temporary credentials generated, and those give the permissions that the policy grants.
-Means that an EC2 instance itself can assume it and gain access to those credentials, so that the application running inside that instance can use the permissions that the role provides.
There’s an intermediate piece of architecture called the Instance Profile, which is a wrapper around an IAM role. This is the thing that allows the permissions to get inside the instance.
-When you create an Instance Role, an Instance Profile is created with the same name.
-If you use the command line or CF, you need to create these two things separately
-When using the UI and you think you’re attaching an instance role directly to an instance, YOU’RE NOT - you are attaching an instance profile of the same name. It’s the Instance Profile that’s attached to an EC2 instance.
-Inside an EC2 instance, temporary credentials are delivered via the Instance Metadata
-EC2 and the Security Token Service (STS) liaise with each other to ensure that the credentials are always renewed before they expire
EC2 Instance Roles - Summary
-Credentials are inside meta-data
-Inside the meta-data, there’s an IAM tree: iam/security-credentials/role-name (see the sketch below)
-Automatically rotated - Always valid
-Should always be used rather than adding access keys into the instance
-CLI tools will use ROLE credentials automatically
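A short sketch of what this looks like from code running on the instance, assuming an instance profile is attached; no access keys are configured anywhere:

```python
import boto3

# boto3 (like the CLI) resolves temporary credentials automatically from the
# meta-data path iam/security-credentials/<role-name>.
session = boto3.Session()
creds = session.get_credentials()
if creds is not None:
    print(creds.method)  # typically "iam-role" when metadata-delivered credentials are used

# Calls succeed only if the role's permissions policy allows them, e.g. s3:ListAllMyBuckets:
print(boto3.client("s3").list_buckets())
```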
AWS Systems Manager Parameter Store (SSM Parameter Store)
A service from AWS which makes it easy to store various bits of system configuration (strings, documents, secrets) in a resilient, secure, and scalable way.
Remember that passing secrets into an EC2 instance using UserData was bad practice, because anyone with access to the instance could access all of that data. Parameter Store is a way that this can be improved.
-Lets you create parameters (they have a name and a value, this is the part that stores the actual configuration)
-Storage for configuration & secrets
-Offers the ability to store 3 different types of parameters: String, StringList & SecureString
-Using these different types of parameters, you can store things inside the product such as: License codes, Database connection strings, Full configs & Passwords.
-Allows you to store parameters using a Hierarchical structure & stores different versions of parameters (Versioning)
-Allows you to store Plaintext and Ciphertext (Integrates with KMS to encrypt parameters)
-Public Parameters - e.g. the latest AMI IDs per region, published by AWS
SSM Parameter Store - Architecture
-Different types of things can use the Parameter Store: EC2 Instances (and all things inside of it), Applications and Lambda Functions
-They can all request access to parameters inside the Parameter Store
-Parameter Store is integrated with IAM for permissions, so any access needs to be authenticated and authorized - that might use long-term credentials (access keys) or short-term credentials passed in via an IAM role
-If parameters are encrypted then KMS will be involved, and the appropriate permissions on the CMK inside KMS will also be required
-Parameter Store allows you to create simple parameters (e.g. myDBpassword) or hierarchical sets of parameters (e.g. a /wordpress/ tree containing a DBUser, which can be accessed either by its full name or by requesting the /wordpress/ part of the tree - see the boto3 sketch below)
-Any changes that occur to any parameters, can create events, and these events can also start processes in other AWS services.
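A sketch of the Parameter Store calls an application might make via boto3; the parameter names and value are illustrative:

```python
import boto3

ssm = boto3.client("ssm")

# Store a secret as a SecureString (encrypted via KMS)
ssm.put_parameter(
    Name="/wordpress/DBPassword",
    Value="example-password",
    Type="SecureString",
    Overwrite=True,
)

# Fetch a single parameter, decrypting it (needs KMS permissions on the key as well)
value = ssm.get_parameter(Name="/wordpress/DBPassword", WithDecryption=True)
print(value["Parameter"]["Value"])

# Fetch the whole /wordpress/ branch of the hierarchy in one call
branch = ssm.get_parameters_by_path(Path="/wordpress/", Recursive=True, WithDecryption=True)
for p in branch["Parameters"]:
    print(p["Name"], p["Version"])
```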
System and Application Logging on EC2
-CloudWatch is for metrics
-CloudWatch Logs is for logging
-Neither natively captures data inside an Instance
-CloudWatch Agent is required
-To function, it needs to have the configuration and permissions
CloudWatch Agent - Architecture
-We need to supply a configuration to the Agent so that it knows what to do
-The Agent also needs some way of interacting with AWS for permissions - an IAM Role with permissions to interact with CW Logs
-The Agent config also defines the metrics and the logs we want to capture, and these are all injected into CW using Log Groups
-We’ll configure one Log Group for every log file we want to inject into the product, and then within each Log Group there’ll be a Log Stream for each instance performing this logging
-You can install it manually or use CF (automation) to include that Agent configuration for every instance that you provision
-CW Agent comes with a number of ways to obtain and store the configuration that it will use to send this data into CW Logs
One of those ways is to use the Parameter Store and store the Agent configuration as a parameter (see the sketch below).
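A sketch of storing a (heavily trimmed) Agent configuration as a parameter; the parameter name, log file and Log Group names are assumptions, not fixed values:

```python
import json
import boto3

# Minimal logs-only Agent config: one log file, one Log Group, per-instance streams
agent_config = {
    "logs": {
        "logs_collected": {
            "files": {
                "collect_list": [
                    {
                        "file_path": "/var/log/secure",
                        "log_group_name": "/ec2/var/log/secure",
                        "log_stream_name": "{instance_id}",
                    }
                ]
            }
        }
    }
}

boto3.client("ssm").put_parameter(
    Name="AmazonCloudWatch-agent-config",  # hypothetical parameter name
    Value=json.dumps(agent_config),
    Type="String",
    Overwrite=True,
)
```
Each instance’s Agent can then be pointed at that parameter when fetching its configuration (for example via amazon-cloudwatch-agent-ctl with an ssm: config source).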
EC2 Placement Groups
When you launch an EC2 instance, its physical location is selected by AWS, placing it on whatever EC2 host makes the most sense within the AZ it’s launched in.
-Placement Groups allow you to influence placement - Ensuring that instances are either physically close together or not
There are currently 3 types of placement groups for EC2:
All of them influence how instances are arranged on physical hardware, but each of them does it for different underlying reasons.
-Cluster placement - Ensure that any instances in a single cluster placement group are physically close together. (Pack instances close together)
-Spread placement - Ensure that instances are all using different underlying hardware (Keep instances separated)
-Partition placement - Designed for distributed and replicated applications which have infrastructure awareness - where you want groups of instances, but where each group is on different hardware. (Groups of instances spread apart - see the placement-group sketch below)
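A sketch of creating one group of each strategy and launching an instance into the cluster group; the group names, AMI ID and instance type are placeholders:

```python
import boto3

ec2 = boto3.client("ec2")

ec2.create_placement_group(GroupName="perf-group", Strategy="cluster")
ec2.create_placement_group(GroupName="ha-group", Strategy="spread")
ec2.create_placement_group(GroupName="topology-group", Strategy="partition", PartitionCount=3)

ec2.run_instances(
    ImageId="ami-0123456789abcdef0",        # placeholder AMI ID
    InstanceType="c5n.large",               # a type with high-performance networking
    MinCount=1,
    MaxCount=1,
    Placement={"GroupName": "perf-group"},  # first launch locks the group to an AZ
)
```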
Cluster Placement Groups
-Are used where you want to achieve the absolute highest level of performance possible within EC2
-With Cluster, you create a group, and best practice is to launch all of the instances in that group at the same time - this ensures that AWS allocates capacity for everything that you require
-They have to be launched in a single AZ
-When you create a placement group, you don’t specify an AZ. Instead, when you launch the first instance or instances into that placement group, it locks the placement group to whichever AZ that instance is launched into.
-Instances generally use the same Rack, and often even the same EC2 Host
-All the instances within a cluster placement group have fast, direct bandwidth to all other instances inside it (all members have direct connections to each other)
-When transferring data between instances within that Cluster placement group, they can achieve single-stream transfer rates of 10Gbps per stream vs the usual 5Gbps per stream
-Because of the physical placement, they offer the lowest latency possible and the maximum packets per second (PPS) possible in AWS (to achieve these levels of performance, you’ll need instances with high-performance networking, and you should also use Enhanced Networking)
Cluster Placement Groups are used when you REALLY need performance - when you need to achieve the highest levels of throughput and the lowest, consistent latencies within AWS.
But the trade-off is that, because of the physical co-location, if the hardware they’re running on fails it could take down all the instances within that cluster placement group.
-Cluster Placement Groups offer little to no resilience
Cluster Placement Groups - KEY POINTS
-Can’t span AZs - ONE AZ ONLY - locked when launching first instance
-Can span VPC peers - but impacts performance
-Requires a supported instance type
-Use the same type of instance (not mandatory)
-Launch at the same time (not mandatory… very recommended)
-10Gbps single stream performance
-Use case: Performance, fast speeds, low latency
Spread Placement Groups
-Designed to ensure the maximum amount of availability and resilience for an application
-Can span multiple AZs
-Instances which are placed into a spread placement group are located on separate, isolated infrastructure racks within each AZ
-Each instance has its own isolated networking and power supply, separate from any of the other instances in that same spread placement group (if a single rack fails, either from a networking or a power perspective, the fault is isolated to that one rack)
-Limit of 7 Instances per AZ - Because each instance is in a completely separate infrastructure rack and there are limits on the number of these within each AZ