Amazon EMR | Using EBS Volumes Flashcards
Can I edit tags directly on the Amazon EC2 instances?
Using EBS Volumes
Amazon EMR | Analytics
Yes, you can add or remove tags directly on Amazon EC2 instances that are part of an Amazon EMR cluster. However, we do not recommend doing this, because Amazon EMR’s tagging system will not sync changes that you make directly to an associated Amazon EC2 instance. We recommend that tags for Amazon EMR clusters be added and removed from the Amazon EMR console, CLI, or API to ensure that the cluster and its associated Amazon EC2 instances have the correct tags.
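As a rough illustration of tagging at the cluster level rather than on individual instances, the sketch below wraps the `add_tags` and `remove_tags` operations of a boto3 EMR client. The helper name `retag_cluster` and the cluster ID in the usage note are hypothetical. The client is passed in as an argument, so the sketch assumes nothing about credentials or region.

```python
def retag_cluster(emr_client, cluster_id, add=None, remove=None):
    """Change tags on the EMR cluster itself instead of on its EC2 instances.

    emr_client is assumed to behave like boto3.client("emr").
    add is a dict of tag key -> value; remove is an iterable of tag keys.
    """
    if add:
        emr_client.add_tags(
            ResourceId=cluster_id,
            Tags=[{"Key": key, "Value": value} for key, value in add.items()],
        )
    if remove:
        emr_client.remove_tags(ResourceId=cluster_id, TagKeys=list(remove))
```

With boto3 installed and credentials configured, a call might look like `retag_cluster(boto3.client("emr"), "j-XXXXXXXXXXXXX", add={"team": "analytics"}, remove=["temp"])`.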
What can I do now that I could not do before?
Most EC2 instances have fixed storage capacity attached to an instance, known as an “instance store”. You can now add EBS volumes to the instances in your Amazon EMR cluster, allowing you to customize the storage on an instance. The feature also allows you to run Amazon EMR clusters on EBS-Only instance families such as the M4 and C4.
What are the benefits of adding EBS volumes to an instance running on Amazon EMR?
You will benefit by adding EBS volumes to an instance in the following scenarios:
Your processing requirements call for more HDFS (or local) storage than is available today on an instance. With support for EBS volumes, you can customize the storage capacity on an instance relative to the compute capacity that the instance provides. Optimizing the storage on an instance allows you to save costs.
You are running on an older-generation instance family (such as the M1 or M2 family) and want to move to a latest-generation instance family, but are constrained by the storage available per node on the newer instance types. Now you can use any of the new-generation instance types and add EBS volumes to optimize the storage. Internal benchmarks indicate that you can save cost and improve performance by moving from an older-generation instance family (M1 or M2) to a new-generation one (M4, C4, or R3). The Amazon EMR team recommends that you benchmark your own application to confirm whether this holds for your workload.
You want to use or migrate to the next-generation EBS-Only M4 and C4 families.
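To make the storage-versus-compute trade-off in the first scenario concrete, here is a back-of-the-envelope sizing sketch. The function name and the example numbers are illustrative assumptions, not EMR defaults. It also ignores instance store capacity, filesystem overhead, and space reserved for non-HDFS use, so treat the result as an upper bound.

```python
def hdfs_capacity_gb(core_nodes, volumes_per_node, volume_size_gb, replication_factor):
    """Estimate usable HDFS capacity contributed by EBS volumes.

    Raw capacity is nodes x volumes x size; HDFS stores each block
    replication_factor times, so usable capacity is raw / replication_factor.
    """
    raw_gb = core_nodes * volumes_per_node * volume_size_gb
    return raw_gb / replication_factor


# Hypothetical cluster: 10 core nodes, each with two 500 GB volumes, replication factor 2.
estimate = hdfs_capacity_gb(10, 2, 500, 2)
```

Changing the number or size of volumes per node changes the estimate without touching the instance type, which is the flexibility the scenario describes.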
Can I persist my data on an EBS volume after a cluster is terminated?
Currently, Amazon EMR will delete volumes once the cluster is terminated. If you want to persist data outside the lifecycle of a cluster, consider using Amazon S3 as your data store.
What kind of EBS volumes can I attach to an instance?
Amazon EMR allows you to use different EBS volume types: General Purpose SSD (GP2), Magnetic, and Provisioned IOPS (SSD).
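The sketch below shows one way to express a volume choice as the `EbsConfiguration` structure that an EMR instance group definition accepts at cluster launch. The helper name `ebs_configuration`, the instance type, and the sizes are hypothetical. The mapping of the three volume types to the API identifiers `gp2`, `standard`, and `io1` is stated here as an assumption to verify against the current API reference.

```python
def ebs_configuration(volume_type, size_gb, volumes_per_instance=1, iops=None):
    """Build an EbsConfiguration dict for an EMR instance group definition."""
    # Assumed identifiers: gp2 = General Purpose SSD, standard = Magnetic,
    # io1 = Provisioned IOPS (SSD).
    allowed = {"gp2", "standard", "io1"}
    if volume_type not in allowed:
        raise ValueError(f"volume_type must be one of {sorted(allowed)}")

    spec = {"VolumeType": volume_type, "SizeInGB": size_gb}
    if volume_type == "io1":
        # Provisioned IOPS volumes need an explicit IOPS figure.
        if iops is None:
            raise ValueError("io1 volumes require an iops value")
        spec["Iops"] = iops

    return {
        "EbsBlockDeviceConfigs": [
            {"VolumeSpecification": spec, "VolumesPerInstance": volumes_per_instance}
        ]
    }


# Hypothetical core instance group with two 100 GB General Purpose SSD volumes per node.
core_group = {
    "Name": "core",
    "InstanceRole": "CORE",
    "InstanceType": "m4.xlarge",
    "InstanceCount": 2,
    "EbsConfiguration": ebs_configuration("gp2", 100, volumes_per_instance=2),
}
```

A dict like `core_group` would go in the list of instance groups supplied when the cluster is created, since volumes can only be added at launch.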
What happens to the EBS volumes once I terminate my cluster?
Amazon EMR will delete the volumes once the EMR cluster is terminated.
Can I use EBS volumes with instances that already have an instance store?
Yes, you can add EBS volumes to instances that have an instance store.
Can I attach an EBS volume to a running cluster?
No, currently you can only add EBS volumes when launching a cluster.
Can I snapshot volumes from a cluster?
The EBS API allows you to snapshot the volumes in a cluster. However, Amazon EMR currently does not allow you to restore from a snapshot.
Can I use encrypted EBS volumes?
No, encrypted volumes are not supported in the current release.