Amazon EMR | Developing Flashcards

1
Q

How many clusters can I run simultaneously?

Developing

Amazon EMR | Analytics

A

You can start as many clusters as you like. You are limited to 20 instances across all your clusters. If you need more instances, complete the Amazon EC2 instance request form, and your use case and instance increase will be considered. If your Amazon EC2 limit has already been raised, the new limit will be applied to your Amazon EMR clusters.

2
Q

Where can I find code samples?

A

Check out the sample code in these Articles and Tutorials.

3
Q

How do I develop a data processing application?

A

You can develop a data processing job on your desktop, for example, using Eclipse or NetBeans plug-ins such as IBM MapReduce Tools for Eclipse (http://www.alphaworks.ibm.com/tech/mapreducetools). These tools make it easy to develop and debug MapReduce jobs and test them locally on your machine. Additionally, you can develop your application directly on an Amazon EMR cluster using one or more instances.

4
Q

What is the benefit of using the Command Line Tools or APIs vs. AWS Management Console?

A

The Command Line Tools or APIs provide the ability to programmatically launch and monitor progress of running clusters, to create additional custom functionality around clusters (such as sequences with multiple processing steps, scheduling, workflow, or monitoring), or to build value-added tools or applications for other Amazon EMR customers. In contrast, the AWS Management Console provides an easy-to-use graphical interface for launching and monitoring your clusters directly from a web browser.

5
Q

Can I add steps to a cluster that is already running?

A

Yes. Once the cluster is running, you can optionally add more steps to it via the AddJobFlowSteps API. The AddJobFlowSteps API adds new steps to the end of the current step sequence. You may want to use this API to implement conditional logic in your cluster or for debugging.
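
As a rough illustration of the conditional-logic use case, the sketch below builds the request parameters for an AddJobFlowSteps call in Python. The field names (JobFlowId, Steps, Name, ActionOnFailure, HadoopJarStep, Jar, Args) follow the EMR API; the cluster ID, the S3 paths, and the helper names build_step and next_steps are hypothetical. The call itself is left as a comment because it assumes boto3 and AWS credentials are available.

```python
def build_step(name, jar, args, action_on_failure="CONTINUE"):
    """Return one step in the shape the AddJobFlowSteps API expects."""
    return {
        "Name": name,
        "ActionOnFailure": action_on_failure,
        "HadoopJarStep": {"Jar": jar, "Args": list(args)},
    }


def next_steps(previous_step_succeeded):
    """Hypothetical conditional logic: pick the follow-up step from an earlier result."""
    if previous_step_succeeded:
        return [build_step(
            "aggregate",
            "s3://example-bucket/jars/aggregate.jar",  # placeholder path
            ["s3://example-bucket/intermediate/", "s3://example-bucket/output/"],
        )]
    return [build_step(
        "diagnose",
        "s3://example-bucket/jars/diagnose.jar",  # placeholder path
        ["s3://example-bucket/logs/"],
        action_on_failure="CANCEL_AND_WAIT",
    )]


# "j-EXAMPLE12345" is a placeholder cluster (job flow) ID.
request = {"JobFlowId": "j-EXAMPLE12345", "Steps": next_steps(True)}

# With boto3 installed and credentials configured, this could be sent as:
#   import boto3
#   boto3.client("emr").add_job_flow_steps(**request)
```

The new steps are appended to the end of the cluster's step sequence, so a driver script can inspect the outcome of one step before deciding which step to queue next.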

6
Q

Can I run a persistent cluster?

A

Yes. Amazon EMR clusters that are started with the --alive flag will continue until explicitly terminated. This allows customers to add steps to a cluster on demand. You may want to use this to debug your application without having to repeatedly wait for cluster startup. You may also use a persistent cluster to run a long-running data warehouse cluster. This can be combined with data warehouse and analytics packages that run on top of Hadoop, such as Hive and Pig.
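
When a cluster is launched through the RunJobFlow API rather than a command-line flag, the equivalent setting is the KeepJobFlowAliveWhenNoSteps field of the Instances structure. The Python sketch below builds only that fragment of a request. The instance type and count are placeholder values, the helper name instances_config is hypothetical, and a complete RunJobFlow request needs further fields (name, roles, release, and so on) that are omitted here.

```python
def instances_config(instance_type, instance_count, keep_alive):
    """Return the Instances portion of a RunJobFlow request."""
    return {
        "MasterInstanceType": instance_type,
        "SlaveInstanceType": instance_type,
        "InstanceCount": instance_count,
        # True keeps the cluster waiting for more steps instead of
        # shutting down once the step queue is empty.
        "KeepJobFlowAliveWhenNoSteps": keep_alive,
    }


# Placeholder sizing; choose values that fit the actual workload.
persistent = instances_config("m5.xlarge", 3, keep_alive=True)
transient = instances_config("m5.xlarge", 3, keep_alive=False)
```

A persistent cluster keeps accruing instance charges until it is explicitly terminated, so the transient form is usually the safer default for one-off jobs.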

7
Q

Can I be notified when my cluster is finished?

A

You can sign up for Amazon SNS and have the cluster post to your SNS topic when it is finished. You can also view your cluster's progress in the AWS Management Console, or you can use the Command Line, SDK, or APIs to get the status of the cluster.
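
For the status-checking route, a polling loop is one simple option. The Python sketch below is a minimal version that takes any zero-argument callable returning the cluster state, so it runs without AWS access. The function name wait_for_cluster, its parameters, and the fake state sequence are assumptions for illustration; the state names TERMINATED and TERMINATED_WITH_ERRORS are the terminal cluster states reported by the EMR API.

```python
import time

# Cluster states after which no further progress is expected.
TERMINAL_STATES = {"TERMINATED", "TERMINATED_WITH_ERRORS"}


def wait_for_cluster(get_state, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll get_state() until it returns a terminal state or max_polls is reached.

    get_state is any zero-argument callable that returns the state string.
    Returns the last state seen.
    """
    state = get_state()
    polls = 1
    while state not in TERMINAL_STATES and polls < max_polls:
        sleep(poll_seconds)
        state = get_state()
        polls += 1
    return state


# Stand-in for a real status call, so the sketch runs offline.
# With boto3, get_state could instead wrap
#   client.describe_cluster(ClusterId=...)["Cluster"]["Status"]["State"]
fake_states = iter(["STARTING", "RUNNING", "TERMINATING", "TERMINATED"])
final_state = wait_for_cluster(lambda: next(fake_states), poll_seconds=0)
print(final_state)  # TERMINATED
```

Polling suits scripts that already run for the lifetime of the job; a notification to an SNS topic avoids keeping a process alive just to wait.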

8
Q

What programming languages does Amazon EMR support?

A

You can use Java to implement Hadoop custom jars. Alternatively, you may use other languages including Perl, Python, Ruby, C++, PHP, and R via Hadoop Streaming. Please refer to the Developer’s Guide for instructions on using Hadoop Streaming.
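
To give a feel for what a Hadoop Streaming job looks like in Python, here is a minimal word-count sketch. Streaming passes records to the mapper and reducer as lines of text, with key and value separated by a tab, and delivers the reducer's input sorted by key. In a real job the two functions would live in separate scripts that read sys.stdin and print to standard output; here they are plain generators, and the map, sort, and reduce phases are simulated locally on made-up sample input.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper would print."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts for each key; the input must already be sorted by key."""
    parsed = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local simulation of map -> sort -> reduce on sample input.
sample = ["Hadoop streaming reads lines", "streaming writes lines"]
result = list(reducer(sorted(mapper(sample))))
print(result)
# ['hadoop\t1', 'lines\t2', 'reads\t1', 'streaming\t2', 'writes\t1']
```

The same shape carries over to the other languages listed above, since Streaming only requires an executable that reads lines from standard input and writes lines to standard output.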

9
Q

What OS versions are supported with Amazon EMR?

A

At this time, Amazon EMR supports Debian/Squeeze in 32-bit and 64-bit modes.

10
Q

Can I view the Hadoop UI while my cluster is running?

A

Yes. Please refer to the Hadoop UI section in the Developer’s Guide for instructions on how to access the Hadoop UI.

11
Q

Does Amazon EMR support third-party software packages?

A

Yes. The recommended way to install third-party software packages on your cluster is to use Bootstrap Actions. Alternatively, you can package any third-party libraries directly into your Mapper or Reducer executable. You can also upload statically compiled executables using the Hadoop distributed cache mechanism.
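
A bootstrap action is declared as part of the cluster launch request. The Python sketch below builds one entry for the BootstrapActions list of a RunJobFlow request. The field names (Name, ScriptBootstrapAction, Path, Args) follow the EMR API; the helper name bootstrap_action, the script location, and its argument are hypothetical placeholders.

```python
def bootstrap_action(name, script_path, args=()):
    """Return one entry for the BootstrapActions list of a RunJobFlow request."""
    return {
        "Name": name,
        "ScriptBootstrapAction": {"Path": script_path, "Args": list(args)},
    }


# Placeholder script and argument: the script is expected to install
# whatever packages the job needs on each node.
actions = [
    bootstrap_action(
        "install-libs",
        "s3://example-bucket/bootstrap/install-libs.sh",
        ["--with-numpy"],
    )
]
```

Because bootstrap actions run on every node before Hadoop starts, the install script should be idempotent and quick to complete.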

12
Q

Which Hadoop versions does Amazon EMR support?

A

For the latest versions supported by Amazon EMR, please reference the documentation.

13
Q

Does Amazon contribute Hadoop improvements to the open source community?

A

Yes. Amazon EMR is active with the open source community and contributes many fixes back to the Hadoop source.

14
Q

Does Amazon EMR update the version of Hadoop it supports?

A

Amazon EMR periodically updates its supported version of Hadoop based on the Hadoop releases by the community. Amazon EMR may choose to skip some Hadoop releases.
