This job entry executes Hadoop jobs on an Amazon Elastic MapReduce (EMR) account. To use this job entry, you must have an Amazon Web Services (AWS) account configured for EMR and a pre-built Java JAR that controls the remote job.
Name
The name of this Amazon EMR Job Executer step instance.
EMR Job Flow Name
The name of the Amazon EMR job flow (series of steps) you are executing.
Existing Job Flow ID
The ID of the existing job flow. This field is optional.
AWS Access Key
Your Amazon Web Services access key.
AWS Secret Key
Your Amazon Web Services secret key.
S3 Staging Directory
The Amazon Simple Storage Service (S3) address of the working directory for this Hadoop job. This directory will contain the MapReduce JAR, and log files will be placed here as they are created.
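As an illustration, a staging directory might be laid out as follows once a job has run (the bucket and file names here are hypothetical, not defaults):

```
s3://my-bucket/pdi-staging/
    myjob.jar      <- the MapReduce JAR uploaded for the job
    logs/          <- EMR step and task logs written as the job runs
```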
MapReduce JAR
The Java JAR that contains your Hadoop mapper and reducer classes. The job must be configured and submitted using a static main method in any class in the JAR.
Command line arguments
Any command line arguments that must be passed to the static main method in the specified JAR.
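PDI invokes the JAR's static main method and passes the contents of the Command line arguments field as the args array. A minimal driver skeleton is sketched below; the class name, argument layout, and mapper/reducer names are illustrative, and the Hadoop-specific calls (which require hadoop-client on the classpath) are shown only in comments:

```java
public class EmrJobDriver {
    // PDI calls this static entry point; args holds the values from
    // the "Command line arguments" field.
    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("Usage: EmrJobDriver <input> <output>");
            System.exit(2);
        }
        // A typical Hadoop driver body would configure and submit the job here:
        //   Job job = Job.getInstance(new Configuration(), "my-emr-job");
        //   job.setJarByClass(EmrJobDriver.class);
        //   job.setMapperClass(MyMapper.class);     // hypothetical mapper
        //   job.setReducerClass(MyReducer.class);   // hypothetical reducer
        //   FileInputFormat.addInputPath(job, new Path(args[0]));
        //   FileOutputFormat.setOutputPath(job, new Path(args[1]));
        //   System.exit(job.waitForCompletion(true) ? 0 : 1);
        System.out.println(summary(args[0], args[1]));
    }

    // Small helper so the argument handling is visible without Hadoop installed.
    static String summary(String input, String output) {
        return "configured job: " + input + " -> " + output;
    }
}
```

The input and output values are typically S3 or HDFS URIs; because EMR runs the JAR remotely, any paths passed this way must be resolvable from the cluster, not from the PDI machine.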
Number of Instances
The number of Amazon Elastic Compute Cloud (EC2) instances you want to assign to this job.
Master Instance Type
The Amazon EC2 instance type that will act as the Hadoop "master" in the cluster, which handles MapReduce task distribution.
Slave Instance Type
The Amazon EC2 instance type used for the Hadoop "slave" nodes in the cluster. Slaves are assigned tasks by the master. This option is only valid if the Number of Instances is greater than 1.
Enable Blocking
Forces the job to wait until each step completes before continuing to the next step. This is the only way for PDI to be aware of a Hadoop job's status. If left unchecked, the Hadoop job is blindly executed, and PDI moves on to the next step. Error handling/routing will not work unless this option is checked.
Logging Interval
Number of seconds between log messages.