Added by Matt Casters, last edited by Matt Casters on Mar 30, 2012

Description

This step lets you enter a user-defined Java class that drives the functionality of a complete step.  In essence, it allows you to program your own plugin inside a step.

The goal of the "User Defined Java Class" step is not to allow a user to do full-scale Java development inside of a step.  Obviously we have a whole plugin system available to help with that part. (see: wiki.pentaho.com/display/EAI/The+PDI+SDK)
The goal is to allow users to define methods and logic with as little code as possible, executed as fast as possible.  For this we use the Janino project libraries, which compile Java code in the form of classes at runtime.

Not 100% Java

The first thing to know is that Janino, and as a consequence this step, doesn't need the complete Java class, only the class body: the imports, constructors and methods you need.  In other words, you leave out the class declaration itself.  The developers of this step chose this approach over defining the full class because it hides a lot of technical details and methods from the user.

Kettle adds the following imports:

  • org.pentaho.di.trans.steps.userdefinedjavaclass.*
  • org.pentaho.di.trans.step.*
  • org.pentaho.di.core.row.*
  • org.pentaho.di.core.*
  • org.pentaho.di.core.exception.*

If you need others, include them yourself at the very top of your code, for example:

import java.util.*;

Another thing to note is that Janino, essentially a Java byte-code generator, only supports a subset of the Java 1.5 specification.  For a complete list of the features and limitations, please go to the Janino homepage.  At the time of writing, the most apparent limitation is the absence of generics.
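As a small illustration of working around the missing generics, the sketch below uses raw collection types and explicit casts.  It is plain Java that compiles on its own; the class and method names (NoGenericsSketch, join, sampleNames) are hypothetical and not part of Kettle.  Inside the step you would enter only the methods, not the surrounding class declaration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch: collection code written without generics.
class NoGenericsSketch {

    // Joins the String elements of a raw List with the given separator.
    static String join(List values, String separator) {
        StringBuffer result = new StringBuffer();
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                result.append(separator);
            }
            // Without generics, get() returns Object, so an explicit cast is needed.
            String value = (String) values.get(i);
            result.append(value);
        }
        return result.toString();
    }

    // Builds a raw List (instead of List<String>) with two sample values.
    static List sampleNames() {
        List names = new ArrayList();
        names.add("Matt");
        names.add("Lon");
        return names;
    }
}
```

A regular Java compiler will warn about unchecked operations for this code; that is expected when raw types are used.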

Again, if you need to do a lot of Java development we advise you to do this in a Java IDE like Eclipse, not inside this step. You can always expose your Java code to this step by packaging it in a jar file and placing that library in the classpath of Kettle (try the libext/ folder).

Input fields

Most of the time, working with input and output fields is the most important thing you'll be doing in your UDJC code.  As such, there are a number of ways to handle the manipulation of fields.  To start with let's look at the description of the input row:

RowMetaInterface inputRowMeta = getInputRowMeta();

The "inputRowMeta" object contains the metadata of the input row.  This includes all the fields, their data types, lengths, names, format masks and more.  You can use it to look up input fields.  For example, if you want to look for a field named "customer" you use the following code:

ValueMetaInterface customer = inputRowMeta.searchValueMeta("customer");

Because looking up field names is slow if you need to do it for every row that passes through a transformation, we advise you to look up field names in advance in a first block like this (in the processRow() method):

if (first) {
 yearIndex = getInputRowMeta().indexOfValue(getParameter("YEAR"));
 if (yearIndex<0) {
   throw new KettleException("Year field not found in the input row, check parameter 'YEAR'!");
 }
 // Make sure the lookup is not repeated for the next rows
 first=false;
}
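The "look up once, then use the index" pattern can be tried outside of Kettle with the following plain-Java sketch.  A String[] of field names stands in for the input row metadata, and the class CachedFieldIndex with its lookups counter is hypothetical, added only to show that the name lookup happens a single time.

```java
// Hypothetical stand-alone sketch of caching a field index on the first row.
class CachedFieldIndex {
    private final String[] fieldNames; // stands in for the input row metadata
    private boolean first = true;
    private int yearIndex = -1;
    int lookups = 0;                   // counts how often the name was searched

    CachedFieldIndex(String[] fieldNames) {
        this.fieldNames = fieldNames;
    }

    // Linear search by name: the slow part that should not run for every row.
    private int indexOfValue(String name) {
        lookups++;
        for (int i = 0; i < fieldNames.length; i++) {
            if (fieldNames[i].equals(name)) {
                return i;
            }
        }
        return -1;
    }

    // Returns the value of the "year" field, resolving its index only once.
    Object yearOf(Object[] row) {
        if (first) {
            yearIndex = indexOfValue("year");
            if (yearIndex < 0) {
                throw new IllegalStateException("Year field not found in the input row");
            }
            first = false;
        }
        return row[yearIndex];
    }
}
```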

To get your hands on the Integer value contained in field "year" you can then use the following construct:

Object[] r = getRow();
...
Long year = inputRowMeta.getInteger(r, yearIndex);

To make this process easier you can use a shortcut in this form:

Long year = get(Fields.In, "year").getInteger(r); 

This method will also take into account the index based optimization mentioned above.

IMPORTANT: The Java data types that you get from previous steps always correspond to the Kettle data types as described on the PDI Rows Of Data page.

Output fields

You can define all the new fields you want in the output of the step in the "Fields" section of the step's dialog.
Doing this will automatically calculate the layout of the output row metadata and store it in "data.outputRowMeta".  That in turn allows you to create the output row.  If the step writes as many rows as it reads (or fewer), you can simply resize the row you get on input:

Object[] outputRowData = RowDataUtil.resizeArray(r, data.outputRowMeta.size());

or, more memorably:

Object[] outputRowData = createOutputRow(r, data.outputRowMeta.size());

If rows are being copied, make sure to create separate copies so that subsequent steps do not end up modifying the same Object[] instance several times:

Object[] outputRowData = RowDataUtil.createResizedCopy(r, data.outputRowMeta.size());
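The difference between resizing and copying can be shown with plain arrays.  The sketch below assumes that a resize hands back the very same array when it is already large enough, while a resized copy always allocates a new one; the class RowCopyDemo and its two methods are hypothetical stand-ins, not the RowDataUtil implementation.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch of shared versus copied row arrays.
class RowCopyDemo {

    // Assumed "resize" behaviour: reuse the input array if it is big enough.
    static Object[] resizeShared(Object[] row, int size) {
        if (row.length >= size) {
            return row;
        }
        return Arrays.copyOf(row, size);
    }

    // Assumed "resized copy" behaviour: always return a fresh array
    // that is at least as long as the original row.
    static Object[] resizedCopy(Object[] row, int size) {
        return Arrays.copyOf(row, Math.max(row.length, size));
    }
}
```

With a shared array, a change made through one reference is visible through the other; with a copy, the original row stays untouched.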

Similar to accessing input fields, output fields can be addressed through the index in the output row or using the field helper.

Using the index you can set a value like this:

outputRowData[getInputRowMeta().size()] = easterDate(year.intValue());

or like this with the shortcut:

get(Fields.Out, "easter").setValue(outputRowData, easterDate(year.intValue()));
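The easterDate() helper used above is not shown on this page.  One possible way to write it, as a sketch only, is the well-known anonymous Gregorian (Meeus/Jones/Butcher) algorithm; the sample transformation may implement it differently.  The class name EasterSketch and the easterMonthDay() helper are hypothetical, and the class wrapper is there only so the code compiles outside the step.

```java
import java.util.Calendar;
import java.util.Date;

// Hypothetical stand-alone sketch of an easterDate() helper.
class EasterSketch {

    // Returns {month, day} of Gregorian Easter Sunday; month is 1-based.
    static int[] easterMonthDay(int year) {
        int a = year % 19;
        int b = year / 100;
        int c = year % 100;
        int d = b / 4;
        int e = b % 4;
        int f = (b + 8) / 25;
        int g = (b - f + 1) / 3;
        int h = (19 * a + b - d - g + 15) % 30;
        int i = c / 4;
        int k = c % 4;
        int l = (32 + 2 * e + 2 * i - h - k) % 7;
        int m = (a + 11 * h + 22 * l) / 451;
        int n = h + l - 7 * m + 114;
        return new int[] { n / 31, (n % 31) + 1 };
    }

    // Wraps the result in a java.util.Date at midnight in the default time zone.
    static Date easterDate(int year) {
        int[] monthDay = easterMonthDay(year);
        Calendar cal = Calendar.getInstance();
        cal.clear();
        cal.set(year, monthDay[0] - 1, monthDay[1]); // Calendar months are 0-based
        return cal.getTime();
    }
}
```

For the year 2012 this computes April 8.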

IMPORTANT: The Java data types that you pass on to next steps always need to correspond to the Kettle data types as described on the PDI Rows Of Data page.

Data types

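The Java data types that you get from previous steps or pass on to next steps can't be just anything: they need to correspond to the Kettle data types as described on the PDI Rows Of Data page.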
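To make the correspondence concrete, the sketch below checks a value against the Java class usually associated with each Kettle data type (for example java.lang.Long for Integer and java.lang.Double for Number).  The mapping reflects the common PDI convention as we understand it; the PDI Rows Of Data page remains the reference.  The class KettleTypeCheck and its matches() method are hypothetical and not part of Kettle.

```java
import java.math.BigDecimal;
import java.util.Date;

// Hypothetical stand-alone sketch: does a Java value fit a Kettle data type?
class KettleTypeCheck {

    static boolean matches(String kettleType, Object value) {
        if (value == null) {
            return true; // a null value is treated here as valid for any type
        }
        if ("String".equals(kettleType)) return value instanceof String;
        if ("Integer".equals(kettleType)) return value instanceof Long;
        if ("Number".equals(kettleType)) return value instanceof Double;
        if ("BigNumber".equals(kettleType)) return value instanceof BigDecimal;
        if ("Date".equals(kettleType)) return value instanceof Date;
        if ("Boolean".equals(kettleType)) return value instanceof Boolean;
        if ("Binary".equals(kettleType)) return value instanceof byte[];
        return false; // unknown type name
    }
}
```

A typical mistake this would catch is storing a java.lang.Integer in a field of Kettle type Integer, where a java.lang.Long is expected.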

Parameters

Because it is not very good practice to hard-code string values like field names (for example "customer" in the section above), we allow the usage of parameters in this step.

In this example, taken from your Kettle distribution file "samples/transformations/User Defined Java Class - Calculate the date of Easter.ktr", we have a parameter called YEAR that is referenced with the getParameter() method, for example:

getParameter("YEAR")

At runtime this will return the String value configured for the parameter, here "year".

Processing rows

The processRow() method is the heart of the step.  This method is called by the transformation in a tight loop and will continue until false is returned.  A very simple example that calculates firstname+" "+lastname and stores it into a "name" field is this:

String firstnameField;
String lastnameField;
String nameField;

public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws KettleException
{
    // First, get a row from the default input hop
    //
    Object[] r = getRow();

    // If the row object is null, we are done processing.
    //
    if (r == null) {
      setOutputDone();
      return false;
    }

    // Let's look up parameters only once for performance reasons.
    //
    if (first) {
      firstnameField = getParameter("FIRSTNAME_FIELD");
      lastnameField = getParameter("LASTNAME_FIELD");
      nameField = getParameter("NAME_FIELD");
      first=false;
    }

    // It is always safest to call createOutputRow() to ensure that your output row's Object[] is large
    // enough to handle any new fields you are creating in this step.
    //
    Object[] outputRow = createOutputRow(r, data.outputRowMeta.size());

    String firstname = get(Fields.In, firstnameField).getString(r);
    String lastname = get(Fields.In, lastnameField).getString(r);

    // Set the value in the output field
    //
    String name = firstname+" "+lastname;
    get(Fields.Out, nameField).setValue(outputRow, name);

    // putRow will send the row on to the default output hop.
    //
    putRow(data.outputRowMeta, outputRow);

    return true;
}

Examples

Look in the samples/transformations folder of your Kettle/PDI distribution for files starting with "User Defined Java Class" like "User Defined Java Class - Calculate the date of Easter.ktr".

There is an error in the next-to-last statement in the example.  Having set the value of name in "outputRow", instead of writing r, one must write outputRow.

putRow(data.outputRowMeta, outputRow);

This error also exists in the sample "User Defined Java Class - Concatenate firstname and lastname" as included with 4.2.0-Stable and perhaps others.

Comment: Posted by Lon Amick at Nov 30, 2011 15:58

Thanks Lon!

Comment: Posted by Matt Casters at Mar 30, 2012 16:19