Added by Jens Bleuel, last edited by Jens Bleuel on Nov 04, 2008  (view change)

Labels:

Enter labels to add to this page:
Wait Image 
Looking for a label? Just start typing.

In General

Pentaho Kettle Component

This component shows a good example how to call a transformation or a job from another Java program.

It can be found in the package org.pentaho.plugin.kettle in the Pentaho-BI-Server project.

Get the source code here.

Program your own Kettle transformation

Recommendation for upward compatibility: If you want to create your own Transformation dynamically (e.g. from meta-data), use the method of generating a XML-file (KTR) instead of using the API. The XML-files are compatibility from the first Version of Kettle until now. This is the same for Jobs.

The example described below performs the following actions:

  1. create a new transformation
  2. save the transformation as XML file
  3. generate the SQL for the target table
  4. Execute the transformation
  5. drop the target table to make this program repeatable

The complete source code for the example is distributed in the distribution zip file. You can find this file in the downloads section. (Kettle version 2.1.3 or higher) After unzipping this file, you can find the source code in the "TransBuilder.java" file in the "extra" directory.

The Kettle Java API for Kettle can be generated with Javadoc, e.g. in Eclipse: Project / Generate Javadoc or by exporting to Javadoc. When the wizard opens: Set the location of the Javadoc command and enter the absolute path (e.g. C:\Program Files\Java\jdk1.5.0_12\bin\javadoc.exe).

// Generate the transformation.
TransMeta transMeta = TransBuilder.buildCopyTable(
   transformationName,
   sourceDatabaseName,
   sourceTableName,
   sourceFields,
   targetDatabaseName,
   targetTableName,
   targetFields
   );

// Save it as a file:
String xml = transMeta.getXML();
DataOutputStream dos = new DataOutputStream(new FileOutputStream(new File(fileName)));
dos.write(xml.getBytes("UTF-8"));
dos.close();
System.out.println("Saved transformation to file: "+fileName);

// OK, What's the SQL we need to execute to generate the target table?
String sql = transMeta.getSQLStatementsString();

// Execute the SQL on the target table:
Database targetDatabase = new Database(transMeta.findDatabase(targetDatabaseName));
targetDatabase.connect();
targetDatabase.execStatements(sql);

// Now execute the transformation...
Trans trans = new Trans(transMeta);
trans.execute(null);
trans.waitUntilFinished();

// For testing/repeatability, we drop the target table again
targetDatabase.execStatement("drop table "+targetTableName);
targetDatabase.disconnect();






Below is the source code for the method that creates the transformation:

/**
* Creates a new Transformation using input parameters such as the tablename to read from.
* @param transformationName The name of the transformation
* @param sourceDatabaseName The name of the database to read from
* @param sourceTableName The name of the table to read from
* @param sourceFields The field names we want to read from the source table
* @param targetDatabaseName The name of the target database
* @param targetTableName The name of the target table we want to write to
* @param targetFields The names of the fields in the target table (same number of fields as sourceFields)
* @return A new transformation metadata object
* @throws KettleException In the rare case something goes wrong
*/

public static final TransMeta buildCopyTable(String transformationName,
		String sourceDatabaseName, String sourceTableName,
		String[] sourceFields, String targetDatabaseName,
		String targetTableName, String[] targetFields)
		throws KettleException {

	EnvUtil.environmentInit();

	try
	{
		// Create a new transformation...
		//

		TransMeta transMeta = new TransMeta();
		transMeta.setName(transformationName);

		// Add the database connections
		for (int i = 0; i < databasesXML.length; i++) {
			DatabaseMeta databaseMeta = new DatabaseMeta(databasesXML[i]);
			transMeta.addDatabase(databaseMeta);
		}

		DatabaseMeta sourceDBInfo = transMeta.findDatabase(sourceDatabaseName);
		DatabaseMeta targetDBInfo  = transMeta.findDatabase(targetDatabaseName);

		//
		// Add a note
		//

		String note = "Reads information from table [" + sourceTableName+ "] on database [" + sourceDBInfo + "]" + Const.CR;
		note += "After that, it writes the information to table ["+ targetTableName + "] on database [" + targetDBInfo + "]";
		NotePadMeta ni = new NotePadMeta(note, 150, 10, -1, -1);
		transMeta.addNote(ni);

		//
		// create the source step...
		//

		String fromstepname = "read from [" + sourceTableName + "]";
		TableInputMeta tii = new TableInputMeta();
		tii.setDatabaseMeta(sourceDBInfo);
		String selectSQL = "SELECT " + Const.CR;

		for (int i = 0; i < sourceFields.length; i++) {
			if (i > 0) selectSQL += ", "; else selectSQL += " ";
			selectSQL += sourceFields[i] + Const.CR;
		}
		selectSQL += "FROM " + sourceTableName;
		tii.setSQL(selectSQL);

		StepLoader steploader = StepLoader.getInstance();

		String fromstepid = steploader.getStepPluginID(tii);
		StepMeta fromstep = new StepMeta(log, fromstepid, fromstepname,(StepMetaInterface) tii);
		fromstep.setLocation(150, 100);
		fromstep.setDraw(true);
		fromstep.setDescription("Reads information from table [" + sourceTableName + "] on database [" + sourceDBInfo + "]");
		transMeta.addStep(fromstep);

		//
		// add logic to rename fields
		// Use metadata logic in SelectValues, use SelectValueInfo...
		//

		SelectValuesMeta svi = new SelectValuesMeta();
		svi.allocate(0, 0, sourceFields.length);

		for (int i = 0; i < sourceFields.length; i++) {
			svi.getMetaName()[i] = sourceFields[i];
			svi.getMetaRename()[i] = targetFields[i];
		}

		String selstepname = "Rename field names";
		String selstepid = steploader.getStepPluginID(svi);
		StepMeta selstep = new StepMeta(log, selstepid, selstepname, (StepMetaInterface) svi);
		selstep.setLocation(350, 100);
		selstep.setDraw(true);
		selstep.setDescription("Rename field names");
		transMeta.addStep(selstep);

		TransHopMeta shi = new TransHopMeta(fromstep, selstep);
		transMeta.addTransHop(shi);
		fromstep = selstep;

		//
		// Create the target step...
		//

		//
		// Add the TableOutputMeta step...
		//

		String tostepname = "write to [" + targetTableName + "]";
		TableOutputMeta toi = new TableOutputMeta();
		toi.setDatabase(targetDBInfo);
		toi.setTablename(targetTableName);
		toi.setCommitSize(200);
		toi.setTruncateTable(true);

		String tostepid = steploader.getStepPluginID(toi);
		StepMeta tostep = new StepMeta(log, tostepid, tostepname, (StepMetaInterface) toi);
		tostep.setLocation(550, 100);

		tostep.setDraw(true);
		tostep.setDescription("Write information to table [" + targetTableName + "] on database [" + targetDBInfo + "]");
		transMeta.addStep(tostep);

		//
		// Add a hop between the two steps...
		//

		TransHopMeta hi = new TransHopMeta(fromstep, tostep);
		transMeta.addTransHop(hi);

		// The transformation is complete, return it...
		return transMeta;
	} catch (Exception e) {
		throw new KettleException("An unexpected error occurred creating the new transformation", e);
	}
}






Running an existing transformation

If you already have created a transformation and saved it in a .ktr or .xml file, you can use code like the below to run it.

public static void runTransformation(String filename) {
  try {
    StepLoader.init();
    EnvUtil.environmentInit();
    TransMeta transMeta = new TransMeta(filename);
    Trans trans = new Trans(transMeta);

    trans.execute(null); // You can pass arguments instead of null.
    trans.waitUntilFinished();
    if ( trans.getErrors() > 0 )
    {
      throw new RuntimeException( "There were errors during transformation execution." );
    }
  }
  catch ( KettleException e ) {
    // TODO Put your exception-handling code here.
    System.out.println(e);
  }
}

Are there any other resources for learning more about the kettle Java APIs?  Specifically more about how arguments can be set in the transformations and ways to programmitically change step properties after a ktr file has been loaded.

Thanks.

Posted by Anonymous at Aug 14, 2008 13:25

The easiest way to change parameters while keeping a transformation or job unchanged is with the use of variables.

I tried to run an existing transformation in a java project created with eclipse, such as mentionned bellow.

I added the jars kettle-core.jar, kettle-engine.jar and kettle-ui_swt.jar to my project, but the following exception occurs.

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/digester/Rule
    at org.pentaho.di.trans.step.BaseStep.<clinit>(BaseStep.java:76)
    at org.pentaho.di.trans.StepLoader.readNatives(StepLoader.java:124)
    at org.pentaho.di.trans.StepLoader.init(StepLoader.java:109)
    at org.pentaho.di.trans.StepLoader.init(StepLoader.java:119)
    at main.main(main.java:17)

Can you help me?

Thanks in adavance.

Posted by Anonymous at Aug 25, 2008 05:48

try to add commons-digester to your project class path....

Posted by Anonymous at Aug 25, 2008 06:48

i want to be able to create the main body of transformations (like splits, filters, and such) but link the inputs and outputs together at run time through the api.  like big etl lego blocks.  i suppose i can create a job at run time, but there doesn't seem to be any documentation for the job api.  any suggestions?

Posted by Anonymous at Aug 25, 2008 13:36

thanks for your recommandations.

I added commons-digester.jar, and I have this exception:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/collections/ArrayStack
    at org.apache.commons.digester.Digester.<init>(Digester.java:137)
    at org.pentaho.di.core.config.KettleConfig.createDigester(KettleConfig.java:100)
    at org.pentaho.di.core.config.KettleConfig.<init>(KettleConfig.java:58)
    at org.pentaho.di.core.config.KettleConfig.getInstance(KettleConfig.java:78)
    at org.pentaho.di.trans.step.BaseStep.<clinit>(BaseStep.java:76)
    at org.pentaho.di.trans.StepLoader.readNatives(StepLoader.java:124)
    at org.pentaho.di.trans.StepLoader.init(StepLoader.java:109)
    at org.pentaho.di.trans.StepLoader.init(StepLoader.java:119)
    at main.main(main.java:17)

I added commons-collections-3.1.jar and commons-collections.jar, but the following exception occurs:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory
    at org.apache.commons.digester.Digester.<init>(Digester.java:303)
    at org.pentaho.di.core.config.KettleConfig.createDigester(KettleConfig.java:100)
    at org.pentaho.di.core.config.KettleConfig.<init>(KettleConfig.java:58)
    at org.pentaho.di.core.config.KettleConfig.getInstance(KettleConfig.java:78)
    at org.pentaho.di.trans.step.BaseStep.<clinit>(BaseStep.java:76)
    at org.pentaho.di.trans.StepLoader.readNatives(StepLoader.java:124)
    at org.pentaho.di.trans.StepLoader.init(StepLoader.java:109)
    at org.pentaho.di.trans.StepLoader.init(StepLoader.java:119)
    at main.main(main.java:17)

So I added commons-logging.jar and commons-logging-1.0.4.jar, and the following exception occurs:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/commons/beanutils/DynaProperty
    at org.apache.commons.digester.Digester.addBeanPropertySetter(Digester.java:1731)
    at org.pentaho.di.core.config.KettleConfig.createDigester(KettleConfig.java:102)
    at org.pentaho.di.core.config.KettleConfig.<init>(KettleConfig.java:58)
    at org.pentaho.di.core.config.KettleConfig.getInstance(KettleConfig.java:78)
    at org.pentaho.di.trans.step.BaseStep.<clinit>(BaseStep.java:76)
    at org.pentaho.di.trans.StepLoader.readNatives(StepLoader.java:124)
    at org.pentaho.di.trans.StepLoader.init(StepLoader.java:109)
    at org.pentaho.di.trans.StepLoader.init(StepLoader.java:119)
    at main.main(main.java:17)

I added commons-beanutils.jar, and I have:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.digester.Digester.parse(Ljava/net/URL;)Ljava/lang/Object;
    at org.pentaho.di.core.config.KettleConfig.<init>(KettleConfig.java:61)
    at org.pentaho.di.core.config.KettleConfig.getInstance(KettleConfig.java:78)
    at org.pentaho.di.trans.step.BaseStep.<clinit>(BaseStep.java:76)
    at org.pentaho.di.trans.StepLoader.readNatives(StepLoader.java:124)
    at org.pentaho.di.trans.StepLoader.init(StepLoader.java:109)
    at org.pentaho.di.trans.StepLoader.init(StepLoader.java:119)
    at main.main(main.java:17)

The jar  commons-digester.jar is already added.  

Can you give me suggessions ?

Thanks

Posted by Anonymous at Aug 26, 2008 04:56

i'm trying to create a TransMeta object from an org.w3c.dom.Node.  i've got the xml w/ a repository root node in a org.w3c.dom.Document object (a subclass of Node) from a ktr file i hand it to the TransMeta(Node, Repository) constructor... and bam... nothing!  am i doing something wrong?

Posted by Anonymous at Aug 27, 2008 15:39

I want to run a job from a repository in a java web application.
I do it with a simple transformation using the example given in this page.
Do you have an example ?

I think I have to instantiate a repository and initialise it, then I call
TransMeta transMeta = new TransMeta(filename,repository); ??

Thanks

Posted by Anonymous at Aug 28, 2008 02:31

I suggest anyone needing to run jobs and/or transformation from Java code to take a look at the Kitchen and Pan sources (Kitchen.java and Pan.java, both included in the Kettle source code). I was able to solve all my problems by looking at the code and extracting the right pieces

If I don't want the output of the log file to appear on the screen but rather be put into a local log
file.How can I do?

Thanks.

Posted by yang at Jan 14, 2009 02:40

Yang
Were you able to figure out how to redirect output to a file?