Kettle JDBC driver

The thin Kettle JDBC driver allows a Java client to query a Kettle transformation remotely using JDBC and SQL.

Available from Pentaho Data Integration Enterprise Edition 5.0  or higher.

Architecture

As with most JDBC drivers, there is a server and a client component to the JDBC driver.
The server is designed to run as a Servlet on the Carte server or the Pentaho Data Integration server.

Where are the data service tables coming from?

You can configure a transformation step to serve as a data service in the "Data Service" tab of the transformation settings dialog: 

When such a transformation gets saved there is a new entry created in either the local metastore (under the .pentaho/metastore folder) or in the Enterprise repository on the DI server (/etc/metastore).

As such, Carte or the DI Server will automatically and immediately pick up newly configured data services from the moment your service transformation is saved.
The carte configuration file accepts a <repository> which will also be scanned in addition to the local metastore.

Reminder: the default user/password is stored in the pwd/kettle.pwd file and is cluster/cluster.

Monitoring

During execution of a query, 2 transformations will be executed on the server:

  1. A service transformation, of human design built in Spoon to provide the service data
  2. An automatically generated transformation to aggregate, sort and filter the data according to the SQL query

These 2 transformations will be visible on Carte or in Spoon in the slave server monitor and can be tracked, sniff tested, paused and stopped just like any other transformation.  However, it will not be possible to restart them manually since both transformations are programatically linked.

The JDBC Client

The JDBC driver uses the following class:

org.pentaho.di.core.jdbc.ThinDriver

The URL is in the following format:

jdbc:pdi://hostname:port/kettle?option=value&option=value

For Carte, this is an example:

jdbc:pdi://localhost:8084/kettle?debugtrans=/tmp/jdbc.ktr

For the Data Integration server:

jdbc:pdi://localhost:9080/kettle?webappname=pentaho-di

this example is for a the carte configuration file shown above.
The following standard options are available:

Parameters for the service transformation can be set with the following format:  PARAMETER_name=value (so with the option name prepended with "PARAMETER_")

SQL Support

Support for the SQL is minimal at the moment.

The following things are supported, please consider everything else unsupported:

Literals: 

Limitations

Besides the obviously plentiful limitations in the support for the SQL standard, there are a few noteworthy things to note:

Configuring clients

Clients typically need to following libraries to work:

SQuirreL SQL

Since SQuirrel already contains most needed jar files, configuring it simply done by adding kettle-core.jar as a new driver jar file along with Apache Commons VFS 1.0 and scannotation.jar

BIRT

The following jar files need to be added:

Pentaho Report Designer

Simply replace the kettle-*.jar files in the lib/ folder with new files from Kettle v5.0-M1 or higher.

Pentaho Schema Workbench

Replace the current kettle-*.jar files with the ones from Kettle v5 or later.

Pentaho Interactive Reporting

Interactive reporting runs off Pentaho Metadata so this advice also works there.

You need a BI Server that uses the PDI 5.0 jar files or you can use an older version and update the kettle-core, kettle-db and kettle-engine jar files in the /tomcat/webapps/pentaho/WEB-INF/lib/ folder

Pentaho Analyses (Mondrian): Analyzer / Saiku / JPivot

See Pentaho Interactive reporting: simply update the kettle-*.jar files in your Pentaho BI Server (tested with 4.1.0 EE and 4.5.0 EE) to get it to work.

Example of patching :

matt@kettle:~/pentaho/4.5.0-ee/server/biserver-ee$ rm ./tomcat/webapps/pentaho/WEB-INF/lib/kettle-core-4.3.0-GA.jar
matt@kettle:~/pentaho/4.5.0-ee/server/biserver-ee$ rm ./tomcat/webapps/pentaho/WEB-INF/lib/kettle-engine-4.3.0-GA.jar
matt@kettle:~/pentaho/4.5.0-ee/server/biserver-ee$ rm ./tomcat/webapps/pentaho/WEB-INF/lib/kettle-db-4.3.0-GA.jar
matt@kettle:~/pentaho/4.5.0-ee/server/biserver-ee$ cp /kettle/5.0/lib/kettle-core.jar ./tomcat/webapps/pentaho/WEB-INF/lib/
matt@kettle:~/pentaho/4.5.0-ee/server/biserver-ee$ cp /kettle/5.0/lib/kettle-db.jar ./tomcat/webapps/pentaho/WEB-INF/lib/
matt@kettle:~/pentaho/4.5.0-ee/server/biserver-ee$ cp /kettle/5.0/lib/kettle-engine.jar ./tomcat/webapps/pentaho/WEB-INF/lib/

Screen shot:

Fun fact: Mondrian generates the following SQL for the report shown above:

select "Service"."Category" as "c0", "Service"."Country" as "c1", sum("Service"."sales_amount") as "m0" from "Service" as "Service" group by "Service"."Category", "Service"."Country"

Kettle

You can query a remote service transformation with any Kettle v5 or higher client.  You can query the service through the database explorer and the various database steps (for example the Table Input step).

DataCleaner

TODO: ask project owners to change the current old driver class to the new thin one.

Jaspersoft iReport Designer

Partial success as I'm getting some XML parsing errors.  However, adding the aforementioned jar files at least allow you to get back query fields:

To be investigated.

QlikView

http://tiqview.tumblr.com/post/29820190073/stream-data-from-pentaho-kettle-into-qlikview-via-jdbc

Future

The following things are next on the agenda: