Thin Kettle JDBC driver
...

The thin Kettle JDBC driver allows a Java client to query a Kettle transformation remotely using JDBC and SQL.

Available from Kettle v5 (Pentaho Data Integration Enterprise Edition 5.0-M1) or higher: http://ci.pentaho.com/job/Kettle/

PLEASE PROVIDE FEEDBACK WHEN TRYING OUT THE DEVELOPMENT SOFTWARE DESCRIBED BELOW!

Architecture

As with most JDBC drivers, there is a server and a client component to the JDBC driver.
The server is designed to run as a servlet on the Carte server, the Pentaho Data Integration server, or the Pentaho Business Analytics platform. At the time of writing, only Carte is supported.

[Architecture diagram]

Where is the code?

...

Download Kettle 5.0-M1 or later for the server software (Carte). For now, go to CI: http://ci.pentaho.com/job/Kettle/

Recent builds contain all the libraries mentioned below. Kettle jars are in the lib/ folder; the rest are in libext/, libext/commons, and libext/pentaho.

Server configuration

The Carte configuration file accepts a <services> block that can contain <service> elements with the following sub-elements:

  • name: The name of the service. Only alphanumeric characters are supported at the moment; no spaces.
  • filename: The filename of the service transformation (.ktr) that will provide the data for the service.
  • service_step: The name of the step that will provide the data during querying.

For example:


<slave_config>  

  <slaveserver>    
    <name>slave4-8084</name>    
    <hostname>localhost</hostname>    
    <port>8084</port>    
  </slaveserver>  

  <services>  
    <service>    
      <name>Service</name>     
      <filename>/path/to/your/service-transformation.ktr</filename>     
      <service_step>Output</service_step>   
    </service>
  </services>

</slave_config>
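Note that the service <name> is what clients will later reference as the table name in their SQL. With the configuration above, a remote query would look like the following sketch:

SELECT * FROM Service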

You can start your Carte server by passing the Carte configuration XML as the only argument, for example:
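A minimal sketch, assuming the configuration above was saved as carte-config.xml (the file name is illustrative) and that you run it from the Kettle installation directory:

sh carte.sh carte-config.xml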

...

Where are the data service tables coming from?

You can configure a transformation step to serve as a data service in the "Data Service" tab of the transformation settings dialog.

When such a transformation is saved, a new entry is created either in the local metastore (under the .pentaho/metastore folder) or in the Enterprise repository on the DI server (/etc/metastore).

As such, Carte or the DI Server will automatically and immediately pick up newly configured data services the moment your service transformation is saved.
The Carte configuration file also accepts a <repository> element; the referenced repository will be scanned for data services in addition to the local metastore.

Reminder: the default user/password is stored in the pwd/kettle.pwd file and is cluster/cluster.

...

jdbc:pdi://hostname:port/kettle?option=value&option=value

For Carte, this is an example:

jdbc:pdi://localhost:8084/kettle?debugtrans=/tmp/jdbc.ktr

For the Data Integration server:

jdbc:pdi://localhost:9080/kettle?webappname=pentaho-di

The Carte example corresponds to the configuration file shown above.
The following standard options are available:

  • webappname: the name of the web app (a future feature to support running on the DI server, where this is typically pentaho-di)
  • proxyhostname: the proxy server for the HTTP connection(s)
  • proxyport: the port of the proxy server
  • nonproxyhosts: the hosts (comma-separated) for which not to use a proxy
  • debugtrans: the optional name of a file in which the generated transformation will be stored for debugging purposes (example: /tmp/debug.ktr)
  • debuglog: set to "true" to have the logging text of the remote SQL transformation written to the general logging channel once execution is finished
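To tie the URL and its options together, here is a minimal client sketch. It assumes the Carte configuration and default credentials shown earlier, and it assumes the driver class name org.pentaho.di.core.jdbc.ThinDriver from kettle-core.jar (verify the class name against the build you are using):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class KettleThinClient {
  public static void main(String[] args) throws Exception {
    // Register the thin driver; the class name is an assumption for Kettle 5.x builds.
    Class.forName("org.pentaho.di.core.jdbc.ThinDriver");

    // Connection URL as described above; the debugtrans option is optional.
    String url = "jdbc:pdi://localhost:8084/kettle?debugtrans=/tmp/jdbc.ktr";

    // Default Carte credentials (see pwd/kettle.pwd): cluster/cluster.
    try (Connection connection = DriverManager.getConnection(url, "cluster", "cluster");
         Statement statement = connection.createStatement();
         // "Service" is the data service name from the configuration example above.
         ResultSet resultSet = statement.executeQuery("SELECT * FROM Service")) {
      while (resultSet.next()) {
        System.out.println(resultSet.getString(1));
      }
    }
  }
}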

...

  • Grouping is done using a "Memory Group By" step, which keeps all the groups in memory; it is used instead of a regular "Group By" step to avoid doing a costly sort on the data. If you expect a large number of groups, watch your memory consumption on the server (see the sketch after this list).
  • It is not possible to specify the same field twice in the same SELECT clause (should you want to do that for whatever reason).
  • Calculations and functions, such as string concatenation, are not (yet) supported.
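As an illustration of the first limitation, a grouping query like the following (the column names are hypothetical; "Service" is the service from the configuration example above) is executed on the server with the "Memory Group By" step, so every distinct group is held in memory while the query runs:

SELECT customerName, SUM(sales) FROM Service GROUP BY customerName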

Configuring clients

Clients typically need the following libraries to work:

  • kettle-core.jar
  • commons HTTP client
  • commons codec
  • commons lang
  • commons logging
  • commons VFS (1.0)
  • log4j
  • scannotation

SQuirreL SQL

Since SQuirreL already contains most of the needed jar files, configuring it is simply a matter of adding kettle-core.jar as a new driver jar file, along with Apache Commons VFS 1.0 and scannotation.jar.
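As a sketch, the driver definition in SQuirreL would then use values like the following; the class name is an assumption based on Kettle 5.x builds and should be verified against your kettle-core.jar:

  Class name: org.pentaho.di.core.jdbc.ThinDriver
  Example URL: jdbc:pdi://localhost:8084/kettle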

...