Hitachi Vantara Pentaho Community Wiki
Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 23 Next »

Kettle JDBC driver

The thin Kettle JDBC driver allows a Java client to query a Kettle transformation remotely using JDBC and SQL.

Architecture

As with most JDBC drivers, there is a server and a client component to the JDBC driver.
The server is designed to run as a Servlet on the Carte server, the Pentaho Data Integration server or Pentaho Business Analytics platform.  At the time of writing only Carte is supported.

Server configuration

The carte configuration file accepts a <services> block that can contain <service> elements with the following sub-elements:

  • name : The name of the service.  Only alphanumeric characters are supported at the moment, no spaces.
  • filename: The filename of the service transformation (.ktr) that will provide the data for the service
  • service_step: the name of the step which will provide data during querying.

For example:

<services>
  <service>
    <name>Service</name> 
    <filename>/path/to/your/service-transformation.ktr</filename> 
    <service_step>Output</service_step>
   </service>
 </services>

Monitoring

During execution of a query, 2 transformations will be executed on the server:

  1. A service transformation, of human design built in Spoon to provide the service data
  2. An automatically generated transformation to aggregate, sort and filter the data according to the SQL query

These 2 transformations will be visible on Carte or in Spoon in the slave server monitor and can be tracked, sniff tested, paused and stopped just like any other transformation.  However, it will not be possible to restart them manually since both transformations are programatically linked.

The JDBC Client

The JDBC driver uses the following class:

org.pentaho.di.core.jdbc.ThinDriver

The URL is in the following format:

jdbc:pdi://hostname:port/kettle?option=value&option=value

The following standard options are available:

  • webappname : the name of the web app (future feature to support running on the DI server)
  • proxyhostname : the proxy server for the HTTP connection(s)
  • proxyport : the port of the proxy server
  • nonproxyhosts : the hosts (comma seperator) for which not to use a proxy
  • debugtrans : the optional name of a file in which the generated transformation will be stored for debugging purposes (example: /tmp/debug.ktr)

Parameters for the service transformation can be set with the following format:  PARAMETER_name=value (so with the option name prepended with "PARAMETER_")

SQL Support

Support for the SQL is minimal at the moment.

The following things are supported, please consider everything else unsupported:

  • SELECT:
    • * is expanded to include all rows
    • COUNT(field)
    • COUNT(*)
    • COUNT(DISTINCT field)
    • DISTINCT <fields>
    • IIF( condition, true-value or field, false-value or field)
    • Aggregates: SUM, AVG, MIN, MAX
    • Alias both with the "AS" keyword and with one or more spaces seperated, for example SUM(sales) AS "Total Sales" or SUM(sales) TotalSales
    • Constant expressions are possible, see below in the literals section.
    • Calculations on the other hand are not possible yet, perform them in the service transformation for now.
  • FROM
    • Strictly one service name, aliasing is possible
    • You can omit the service name to query from an empty row or you can query from dual, for example "SELECT 1"  or "SELECT 1 FROM dual" are the same.
  • WHERE
    • nested brackets
    • AND, OR, NOT if preceded by brackets, for example: NOT ( A = 5 OR C = 3 )
    • precedence taken into account
    • Literals (String, Integer)
    • PARAMETER('parameter-name')='value'  (always evaluates to TRUE in the condition)
    • =
    • <
    • >
    • <=, =<
    • >=, =>
    • <>
    • LIKE (standard % and ? wildcards are converted to .* and . regular expressions)
    • REGEX (matches regular expression)
    • IS NULL
    • IS NOT NULL
    • IN ( value, value, value )
    • You can put a condition on the IIF expression or it's alias if one is used. (please use identical string literals for expressions)
  • GROUP BY
    • Group on fields, not IIF() function
  • HAVING
    • Conditions should be placed on the aggregate construct, not the alias
    • Please use identical strings for the expressions, the algorithm is not yet that smart.  In other words, if you use "COUNT( * )" in the SELECT clause you should use the same "COUNT( * ) " expression in the HAVING clause, not "COUNT(*)" or any variant of it.
  • ORDER BY
    • You can order on any column in the result. (to be fixed later to allow you to also sort on non-selected columns in the service)

Literals: 

  • Strings have single quotes around them, escaping quotes is not yet supported.
  • Dates have square brackets around them and the following formats are supported: [yyyy/MM/dd HH:mm:ss.SSS], [yyyy/MM/dd HH:mm:ss] and [yyyy/MM/dd]

  • Number and BigNumber should have no grouping symbol and the decimal is . (example 123.45)
  • Integers contain only digits
  • Boolean values can be TRUE or FALSE

Limitations

Besides the obviously plentiful limitations in the support for the SQL standard, there are a few noteworthy things to note:

  • Grouping is done using a "Memory Group By" step which keeps all the groups in memory. If you expect large amounts of groups to be used, watch your memory consumption on the server.  We're using the "memory group by" step to avoid doing a costly sort on the data.
  • Prepared statements are not yet supported which will limit the usefulness of the driver in 3rd party tools.
  • It is not possible to specify the same field twice in the same SELECT clause. (for whatever reason you might do that)

Configuring clients

SQuirreL SQL

Since SQuirrel already contains most needed jar files, configuring it simply done by adding kettle-core.jar as a new driver jar file

BIRT

The following jar files need to be added:

  • kettle-core.jar
  • commons HTTP client
  • commons code
  • commons lang
  • commons logging
  • commons VFS (1.1)
  • log4j

Pentaho Report Designer

Simply replace the kettle-*.jar files in the lib/ folder with new files from Kettle v5.0-M1 or higher.

Pentaho Schema Workbench

Replace the current kettle-*.jar files with the ones from Kettle v5 or later.

Kettle

You can query a remote service transformation with any Kettle v5 or higher client.  You can query the service through the database explorer and the various database steps (for example the Table Input step).

DataCleaner

TODO: ask project owners to change the current old driver class to the new thin one.

Jaspersoft iReport Designer

Partial success as I'm getting some XML parsing errors.  However, adding the aforementioned jar files at least allow you to get back query fields:

To be investigated.

Future

The following things are next on the agenda:

  • Caching of services data in memory with validity time-out
  • Caching of queries in memory with validity time-out
  • writing to a service transformation (INSERT INTO)
  • No labels