Data Services via the Thin Kettle JDBC driver
The thin Thin Kettle JDBC driver allows a Java Driver provides a means for a Java-based client to query the results of a Kettle transformation remotely using JDBC and SQL.
As with most JDBC drivers, there is a server and a client component to the JDBC driver.
The server is designed to run as a Servlet on the Carte server, the Pentaho Data Integration server or Pentaho Business Analytics platform. At the time of writing only Carte is supported.
The client JDBC driver consists of the kettle-core.jar library which has dependencies against Apache Commons HTTP Client and Apache Commons VFS only.
The carte configuration file accepts a <services> block that can contain <service> elements with the following sub-elements:
- name : The name of the service. Only alphanumeric characters are supported at the moment, no spaces.
- filename: The filename of the service transformation (.ktr) that will provide the data for the service
- service_step: the name of the step which will provide data during querying.
The JDBC Client
The JDBC driver uses the following class:
The URL is in the following format:
The following standard options are available:
public static final String ARG_WEBAPPNAME="webappname";
public static final String ARG_PROXYHOSTNAME = "proxyhostname";
public static final String ARG_PROXYPORT = "proxyport";
public static final String ARG_NONPROXYHOSTS = "nonproxyhosts";
- webappname : the name of the web app (future feature to support running on the DI server)
- proxyhostname : the proxy server for the HTTP connection(s)
- proxyport : the port of the proxy server
- nonproxyhosts : the hosts (comma seperator) for which not to use a proxy
Parameters for the service transformation can be set with the following format: PARAMETER_name=value (so with the option name prepended with "PARAMETER_")
Support for the SQL is minimal at the moment.
The following things are supported:
- COUNT(DISTINCT field)
- IIF( condition, true-value or field, false-value or field)
- Aggregates: SUM, AVG, MIN, MAX
- Alias both with the "AS" keyword and with one or more spaces seperated, for example SUM(sales) AS "Total Sales" or SUM(sales) TotalSales
- Strictly one service name
- nested brackets
- AND, OR, NOT if preceded by brackets, for example: NOT ( A = 5 OR C = 3 )
- precedence taken into account
- Literals (String, Integer)
- PARAMETER('parameter-name')='value' (always evaluates to TRUE in the condition)
- GROUP BY
- Group on fields, not IIF() function
- Conditions should be placed on the aggregate construct, not the alias
- ORDER BY
- You can order on any column in the result. (to be fixed later to allow you to also sort on non-selected columns in the service)
Any Java-based, JDBC-compliant tool, including third-party reporting systems, can use this driver to query a Kettle
transformation using a SQL string via JDBC.
Just in time blending of data from multiple sources for a complete picture:
- Connect, combine and transform data from multiple sources
- Query data directly from any transformation
- Access architected blends with the full spectrum of Pentaho Analytics
- Manage governance and security of data for on-going accuracy
This just in time, architected blending delivers accurate big data analytics based on the blended data. You can connect to, combine, and even transform data from any of the multiple data stores in your hybrid data ecosystem into these blended views, then query the data directly via that view using the full spectrum of analytics in the Pentaho Analytics platform, including predictive analytics.
Available since Pentaho Data Integration Version 5.0 GA (Enterprise Edition)
Please see the following pages for more information: