
Database Connections

A database connection describes the method by which Kettle connects to a database. You can create connections specific to a Job or Transformation or store them in the Kettle repository for re-use in multiple transformations or jobs.

Note: Please see also Special database issues and experiences

Creating a New Database Connection


This section describes how to create a new database connection and includes a detailed description of each connection property available in the Connection information dialog box.

To create a new connection, right-click 'Database Connections' in the tree and select New or New Connection Wizard. You can also double-click Database Connections, or press F3.

The Connection information dialog box appears. The topics that follow describe the configuration options available on each tab of the Connection information dialog box.

General

The general tab is where you set up the basic information about your connection such as the connection name, type, access method, server name and log on credentials. The table below provides a detailed description of the options available under the General tab:

Connection Name: Uniquely identifies a connection across transformations and jobs.
Connection Type: The type of database you are connecting to (for example, MySQL, Oracle, and so on).
Method of access: Either Native (JDBC), ODBC, or OCI. The available access types depend on the type of database you are connecting to.
Server host name: The host name of the server on which the database resides. You can also specify the host by IP address.
Database name: The name of the database you want to connect to. For ODBC, specify the DSN name here.
Port number: The TCP/IP port number.
Username: Optionally, the user name used to connect to the database.
Password: Optionally, the password used to connect to the database.
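As a rough illustration of how the General tab fields combine into a native (JDBC) connection URL, consider the sketch below. The buildUrl helper and all names in it are hypothetical; Kettle assembles the URL for you internally.

```java
// Sketch only: how General tab fields typically map onto a native (JDBC) URL.
public class UrlSketch {
    static String buildUrl(String dbType, String host, int port, String dbName) {
        // MySQL-style layout; other databases use different URL prefixes
        return "jdbc:" + dbType + "://" + host + ":" + port + "/" + dbName;
    }

    public static void main(String[] args) {
        // Hypothetical host and database names
        System.out.println(buildUrl("mysql", "dbserver", 3306, "sales"));
    }
}
```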

Pooling

The pooling tab allows you to configure your connection to use connection pooling and define options related to connection pooling like the initial pool size, maximum pool size and connection pool parameters. The table below provides a more detailed description of the options available on the Pooling tab:

Use a connection pool: Enables connection pooling.
The initial pool size: Sets the initial size of the connection pool.
The maximum pool size: Sets the maximum number of connections in the connection pool.
Parameter Table: Allows you to define additional custom pool parameters.
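The entries in the Parameter Table are passed to the underlying connection pool. As an illustration only (assuming a DBCP-style pool; the exact parameter names supported may vary by version), entries could look like:

```
validationQuery = SELECT 1
testOnBorrow    = true
maxIdle         = 5
```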

MySQL

By default, MySQL returns the complete query result to the client (Kettle in this case) in one block, so Kettle enables "result streaming" by default. One drawback associated with result streaming is that it allows only a single query to be open at any given time. You can disable this option on the MySQL tab of the database connection dialog box if necessary.

Another issue you may encounter is that the default timeout in the MySQL JDBC driver is 0 (no timeout). In certain instances, this prevents Kettle from detecting a server crash or sudden network failure that occurs in the middle of a query or open database connection, which in turn stalls a transformation or job indefinitely. To solve this problem, set the "connectTimeout" and "socketTimeout" parameters for MySQL in the Options tab. The values are specified in milliseconds: for a 2 minute timeout you would specify the value 120000 (2 x 60 x 1000).
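The sketch below shows the millisecond arithmetic for a 2 minute timeout and what the resulting MySQL URL looks like once "connectTimeout" and "socketTimeout" have been applied. The host and database names are made up, and the helper is for illustration only; in Spoon you set these on the Options tab rather than editing the URL by hand.

```java
// Sketch: adding MySQL timeout options to a connection URL.
public class MySqlTimeoutSketch {
    static String withTimeouts(String baseUrl, int timeoutMs) {
        return baseUrl + "?connectTimeout=" + timeoutMs + "&socketTimeout=" + timeoutMs;
    }

    public static void main(String[] args) {
        int timeoutMs = 2 * 60 * 1000; // 2 minutes = 120000 ms
        // Hypothetical host and database names
        System.out.println(withTimeouts("jdbc:mysql://dbserver:3306/sales", timeoutMs));
    }
}
```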

You can also review other options on the linked MySQL help page by clicking on the 'Show help text on option usage' button found on the Options tab.

Note: please see also MySQL

Oracle

This tab allows you to specify the default data and index tablespaces which Kettle uses when generating SQL for Oracle tables and indexes.

This version of Pentaho Data Integration ships with Oracle JDBC driver version 10.2.0, the most stable and recent driver available at the time of release. However, if you have issues with Oracle connectivity or other problems, you may consider replacing the 10.2.0 JDBC driver with one that matches your database server. Replace the files "ojdbc14.jar" and "orai18n.jar" in the libext/JDBC directory of your distribution with the files found in the $ORACLE_HOME/jdbc directory on your server.

If you are using OCI and an Oracle Net8 client, the JDBC driver version used in Kettle needs to match your Oracle client version. PDI 2.5.0 shipped with version 10.1; 3.0.0 ships with version 10.2. If the versions don't match up, either install the matching version of the Oracle client or change the JDBC driver in PDI.

Note: please see also Oracle

Informix

For Informix, you must specify the Informix Server name in the Informix tab in order for a connection to be usable.

SQL Server

This tab allows you to configure the following properties specific to Microsoft SQL Server:

SQL Server instance name: Sets the instance name for the SQL Server connection.
Use .. to separate schema and table: Enable this when using dot notation to separate schema and table.

Other properties can be configured by adding connection parameters on the options tab of the Connection information dialog box. For example, you can enable single sign-on login by defining the domain option on the Options tab as shown below:

From the jTDS FAQ on http://jtds.sourceforge.net/faq.html:

"Specifies the Windows domain to authenticate in. If present and the user name and password are provided, jTDS uses Windows (NTLM) authentication instead of the usual SQL Server authentication (i.e. the user and password provided are the domain user and password). This allows non-Windows clients to log in to servers which are only configured to accept Windows authentication.

If the domain parameter is present but no user name and password are provided, jTDS uses its native Single-Sign-On library and logs in with the logged-in Windows user's credentials."
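The sketch below shows what the generated jTDS URL looks like once "domain" is added as a parameter (jTDS appends extra properties as ;name=value pairs). The server, database, and domain names are made up; in Spoon you add the parameter on the Options tab rather than editing the URL directly.

```java
// Sketch: a jTDS SQL Server URL with the "domain" property appended.
public class JtdsDomainSketch {
    static String withDomain(String baseUrl, String domain) {
        // jTDS encodes extra connection properties as ;name=value pairs
        return baseUrl + ";domain=" + domain;
    }

    public static void main(String[] args) {
        // Hypothetical server, database, and domain names
        System.out.println(withDomain("jdbc:jtds:sqlserver://dbserver:1433/sales", "MYDOMAIN"));
    }
}
```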

Note: please see also MS SQL Server

SAP R/3

This tab allows you to configure the following properties specific to SAP R/3:

Language: Specifies the language to be used when connecting to SAP.
System Number: Specifies the system number of the SAP system to which you want to connect.
SAP Client: Specifies the three-digit client number for the connection.

Note: For this to work, you need a plug-in, please see List of Available Pentaho Data Integration Plug-Ins

Generic

This tab is where you specify the URL and Driver class for Generic Database connections. You can also dynamically set these properties using Kettle variables. Using Kettle variables provides you with the ability to access data from multiple database types using the same transformations and jobs.

Note: In the latter case, make sure to use clean ANSI SQL that works on all the database types involved.
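A minimal sketch of the kind of ${VAR} substitution that Kettle variables provide is shown below. This is a simplification for illustration; Kettle's own variable resolution handles scoping, nesting, and more. The variable names and URL are hypothetical.

```java
import java.util.Map;

// Sketch: naive ${NAME} expansion, as an intuition for Kettle variables.
public class VariableSketch {
    static String substitute(String text, Map<String, String> vars) {
        for (Map.Entry<String, String> e : vars.entrySet()) {
            text = text.replace("${" + e.getKey() + "}", e.getValue());
        }
        return text;
    }

    public static void main(String[] args) {
        // Hypothetical variable names and values
        Map<String, String> vars = Map.of(
                "DB_URL", "jdbc:hsqldb:hsql://localhost/sampledata",
                "DB_DRIVER", "org.hsqldb.jdbcDriver");
        System.out.println(substitute("${DB_URL}", vars));
    }
}
```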

Options

This tab allows you to set database-specific options for the connection by adding parameters to the generated URL.

The use of variables is also possible within the Parameter / Value options.

Follow the instructions below to add a parameter:

  1. Select the next available row in the parameter table
  2. Choose your database type and enter a valid parameter name and its corresponding value
  3. For more database-specific configuration help, click the 'Show help text on option usage' button. A new browser tab appears in Spoon with additional information about configuring the JDBC connection for the currently selected database type.

SQL

This tab allows you to enter a number of SQL commands immediately after connecting to the database. This is sometimes needed for reasons such as licensing, configuration, logging, tracing, and so on.
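For example, the statements entered here might look like the following (hypothetical statements; what you actually need depends on your database and its licensing, configuration, or tracing requirements):

```sql
-- Hypothetical session setup run immediately after connecting (MySQL-style)
SET SESSION sql_mode = 'ANSI';
SET NAMES utf8;
```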

Cluster

This tab allows you to enable clustering for the database connection and create connections to the data partitions. To enable clustering for the connection, enable the 'Use Clustering?' option.

To create a new data partition, enter a partition ID and the hostname, port, database, username and password for connecting to the partition.

Advanced

This tab allows you to configure the following properties for the connection:

Quote all identifiers in database: Quotes all identifiers (schema, table, and field names) in the SQL that Kettle generates.
Force all identifiers to lower case: Converts all identifiers to lower case.
Force all identifiers to upper case: Converts all identifiers to upper case.

Testing a Connection

The Test button in the Connection information dialog box allows you to test the current connection. A confirmation message is displayed if Spoon is able to establish a connection with the target database.

Explore

The Database Explorer allows you to browse the target database interactively, preview data, generate DDL and much more. To open the Database Explorer for an existing connection, click the 'Explore' button found on the Connection information dialog box or right-click on the connection in the Main tree and select 'Explore'. See the Database Explorer for more information.

Feature List

The feature list exposes the JDBC URL, class, and various database settings for the connection such as the list of reserved words.

Editing a Connection


To edit an existing connection, double-click on the connection name in the main tree or right-click on the connection name and select 'Edit connection.'

Duplicating a Connection


To duplicate an existing connection, right-click on the connection name and select Duplicate.

Copying to a Clipboard


This option allows you to copy the XML that defines the connection to the clipboard. You can then paste this connection into another transformation or job.

Deleting a Connection


To delete an existing database connection, right-click on the connection name in the main tree and select Delete.

Executing SQL Commands on a Connection


To execute SQL commands against an existing connection, right-click on the connection name and select SQL Editor. See the SQL Editor for more information.

Clearing the Database Cache Option


To speed up connections, Spoon uses a database cache. When the information in the cache no longer represents the layout of the database, right-click on the connection in the Main tree and select the 'Clear DB Cache...' option. This is commonly needed when database tables have been changed, created or deleted.

Quoting


Pentaho has implemented a database-specific quoting system that allows you to use any name or character acceptable to the supported databases' naming conventions.

Pentaho Data Integration contains a list of reserved words for most of the supported databases. To ensure that quoting behaves correctly, Pentaho has implemented a strict separation between the schema (user/owner) of a table and the table name itself. Doing otherwise makes it impossible to correctly quote tables or fields that contain one or more periods, and placing periods in table and field names is common practice in some ERP systems (for example, fields such as "V.A.T.").

To avoid quoting-related errors, a rule was added in version 2.5.0 that stops Pentaho Data Integration from quoting a table name or schema that already starts or ends with a quote. This allows you to specify the quoting mechanism yourself.
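The rule above can be sketched as follows. This is a simplification for illustration, not Kettle's actual code: if a name already starts or ends with the quote character, it is left alone so you control the quoting yourself; otherwise it is wrapped in the database's quote character.

```java
// Sketch of the 2.5.0 skip-quoting rule (simplified, hypothetical helper).
public class QuoteSketch {
    static String quoteIdentifier(String name, String quoteChar) {
        if (name.startsWith(quoteChar) || name.endsWith(quoteChar)) {
            return name; // user-supplied quoting: hands off
        }
        return quoteChar + name + quoteChar;
    }

    public static void main(String[] args) {
        System.out.println(quoteIdentifier("V.A.T.", "\""));     // gets quoted
        System.out.println(quoteIdentifier("\"V.A.T.\"", "\"")); // left unchanged
    }
}
```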

Note: Contact Pentaho if you find other ways to improve Pentaho's quoting algorithms.

Database Usage Grid


The table below contains information that may help you configure your particular database.

Note: The list of supported databases grows with each release. Please see the list of connection types in the database dialog.

Database Access Method Server Name or IP Address Database Name Port # (default) User Name and Password
Oracle Native Required Oracle database SID Required (1521) Required
  ODBC   ODBC DSN name   Required
  OCI   Database TNS name   Required
MySQL Native Required MySQL database name Optional (3306) Optional
  ODBC   ODBC DSN name   Optional
AS/400 Native Required AS/400 Library name Optional Required
  ODBC   ODBC DSN name   Required
MS Access ODBC   ODBC DSN name   Optional
MS SQL Server Native Required Database name Required (1433) Required
  ODBC   ODBC DSN name   Required
IBM DB2 Native Required Database name Required (50000) Required
  ODBC   ODBC DSN name   Required
PostgreSQL Native Required Database name Required (5432) Required
  ODBC   ODBC DSN name   Required
Intersystems Caché Native Required Database name Required (1972) Required
  ODBC   ODBC DSN name   Required
Sybase Native Required Database name Required (5001) Required
  ODBC   ODBC DSN name   Required
Gupta SQL Base Native Required Database Name Required (2155) Required
  ODBC   ODBC DSN name   Required
Dbase III,IV or 5.0 ODBC   ODBC DSN name   Optional
Firebird SQL Native Required Database name Required (3050) Required
  ODBC   ODBC DSN name   Required
Hypersonic Native Required Database name Required (9001) Required
MaxDB (SAP DB) Native Required Database name   Required
  ODBC   ODBC DSN name   Required
Ingres Native Required Database name   Required
  ODBC   ODBC DSN name   Required
Borland Interbase Native Required Database name Required (3050) Required
  ODBC   ODBC DSN name   Required
ExtenDB Native Required Database name Required (6453) Required
  ODBC   ODBC DSN name   Required
Teradata Native Required Database name   Required
  ODBC   ODBC DSN name   Required
Oracle RDB Native Required Database name   Required
  ODBC   ODBC DSN name   Required
H2 Native Required Database name   Required
  ODBC   ODBC DSN name   Required
Netezza Native Required Database name Required (5480) Required
  ODBC   ODBC DSN name   Required
IBM Universe Native Required Database name   Required
  ODBC   ODBC DSN name   Required
SQLite Native Required Database name   Required
  ODBC   ODBC DSN name   Required
Apache Derby Native Optional Database name Optional (1527) Optional
  ODBC   ODBC DSN name   Optional
Generic Native Required Database name Required (Any) Required
  ODBC   ODBC DSN name   Optional

The generic database connection also requires the URL and Driver class to be specified on the Generic tab. Pentaho also allows these fields to be specified using a variable, so you can access data from multiple database types using the same transformations and jobs. In that case, make sure to use clean ANSI SQL that works on all the database types involved.

Configuring JNDI Connections


If you are developing transformations and jobs that will be deployed on an application server such as the Pentaho platform running on JBoss, you can configure your database connections using JNDI.

Because you don't want to have an application server running all the time during development or testing of the transformations, Pentaho has supplied a way of configuring a JNDI connection for "local" Kettle use.

To configure a local JNDI connection, edit the properties file "simple-jndi/jdbc.properties". For example, to connect to the databases used in the Pentaho Demo platform download, use the following entries in the properties file:

SampleData/type=javax.sql.DataSource
SampleData/driver=org.hsqldb.jdbcDriver
SampleData/url=jdbc:hsqldb:hsql://localhost/sampledata
SampleData/user=pentaho_user
SampleData/password=password
Quartz/type=javax.sql.DataSource
Quartz/driver=org.hsqldb.jdbcDriver
Quartz/url=jdbc:hsqldb:hsql://localhost/quartz
Quartz/user=pentaho_user
Quartz/password=password
Hibernate/type=javax.sql.DataSource
Hibernate/driver=org.hsqldb.jdbcDriver
Hibernate/url=jdbc:hsqldb:hsql://localhost/hibernate
Hibernate/user=hibuser
Hibernate/password=password
Shark/type=javax.sql.DataSource
Shark/driver=org.hsqldb.jdbcDriver
Shark/url=jdbc:hsqldb:hsql://localhost/shark
Shark/user=sa
Shark/password=

Note: It is important that the information stored in this file in the simple-jndi directory mirrors the content of your application server data sources.

Unsupported Databases


Contact Pentaho if you want to access a database type that is not yet supported. A few database types are not supported in this release due to the lack of a sample database and/or software.

It is generally possible to read from unsupported databases by using the Generic database driver through an ODBC or JDBC connection.


