Hitachi Vantara Pentaho Community Wiki
Skip to end of metadata
Go to start of metadata

General information

JDBC Driver

It is recommended to use the official Microsoft JDBC Driver for SQL Server (v4.0) to connect PDI to a Microsoft SQL Server.

For Integrated Security authentication, the JDBC driver requires the bundled "DLL" file to also be added.  The DLL file can be added to the "libswt/win64/" directory of the data-integration client.

Open-Source MS SQL Driver (jTDS)

The jTDS driver is used to connect to MS SQL Server. Most questions can be solved by the jTDS FAQ:

Windows Single Sign On (SSO) Authentication using NTLM

Just leave the username and password blank and jTDS uses the ntlmauth.dll that is placed in the distributions \libswt\win32 folder and logs on with the logged in user credentials.

For further information search for NTLM in the jTDS FAQ. E.g. you can also specify the domain option:

"Specifies the Windows domain to authenticate in. If present and the user name and password are provided, jTDS uses Windows (NTLM) authentication instead of the usual SQL Server authentication (i.e. the user and password provided are the domain user and password). This allows non-Windows clients to log in to servers which are only configured to accept Windows authentication.

If the domain parameter is present but no user name and password are provided, jTDS uses its native Single-Sign-On library and logs in with the logged Windows user's credentials."

Slow processing and locking situations

Especially when using multiple connections e.g. by different in/output steps to the same database we often got reports about slow processing and even locking situations.

User reported performance improved significantly by disabling the connection pooling. (tested with MS SQL Server 2000)

Another issue is found together with the Update step, it has two statements: one for looking up the data and one for the update. This can lead to deadlock situations even when we use one (since version 3.x) or two connections.

User experienced deadlocks after creating indexes, therefor I suspect the creation / updating of the index locks the table for a certain time.

I looked into the MS SQL documentation and found the reasons for this:
Fill factor:

"Page Splits and Performance Considerations

When a new row is added to a full index page, the Database Engine moves approximately half the rows to a new page to make room for the new row. This reorganization is known as a page split. A page split makes room for new records, but can take time to perform and is a resource intensive operation. Also, it can cause fragmentation that causes increased I/O operations. A correctly chosen fill factor value can reduce the potential for page splits by providing enough space for index expansion as data is added to the underlying table."

Also see "Creating indexes" on
"Performance considerations: [...] Creating the index offline or online.
When an index is created offline (the default), exclusive locks are held on the underlying table until the transaction creating the index has completed. The table is inaccessible to users while the index is being created."

--> Due to these two informations I propose to try to use an online index first and/or change the fill factor.

The syntax is described over here:

Options for creating an index are:

| FILLFACTOR = fillfactor
| ONLINE = { ON | OFF }
| MAXDOP = max_degree_of_parallelism

We look forward to your experiences and comments.

BLOBS in MS SQL Server

Here is a nice article "Storing Images and BLOB files in SQL Server":

Note: The CREATE statement includes UNKNOWN instead of IMAGE [backwards compatible but should be avoided since SQL 2005] or VARBINARY [since SQL 2005]. Please see feature request PDI-2969.

  • No labels

1 Comment

  1. user-36aa8

    For initial load of the datawarehouse, I use output table step running with multiple copies.

    I tune the best performance balancing between #of copies and commit rows.

    When our server crashed and I needed to recover to a new server, I found out several things...

    1. The tuning between #of copies and commit rows depend on the hardware setup.

    2. #of copies above 5 hits a snag in jTDS and worse with MS JDBC driver, I get connection reset by peer eventhough the database connection is local.

    After searching the net, testing, I found the problem is in jTDS where opening nultiple connection simultaneously has a limit of 5.  Above that you are guaranteed to hit a connection reset error.

    I have tried the trial version of DataDirect JDBC sql server driver and it has been great.

    I could get a maximum of 24,000 rows per sec insertions to a table in SQL server 2000 with 80 copies of output step with 5 commit rows.

    In jTDS, the number for inserts on a single copy of output step is about 600 rows per sec.

    I will be testing insert update problem as I have this issue with jTDS also.

    Not all JDBC drivers are created equal.

    Herman Darmawan