It is recommended to use the official Microsoft JDBC Driver for SQL Server (v4.0) to connect PDI to a Microsoft SQL Server.
For Integrated Security authentication, the JDBC driver requires the bundled "DLL" file to also be added. The DLL file can be added to the "libswt/win64/" directory of the data-integration client.
Open-Source MS SQL Driver (jTDS)
The jTDS driver is used to connect to MS SQL Server. Most questions can be solved by the jTDS FAQ:
Windows Single Sign On (SSO) Authentication using NTLM
Just leave the username and password blank and jTDS uses the ntlmauth.dll that is placed in the distributions \libswt\win32 folder and logs on with the logged in user credentials.
For further information search for NTLM in the jTDS FAQ. E.g. you can also specify the domain option:
"Specifies the Windows domain to authenticate in. If present and the user name and password are provided, jTDS uses Windows (NTLM) authentication instead of the usual SQL Server authentication (i.e. the user and password provided are the domain user and password). This allows non-Windows clients to log in to servers which are only configured to accept Windows authentication.
If the domain parameter is present but no user name and password are provided, jTDS uses its native Single-Sign-On library and logs in with the logged Windows user's credentials."
Slow processing and locking situations
Especially when using multiple connections e.g. by different in/output steps to the same database we often got reports about slow processing and even locking situations.
User reported performance improved significantly by disabling the connection pooling. (tested with MS SQL Server 2000)
Another issue is found together with the Update step, it has two statements: one for looking up the data and one for the update. This can lead to deadlock situations even when we use one (since version 3.x) or two connections.
User experienced deadlocks after creating indexes, therefor I suspect the creation / updating of the index locks the table for a certain time.
I looked into the MS SQL documentation and found the reasons for this:
Fill factor: http://technet.microsoft.com/en-us/library/ms177459.aspx
"Page Splits and Performance Considerations
When a new row is added to a full index page, the Database Engine moves approximately half the rows to a new page to make room for the new row. This reorganization is known as a page split. A page split makes room for new records, but can take time to perform and is a resource intensive operation. Also, it can cause fragmentation that causes increased I/O operations. A correctly chosen fill factor value can reduce the potential for page splits by providing enough space for index expansion as data is added to the underlying table."
Also see "Creating indexes" on http://technet.microsoft.com/en-us/library/ms190197.aspx:
"Performance considerations: [...] Creating the index offline or online.
When an index is created offline (the default), exclusive locks are held on the underlying table until the transaction creating the index has completed. The table is inaccessible to users while the index is being created."
--> Due to these two informations I propose to try to use an online index first and/or change the fill factor.
The syntax is described over here: http://technet.microsoft.com/en-us/library/ms188783.aspx
Options for creating an index are:
We look forward to your experiences and comments.
BLOBS in MS SQL Server
Here is a nice article "Storing Images and BLOB files in SQL Server":
Note: The CREATE statement includes UNKNOWN instead of IMAGE [backwards compatible but should be avoided since SQL 2005] or VARBINARY [since SQL 2005]. Please see feature request PDI-2969.