Hitachi Vantara Pentaho Community Wiki
Child pages
  • Database transactions in jobs and transformations
Skip to end of metadata
Go to start of metadata

Introduction

By default every job entry or step connects separately to a database.  While this is typically great for performance, stability and predictability there are times when you want to manage database transactions yourself.

Transactions in transformations

There are times when you only want to commit a changed database when a transformation ends without error or you want to perform a roll back when an error occurs somewhere.  This is possible simply by enabling the "Make the transformation database transactional" option in the "Miscellaneous" tab of the transformation settings dialog.

Transactions in jobs

Starting from version 5.0 of PDI (Enterprise Edition) you can also enable the same functionality that exists for a transformation in a job.  The "Make the job database transactional" option is located in the "Settings" tab of the job settings dialog.

Rules

The rules are simple but effective for a any transformation or job which is made database transactional:

  • if there is no parent job or parent transformation a new transaction will be created
  • if there is a parent job but it is not database transactional enabled, a new transaction will be created

or put otherwise:

  • If the parent job or transformation is transactional, no new transaction will be started.

Logging

Log tables are not in any way affected by these options.  If you want to learn more about configuring them for concurrent access, go to page:  Configuring log tables for concurrent access

Special cases & exceptions

We expect database transactions to work across sub-transformations and sub-jobs in steps like Mapping, Single Threader and Job Executor.  It will also work through sub-transformations and sub-jobs in job entries like Job or Transformation.

However, database transactions will not work in the following cases:

  • if you remotely execute a job on a different slave server (even if it's on the same machine)
  • if you use database partitioning and multiple connections per step copy need to be created.  Performance wise it doesn't make any sense in that case anyway.
  • No labels