Hitachi Vantara Pentaho Community Wiki
Child pages
  • My transformation is running slow, what do I do?!

Versions Compared


  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migrated to Confluence 4.0

First, dont panic.

Second, is your expectation of 'slow', or performance, based on experience and is reasonable.  For instance, remember basic math:

 *Extremely simple read/write on the SAME physical box to a different database instance that is using (gasp, old!) IDE drives:  5000 r/s is reasonable (10 million records will still take 30 min).  If you are using laptop drives...I'm sorry for you.

 *Going across a network, your network latency and performance also matters. If your database is running on aging hard drives/not configured optimally, it only takes one - the source or the destination database - to slow things down.  2500 r/s  (10 million records will take over an hour)

*Doing an overly complex process that involves sorts, flattens, contant lookups that aren't cached, etc:  150 r/s (10 million records will take  18 hours)

Third, do not blame anyone or anything or any technology until you get the facts.


 Go back and read the above.

Now, here are some quick pointers to find out where the bottle neck is.  They are not necessarily in any order, and may or may not be a good way to identify your particular issue/bottleneck:

*FLOW OF DATA - Keep in mind that ETL is a FLOW of data.  Just like a river.  If there is something that constrains the flow, even if it's only one step, it will impact the entire flow of data.

*Run the transformation in SPOON, watching the records/second to the right of the steps, and the input/output buffers.  That is a GREAT way to quickly assess where an issue may be happening.

*Replace one of your database table input/ouput steps with TEXT FILE input/output. Check each input/output one at a time - if there is a significant difference, you -may- have found -one- of the bottlenecks and can address it. Note that lookup/sql/similar steps will take a little more work to check those.

 *Limit your incoming source(s) data for this next one, and possibly increase the memory for your Spoon IDE (or on a kitchen/carte instance and look at the logs afterwards) - put 'BLOCKING STEP' before and after any step that is in question, and see it's independent records/sec.  If it seems low, that may be the problem step.