Hitachi Vantara Pentaho Community Wiki


Partitioning simply splits a data set into a number of sub-sets according to a rule that is applied to each row of data.  This rule can be anything you can come up with, and this includes no rule at all.  However, if no rule is applied we simply call it (round robin) row distribution.  You can create your own rules in the form of a partitioning method plugin.
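
To make the difference between a partitioning rule and plain row distribution concrete, here is a minimal sketch in Python. It is not Pentaho's implementation or plugin API: the function names, the `customer_id` field and the modulo rule are all assumptions chosen for illustration.

```python
def partition_by_rule(rows, rule, n_partitions):
    """Split rows into n_partitions sub-sets; rule(row) returns an integer
    that decides which sub-set a row belongs to (hypothetical helper)."""
    partitions = [[] for _ in range(n_partitions)]
    for row in rows:
        partitions[rule(row) % n_partitions].append(row)
    return partitions


def distribute_round_robin(rows, n_targets):
    """No rule at all: hand rows out in turn, ignoring their content."""
    targets = [[] for _ in range(n_targets)]
    for index, row in enumerate(rows):
        targets[index % n_targets].append(row)
    return targets


def by_customer_id(row):
    # Assumed example rule: partition on the (hypothetical) customer_id field.
    return row["customer_id"]


rows = [{"customer_id": cid} for cid in (1, 2, 1, 3, 2, 1)]

partitioned = partition_by_rule(rows, by_customer_id, 3)
distributed = distribute_round_robin(rows, 3)

# Customer ids found in each sub-set.
partitioned_ids = [[r["customer_id"] for r in part] for part in partitioned]
distributed_ids = [[r["customer_id"] for r in part] for part in distributed]
```

With the rule applied, every row for a given customer ends up in the same sub-set. With round robin distribution the rows for customer 1 are spread over two different targets, because only the position of a row in the stream decides where it goes.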

The reason for partitioning data is invariably linked to parallel processing, since it makes it possible to execute certain tasks in parallel where this would otherwise not be possible.
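
One task of this kind is a per-key aggregation. The sketch below, again in Python and not taken from Pentaho, assumes a simple rule based on the first character of the key. Because all rows for one key land in the same partition, each partition can be summed independently and the partial results merged without any further combining. The thread pool is only there to show that the partitions do not depend on each other; it is not meant as a performance recipe.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def sum_per_key(rows):
    """Aggregate one partition on its own: total amount per key."""
    totals = Counter()
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
n_partitions = 2

# Assumed partitioning rule: character code of the key's first letter,
# modulo the number of partitions.
partitions = [[] for _ in range(n_partitions)]
for key, amount in rows:
    partitions[ord(key[0]) % n_partitions].append((key, amount))

# Each partition is processed by its own worker.
with ThreadPoolExecutor(max_workers=n_partitions) as pool:
    partial_results = list(pool.map(sum_per_key, partitions))

# The key sets are disjoint, so merging is a plain union of the results.
merged = {}
for result in partial_results:
    merged.update(result)
```

Had the rows been handed out round robin instead, the same key could appear in several workers, and the partial sums would have to be combined in an extra step.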