PDI Partition Method Plugin Development
Creating a partition method plugin comes down to writing two classes and packaging them in a jar file. The two classes you need are:
- A class that implements the Partitioner interface. You can extend BasePartitioner for your convenience.
- A class that allows the user to configure the partitioner options in a dialog.
The Partitioner interface is fairly simple. Beyond a few administrative methods (see the example below), the only method that really matters is getPartition(). It receives a row of data as input and must return an integer x where x >= 0 and x < nrPartitions. nrPartitions is initialized using the init() method.
For more information, see the sample below.
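The contract described above can be sketched in a few lines of Java. This is only an illustration: SimplePartitioner is a simplified, hypothetical stand-in and not the real Kettle Partitioner interface, whose methods take additional arguments such as row metadata. FirstFieldHashPartitioner is likewise a made-up name. The point is the range guarantee on the value returned by getPartition().

```java
// Simplified stand-in for illustration only.
// This is NOT the real Kettle Partitioner interface.
interface SimplePartitioner {
    // Called once before any rows are processed.
    void init(int nrPartitions);

    // Must return x where 0 <= x < nrPartitions.
    int getPartition(Object[] row);
}

// Hypothetical partitioner that hashes the first field of each row.
class FirstFieldHashPartitioner implements SimplePartitioner {
    private int nrPartitions = 1;

    @Override
    public void init(int nrPartitions) {
        if (nrPartitions <= 0) {
            throw new IllegalArgumentException("nrPartitions must be positive");
        }
        this.nrPartitions = nrPartitions;
    }

    @Override
    public int getPartition(Object[] row) {
        Object key = row[0];
        int hash = (key == null) ? 0 : key.hashCode();
        // floorMod keeps the result in [0, nrPartitions) even when hash is negative;
        // the plain % operator could return a negative number here.
        return Math.floorMod(hash, nrPartitions);
    }
}
```

Math.floorMod is used instead of % because hashCode() may be negative, and a negative partition number would break the x >= 0 requirement.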
The dialog class
The dialog class only needs to declare two methods:
- open(): This method is called to open the dialog shell and show the dialog to the user.
- setRepository(): This method is called by Kettle to pass the repository to the dialog so that additional repository objects (database connections, partitioning schemas and so on) can be referenced from within the dialog.
For more information on how to program a dialog, see elsewhere in the PDI SDK pages or on the Internet. Look for Eclipse platform SWT code snippets.
You should annotate your partitioner class with the @PartitionerPlugin annotation to signal to the Kettle plugin registry that this plugin needs to be loaded at startup.
Then compile the 2 classes and put them in a jar file. Place that jar file in the plugins/steps folder (steps is not a typo!) or in one of its sub-folders.
If you need additional libraries to be included in the class path of the plugin, you can place them in a lib sub-folder next to the plugin jar file.
See the source code of the Hour partitioner plugin:
The Partitioner class: HourPartitioner.java
The Dialog class: HourPartitionerDialog.java
The hour partitioner takes the name of a file to partition on. The file name contains the hour in which the data was captured, and that hour is what we want to partition on. For example: Weblogs-20100329-23.txt. This partitioner takes the 23 from the file name, turns it into an integer and calculates the remainder of the division by the number of partitions in the partitioning schema.
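The arithmetic described above can be sketched as follows. This is not the code of HourPartitioner.java. It is a minimal, standalone illustration that assumes the hour is the text between the last dash and the last dot of the file name. The class and method names (HourPartitionSketch, extractHour, partitionFor) are hypothetical.

```java
// Standalone sketch of the hour-based partitioning logic; not the plugin source.
class HourPartitionSketch {

    // Assumes the hour sits between the last '-' and the last '.' of the name,
    // as in Weblogs-20100329-23.txt.
    static int extractHour(String filename) {
        int dash = filename.lastIndexOf('-');
        int dot = filename.lastIndexOf('.');
        if (dash < 0 || dot <= dash + 1) {
            throw new IllegalArgumentException("No hour found in: " + filename);
        }
        return Integer.parseInt(filename.substring(dash + 1, dot));
    }

    // The remainder of hour / nrPartitions is always in [0, nrPartitions)
    // because the hour is never negative.
    static int partitionFor(String filename, int nrPartitions) {
        return extractHour(filename) % nrPartitions;
    }

    public static void main(String[] args) {
        // Hour 23 with 4 partitions: 23 % 4 = 3
        System.out.println(partitionFor("Weblogs-20100329-23.txt", 4));
    }
}
```

With 4 partitions, the file Weblogs-20100329-23.txt lands in partition 3. With 24 partitions, every hour of the day gets its own partition.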