Hitachi Vantara Pentaho Community Wiki

This space lists the available Plug-Ins for Pentaho Data Integration and helps keep their internal IDs unique.

Note: We are in the process of moving these plugins to the Marketplace; please see the details there.

Notes

  • If you want your Plug-In listed here, please contact communityconnection@pentaho.org. Include the information for the table below, the Kettle version (2, 3, 4, etc.) and your wiki ID. The link can point to a web page that you host, or we can create a wiki page that you maintain. If you also need space in our Subversion plugin area, we can add a project to svn://source.pentaho.org/svnkettleroot/plugins for you.
  • Unless stated otherwise, the plug-ins listed on this page are free to download and use, but are not officially supported as part of a paid Pentaho Subscription.
  • Plugins for version 2 are listed here.

 Plugins - Spoon perspectives

 Plugins - Partitioner

Name

Short Description / Remarks

PDI Version / Download

Quotient of Division

Partitions on the quotient of the division by the number of partitions rather than on the remainder. See the Bucket Partitioner plugin.

4.x (src)
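
The difference between remainder-based and quotient-based partitioning can be illustrated with a minimal sketch. This is not the plugin's code: the function names are hypothetical, and how the plugin maps a quotient onto an actual partition number is not described on this page. The sketch only contrasts the two arithmetic operations.

```python
def remainder_partition(value: int, nr_partitions: int) -> int:
    """Remainder (modulo) rule: consecutive values rotate across partitions."""
    if nr_partitions <= 0:
        raise ValueError("nr_partitions must be positive")
    return value % nr_partitions


def quotient_bucket(value: int, nr_partitions: int) -> int:
    """Quotient rule: runs of consecutive values fall into the same bucket.

    Note: the result is not bounded by nr_partitions. Mapping it onto a
    real partition is left open here because the page does not specify it.
    """
    if nr_partitions <= 0:
        raise ValueError("nr_partitions must be positive")
    return value // nr_partitions


if __name__ == "__main__":
    # Print value, remainder-based result, and quotient-based result.
    for v in range(6):
        print(v, remainder_partition(v, 3), quotient_bucket(v, 3))
```

With the remainder rule, neighbouring key values are spread over different partitions; with the quotient rule, neighbouring key values stay together.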

Plugins - Database types

Name

Short Description / Remarks 

PDI Version / Download

NuoDB 

Adds support for the NuoDB database 

5.x (src)

Plugins - Job Entries

Unique ID

Name

Short Description / Remarks

PDI Version / Download

DummyJob

Dummy Job Entry

Dummy plugin test job entry - this can be a blueprint for other job entries

3.x

- General


 

 

JabberMessage

JabberJob

Sends Jabber messages (for example, to Google Talk).

3.x (bin) / (src)

MondrianOutput

Mondrian Output

Outputs the result of a Mondrian MDX query to an Excel file or chart. Jason Chu

3.x (info) / (bin) / (src)

- KFF

 

 

 

KjubeBatchId

KFF Batch ID manager

Automatically retrieves, sets and updates a batch ID. Part of the KFF project

3.2, 4.x

KjubeConfigurator

KFF Configurator

Automatically configures the KFF environment by using a few base parameters. Part of the KFF project

3.2, 4.x



 Plugins - Transformation Steps

Unique ID

Name

Short Description / Remarks

PDI Version / Download

DummyPlugin

Dummy Plugin

Dummy plugin test step - this can be a blueprint for other steps

3.x

- Input Steps

 

 

 

DateGenerator

Date Generator

Generates a sequence of consecutive dates.

3.0, 3.1

IMS Database Loader

Proventa IMS Database Loader

Reads (large) IBM IMS database dumps quickly and easily.
This is a commercial (closed source) plugin by Proventa AG.

3.x

ITN Connector ERP

ITN Connector ERP

ITN Connector ERP extracts data from SAP tables. You can extract everything, or only selected fields or selected rows. This is an open source plugin by it-novum.

3.x, 4.x (doc)

KafkaConsumer

Apache Kafka Consumer

Reads binary messages from an Apache Kafka message queue.

3.x/4.x/5.1

ProERPCONN

godesys SAP®-Connector

The godesys SAP®-Connector offers convenient, fast access to the entire SAP dataset.
This is a commercial (closed source) plugin by godesys AG (formerly Proratio).

4.x

SalesforceInputPlugin

SalesforceInputPlugin

Allows you to read data from Salesforce. (Translations: US and FR)
Updated: 10 Dec. 2008

3.1

ShapeFileReader

ESRI Shapefile Reader

Reads shape file data from an ESRI shape file and linked DBF file

3.x

SugarCRMModule

SugarCRM

The SugarCRM Plugin allows you to access all data available in SugarCRM modules.

3.x

SuperCsvInput

Super CSV Input

Read CSV input files.

3.1

TypeExitGoogleAnalyticsInputStep

Google Analytics

Reads data from Google Analytics using Google's Export API. Plugin created by type-exit.org.
Note: This step has become a standard step in 4.2 and Pentaho Support is available.

3.2, 4.0

TypeExitEdi2XmlStep

EDI to XML

Converts EDI text to generic XML. This will be included and supported in PDI 4.3 by PDI-7019.

4.x (doc)

com.legstar.pdi.zosfile

z/OS File Input

Reads IBM mainframe files with records described by COBOL copybooks. Project Page

4.x

- Input/Output Steps


 

 

PaloDimInput / PaloDimOutput / PaloCellInput / PaloCellOutput

PaloKettlePlugin

Use the Palo MOLAP database in ETL processes designed with Kettle.

3.x

Rss Input/Rss Output

Rss Input/Output

One plugin to read RSS feeds and another one to write feeds to file.
Note: This step has become a standard step in 3.2 and Pentaho Support is available.

3.x

- Output Steps


 

 

ArffOutput

ARFF Output

Saves data to a file in WEKA's ARFF (Attribute Relation File Format).
Note: Pentaho Support is available for this plug-in as part of a Pentaho Subscription.

3.x, 4.x

KafkaProducer

Apache Kafka Producer

Sends binary messages over an Apache Kafka message queue.

3.x/4.x/5.1

MQTTProducer

MQTT Producer

Sends binary messages over an MQTT message queue.

3.x/4.x/5.1

TypeExitExcelWriterStep

Excel Writer

Supports template based writing and modification of xls and xlsx files.
Note: This step has become a standard step in 4.2 and Pentaho Support is available.

3.2, 4.0

SuperCsvOutput

Super CSV Output

Output to a CSV file.

3.1

cmisput

Put document via CMIS

Outputs documents to CMIS compliant repositories like Alfresco.

4.x

- Transform Steps


 

 

ANTLR Recognizer

ANTLR Recognizer

Uses an ANTLR grammar file (*.g) to validate an input field and outputs a boolean field indicating whether the grammar recognizes the input.

4.x

Asciify

Asciify

Asciifies strings. It will replace characters that are outside of the 7-bit ASCII range (e.g. diacritics) with a close equivalent and it will replace or remove any other character it can't convert.

3.0, 3.1

DataGrid

Data Grid

This step allows you to enter rows of data directly into a step. Part of the KFF project. This step was contributed to Kettle and is now part of version 4.x.

3.2

DateTime

Date-time calculator

Assembles a date from 7 fields containing centuries, years, months, days, hours, minutes and seconds. This date format is typically used in AS/400 RPG programs. Part of the KFF project.

3.2, 4.x

Decoder

KFF Decoder

Decode AS/400 encoded data using mask information and table-name mappings. Part of the KFF project.

3.2, 4.x

EncryptDecryptPlugin

Encrypt/Decrypt

Provides the ability to encrypt and decrypt the values of any field of a table/fixed file/CSV file. Plug-In posted by Persistent Systems Ltd.

3.x

FieldCalculatorPlugin

Field Calculator

This plug-in performs arithmetic operations on the columns of a table. The derived values can then be stored in a table/fixed file/CSV file. Plug-In posted by Persistent Systems Ltd.

3.x

Formula

Formula

Calculates values and evaluates expressions.
This is based on the Open Office standard for expressions.
Note: This step has become a standard step in 3.2 and Pentaho Support is available.

3.x

Head

Head

Reads the first x rows of the stream.

3.1

ProtobufDecode

Protocol Buffers Decoder

Decodes binary messages encoded by Google Protocol Buffers

3.x/4.x

PRScript

Execute R Script

Executes an R script from a file, with the input and output variables you specify.

4.x (source)

Rejects

Rejects

The Rejects step handles rejected records in a standard fashion. Part of the KFF project.

3.2, 4.x

ReservoirSampling

Reservoir Sampling

Samples a fixed number of rows (with uniform probability) from the incoming stream.
Note: This step has become a standard step in 3.2 and Pentaho Support is available.

3.x

TypeExitRubyStep

Ruby Scripting

Allows you to use the Ruby language in Kettle transformations. Similar to the JavaScript step, but geared towards the Ruby language.

4.x

SeasonId

Season ID calculator

Assemble a unique (bi-yearly fashion industry) season ID using 3 fields containing centuries, years and a season number. Part of the KFF project.

3.2, 4.x

TableCompare

Table Compare

Compares 2 tables and gives back a detailed list of differences. Part of the KFF project.

3.2, 4.x

Tail

Tail

Reads the rows after the first x rows of the stream.

3.1

TrimStrings

Trim Strings

Trims string fields.  You can trim all fields, select fields or exclude fields.  Supports left, right and full trimming. Part of the KFF project.

3.2, 4.x

TrimCut

TrimCut (Experimental)

Trims and cuts string values to size.

3.0, 3.1

UnivariateStats

Univariate Stats

Computes simple univariate statistics.  Available statistics include: N, minimum, maximum, mean, sample standard deviation, median and arbitrary percentiles (computed using a simple mid-point method or interpolation).
Note: This step has become a standard step in 3.2 and Pentaho Support is available.

3.x

WekaScoring (Weka 3.6 or 3.7.0)

Weka Scoring

Appends predictions (labels or probability distributions) from a pre-built WEKA model (classifier, clusterer or PMML). Compatible with Weka 3.6.x and 3.7.0.
Note: Pentaho Support is available for this plug-in as part of a Pentaho Subscription.

3.x, 4.x

WekaScoring (Weka 3.7.2 - 3.7.4)

Weka Scoring

Appends predictions (labels or probability distributions) from a pre-built WEKA model (classifier, clusterer or PMML). Compatible with Weka 3.7.2 - 3.7.4.

Note: Pentaho Support is available for this plug-in as part of a Pentaho Subscription.

3.x, 4.x

WekaScoring (Weka >= 3.7.5)

Weka Scoring

Appends predictions (labels or probability distributions) from a pre-built WEKA model (classifier, clusterer or PMML). Compatible with Weka >= 3.7.5.

Note: Pentaho Support is available for this plug-in as part of a Pentaho Subscription.

4.x

- Join Steps

 

 

 

kettle-history-join-plugin

History Join

This plugin supplies a method to join two tables using their date-from and date-to history. It uses the two dates that delimit the life of each record and joins using a query (like the database join plugin) to resolve the record history of the two entities.

4.x

- Data Warehouse Steps

 

 

 

kettle-date-dimension-plugin

Date dimension

This plugin looks up an entry in the date dimension and inserts it if it doesn't exist. It calculates all calendar data; you must supply the details of the table used to store the information.

4.x

- Lookup Steps

 

 

 

MDCheckPlugin

Melissa Data Contact Verify

Cleanse and validate Name, Address, Phone, Email with Melissa Data on-premise or cloud

4.x/5.x/6.x

MDCheckPlugin

Melissa Data IP Locator

Get latitude, longitude, city, state, ISP and more from an IP address on-premise or cloud

4.x/5.x/6.x

MDCheckPlugin

Melissa Data MatchUp

Match and consolidate contact records using fuzzy algorithms and domain specific knowledge

4.x/5.x/6.x

MDCheckPlugin

Melissa Data SmartMover

Process names/addresses against the US NCOA database and Canadian COA database

4.x/5.x/6.x

MDGlobalVerifyPlugin

Melissa Data Global Verify

Cleanse and validate GLOBAL Name, Address, Phone, Email data (with mailbox validation)

4.x/5.x/6.x

MDPersonatorPlugin

Melissa Data Personator

Verify contact data, and append missing phones, companies, emails along with demographic information

4.x/5.x/6.x

MDPropertyPlugin

Melissa Data Property

Comprehensive US property and mortgage data, with 165 information fields available. Updated weekly.

4.x/5.x/6.x

MDBusinessCoderPlugin

Melissa Data Business Coder

Comprehensive US Business Data cleansing and Firmagraphic appends

4.x/5.x/6.x

MDProfilerPlugin

Melissa Data Profiler

Advanced data profiling tool that provides statistical analysis and assessment of your data quality needs for consistency, uniqueness and correctness

4.x/5.x/6.x

MDGeneralCleansingPlugin

Melissa Data General Cleansing

General Data cleanser that can cleanse based on rules, regexes and pre-defined knowledge bases.

4.x/5.x/6.x

MaxMindGeoIPLookup

MaxMind GeoIP Lookup

Lookup country, city, lat, long and much more from an IP address using the MaxMind Geo IP databases

4.x

Advanced HTTP Plugin

Advanced HTTP

Advanced HTTP plugin with many new features: GET and POST, fail on error, SSL, basic auth, timeout ... Code is here.

3.x

- Bulk Loading Steps

 

 

 

TeraFastPlugin

Teradata bulk loader

Maximum performance for large data loads into Teradata, by Aschauer EDV. This step is now included in Kettle 4.x.

3.x


Here are the attachments to the plugin development page, including the sample dummy step and job entry plugins for versions 2.5.x and 3.0.x.

3 Comments

  1. I've created a new PDI transformation step plugin which allows a user to query an RDF big data repository with SPARQL. 

    The LGPL project is available at https://code.google.com/p/kettle-openrdf-plugin/
    I would like to get it listed on the wiki but have had no response emailing communityconnection@pentaho.org. What is the procedure to list this plugin?

  2. Greetings ...

    I'm Paulo Cordeiro de Almeida Filho, an undergraduate student in Information Systems at the Federal Rural University of Pernambuco, Academic Unit of Serra Talhada, and I have developed a plugin for PDI. I would like to know how I can make this plugin available to interested users, since I got no response via email.

    The plugin is available at <http://www.4shared.com/rar/NwUG8MBIce/DimensionDateGenerator.html>

  3. Hi Paulo,

    Thanks for creating a plugin!  This page provides information about submitting a plugin to the marketplace: http://wiki.pentaho.com/display/EAI/Marketplace.   

    Also note that this page is not monitored regularly.  So, if you have other questions about plugins, the marketplace, or anything else, visit the forums: http://forums.pentaho.com/forum.php.  

    C.