Hitachi Vantara Pentaho Community Wiki

Introduction

This page contains the index for the documentation on all the standard steps in Pentaho Data Integration.
We invite everyone to add more details, tips and samples to the step pages.

Name

Category

ID

Description

Metadata Java class
opdts = org.pentaho.di.trans.steps

Abort

Flow

Abort

Abort a transformation

opdts.abort.AbortMeta

Add a checksum

Transform

CheckSum

Add a checksum column for each input row

opdts.checksum.CheckSumMeta

Add constants

Transform

Constant

Add one or more constants to the input rows

opdts.constant.ConstantMeta

Add sequence

Transform

Sequence

Get the next value from a sequence

opdts.addsequence.AddSequenceMeta

Add value fields changing sequence

Transform

FieldsChangeSequence

Add a sequence that depends on field value changes. Each time the value of at least one field changes, PDI resets the sequence.

opdts.fieldschangesequence.FieldsChangeSequenceMeta

Add XML

Transform

AddXML

Encode several fields into an XML fragment

opdts.addxml.AddXMLMeta

Aggregate Rows

Deprecated

 

 

 

Analytic Query

Statistics

AnalyticQuery

Execute analytic queries over a sorted dataset (LEAD/LAG/FIRST/LAST)

opdts.analyticquery.AnalyticQueryMeta

Append streams

Flow

Append

Append 2 streams in an ordered way

opdts.append.AppendMeta

Arff Output

Data Mining

Arff Output

Writes data in ARFF format to a file

opdts.append.arff.ArffOutputMeta

Automatic Documentation Output

Output

AutoDoc

This step automatically generates documentation based on input in the form of a list of transformations and jobs

opdts.autodoc.AutoDocMeta

Avro input

Input

AvroInput

Decode binary or JSON Avro data from a file or a field

opdts.avroinput.AvroInputMeta

Avro Input (New)

Input

AvroInputNew

Decode binary or JSON Avro data from a file or a field

opdts.avroinput.AvroInputMeta

Avro Output

Output

AvroOutput

Encode binary or JSON Avro data to a file

opdts.avrooutput.AvroOutputMeta

 


Block this step until steps finish

Flow

BlockUntilStepsFinish

Block this step until selected steps finish.

opdts.blockuntilstepsfinish.BlockUntilStepsFinishMeta

Blocking Step

Flow

BlockingStep

This step blocks until all incoming rows have been processed. Subsequent steps only receive the last input row to this step.

opdts.blockingstep.BlockingStepMeta

Calculator

Transform

Calculator

Create new fields by performing simple calculations

opdts.calculator.CalculatorMeta

Call DB Procedure

Lookup

DBProc

Get back information by calling a database procedure.

opdts.dbproc.DBProcMeta

Call Endpoint

BA Server

CallEndpointStep

Calls API endpoints from the BA server within a PDI transformation.

org.pentaho.di.baserver.utils.CallEndpointMeta

Change file encoding

Utility

ChangeFileEncoding

Change file encoding and create a new file

opdts.changefileencoding.ChangeFileEncodingMeta

Cassandra input

Big Data

CassandraInput

Read from a Cassandra column family

opdts.cassandrainput.CassandraInputMeta

Cassandra output

Big Data

CassandraOutput

Write to a Cassandra column family

opdts.cassandraoutput.CassandraOutputMeta

Check if a column exists

Lookup

ColumnExists

Check if a column exists in a table on a specified connection.

opdts.columnexists.ColumnExistsMeta

Check if file is locked

Lookup

FileLocked

Check if a file is locked by another process

opdts.filelocked.FileLockedMeta

Check if webservice is available

Lookup

WebServiceAvailable

Check if a webservice is available

opdts.webserviceavailable.WebServiceAvailableMeta

Clone row

Utility

CloneRow

Clone a row as many times as needed

opdts.clonerow.CloneRowMeta

Closure Generator

Transform

ClosureGenerator

This step allows you to generate a closure table using parent-child relationships.

opdts.closure.ClosureGeneratorMeta

Combination lookup/update

Data Warehouse

CombinationLookup

Update a junk dimension in a data warehouse. Alternatively, look up information in this dimension. The primary key of a junk dimension consists of all its fields.

opdts.combinationlookup.CombinationLookupMeta

Concat Fields

Transform

ConcatFields

The Concat Fields step is used to concatenate multiple fields into one target field. The fields can be separated by a separator and the enclosure logic is completely compatible with the Text File Output step.

opdts.concatfields.ConcatFieldsMeta

Copy rows to result

Job

RowsToResult

Use this step to write rows to the executing job. The information will then be passed to the next entry in this job.

opdts.rowstoresult.RowsToResultMeta

CouchDB Input

Big Data

CouchDbInput

Retrieves all documents from a given view in a given design document from a given database

opdts.couchdbinput.CouchDbInputMeta

Credit card validator

Validation

CreditCardValidator

The Credit card validator step tells you (1) whether a credit card number is valid (using the LUHN10 (MOD-10) algorithm) and (2) which credit card vendor handles that number (VISA, MasterCard, Diners Club, EnRoute, American Express (AMEX), ...)

opdts.creditcardvalidator.CreditCardValidatorMeta

CSV file input

Input

CsvInput

Simple CSV file input

opdts.csvinput.CsvInputMeta

Data Grid

Input

DataGrid

Enter rows of static data in a grid, usually for testing, reference or demo purposes

opdts.datagrid.DataGridMeta

Data Validator

Validation

Validator

Validates passing data based on a set of rules

opdts.validator.ValidatorMeta

Database join

Lookup

DBJoin

Execute a database query using stream values as parameters

opdts.databasejoin.DatabaseJoinMeta

Database lookup

Lookup

DBLookup

Look up values in a database using field values

opdts.databaselookup.DatabaseLookupMeta

De-serialize from file

Input

CubeInput

Read rows of data from a data cube.

opdts.cubeinput.CubeInputMeta

Delay row

Utility

Delay

Output each input row after a delay

opdts.delay.DelayMeta

Delete

Output

Delete

Delete data in a database table based upon keys

opdts.delete.DeleteMeta

Detect empty stream

Flow

DetectEmptyStream

This step will output one empty row if the input stream is empty (i.e. when the input stream does not contain any rows)

opdts.detectemptystream.DetectEmptyStreamMeta

Dimension lookup/update

Data Warehouse

DimensionLookup

Update a slowly changing dimension in a data warehouse. Alternatively, look up information in this dimension.

opdts.dimensionlookup.DimensionLookupMeta

Dummy (do nothing)

Flow

Dummy

This step type doesn't do anything. It's useful however when testing things or in certain situations where you want to split streams.

opdts.dummytrans.DummyTransMeta

Dynamic SQL row

Lookup

DynamicSQLRow

Execute a dynamic SQL statement built in a previous field

opdts.dynamicsqlrow.DynamicSQLRowMeta

Edi to XML

Utility

TypeExitEdi2XmlStep

Converts an Edifact message to XML to simplify data extraction (Available in PDI 4.4, already present in CI trunk builds)

opdts.edi2xml.Edi2XmlMeta

ElasticSearch Bulk Insert

Bulk loading

ElasticSearchBulk

Performs bulk inserts into ElasticSearch

opdts.elasticsearchbulk.ElasticSearchBulkMeta

Email messages input

Input

MailInput

Read POP3/IMAP server and retrieve messages

opdts.mailinput.MailInputMeta

ESRI Shapefile Reader

Input

ShapeFileReader

Reads shape file data from an ESRI shape file and linked DBF file

org.pentaho.di.shapefilereader.ShapeFileReaderMeta

ETL Metadata Injection

Flow

MetaInject

This step allows you to inject metadata into an existing transformation prior to execution. This allows for the creation of dynamic and highly flexible data integration solutions.

opdts.metainject.MetaInjectMeta

Example plugin

Deprecated

 

 

 

Execute a process

Utility

ExecProcess

Execute a process and return the result

opdts.execprocess.ExecProcessMeta

Execute row SQL script

Scripting

ExecSQLRow

Execute SQL script extracted from a field created in a previous step.

opdts.execsqlrow.ExecSQLRowMeta

Execute SQL script

Scripting

ExecSQL

Execute an SQL script, optionally parameterized using input rows

opdts.sql.ExecSQLMeta

File exists

Lookup

FileExists

Check if a file exists

opdts.fileexists.FileExistsMeta

Filter Rows

Flow

FilterRows

Filter rows using simple equations

opdts.filterrows.FilterRowsMeta

Fixed file input

Input

FixedInput

Fixed file input

opdts.fixedinput.FixedInputMeta

Formula

Scripting

Formula

Calculate a formula using Pentaho's libformula

opdts.formula.FormulaMeta

Fuzzy match

Lookup

FuzzyMatch

Find approximate matches to a string using matching algorithms. Reads a field from the main stream and outputs an approximate value from the lookup stream.

opdts.fuzzymatch.FuzzyMatchMeta

Generate random credit card numbers

Input

RandomCCNumberGenerator

Generate random valid (Luhn check) credit card numbers

opdts.randomccnumber.RandomCCNumberGeneratorMeta

Generate random value

Input

RandomValue

Generate random value

opdts.randomvalue.RandomValueMeta

Generate Rows

Input

RowGenerator

Generate a number of empty or equal rows.

opdts.rowgenerator.RowGeneratorMeta

Get data from XML

Input

getXMLData

Get data from XML file by using XPath. This step also allows you to parse XML defined in a previous field.

opdts.getxmldata.GetXMLDataMeta

Get File Names

Input

GetFileNames

Get file names from the operating system and send them to the next step.

opdts.getfilenames.GetFileNamesMeta

Get files from result

Job

FilesFromResult

This step allows you to read filenames used or generated in a previous entry in a job.

opdts.filesfromresult.FilesFromResultMeta

Get Files Rows Count

Input

GetFilesRowsCount

Get Files Rows Count

opdts.getfilesrowscount.GetFilesRowsCountMeta

Get ID from slave server

Transform

GetSlaveSequence

Retrieves unique IDs in blocks from a slave server. The referenced sequence needs to be configured on the slave server in the XML configuration file.

opdts.getslavesequence.GetSlaveSequenceMeta

Get previous row fields

Deprecated

 

 

 

Get repository names

Input

GetRepositoryNames

Lists detailed information about transformations and/or jobs in a repository

opdts.getrepositorynames.GetRepositoryNamesMeta

Get rows from result

Job

RowsFromResult

This allows you to read rows from a previous entry in a job

opdts.rowsfromresult.RowsFromResultMeta

Get Session Variables

BA Server

GetSessionVariableStep

Retrieves the value of a session variable

org.pentaho.di.baserver.utils.GetSessionVariableMeta

Get SubFolder names

Input

GetSubFolders

Read a parent folder and return all subfolders

opdts.getsubfolders.GetSubFoldersMeta

Get System Info

Input

SystemInfo

Get information from the system like system date, arguments, etc.

opdts.systemdata.SystemDataMeta

Get table names

Input

GetTableNames

Get table names from database connection and send them to the next step

opdts.gettablenames.GetTableNamesMeta

Get Variables

Job

GetVariable

Determine the values of certain (environment or Kettle) variables and put them in field values.

opdts.getvariable.GetVariableMeta

Google Analytics

Input

TypeExitGoogleAnalyticsInputStep

Fetches data from a Google Analytics account

opdts.googleanalytics.GaInputStepMeta

Google Docs Input

Input

 

 

 

Greenplum Bulk Loader

Bulk loading

GPBulkLoader

Greenplum Bulk Loader

opdts.gpbulkloader.GPBulkLoaderMeta

Greenplum Load

Bulk loading

GPLoad

Greenplum Load

 

Group by

Statistics

GroupBy

Builds aggregates in a group by fashion. This works only on sorted input. If the input is not sorted, only consecutive rows with the same group values are handled correctly.

opdts.groupby.GroupByMeta

GZIP CSV Input

Input

ParallelGzipCsvInput

Parallel GZIP CSV file input reader

opdts.parallelgzipcsv.ParGzipCsvInputMeta

Hadoop File Input

Big Data

HadoopFileInputPlugin

Read data from a variety of different text-file types stored on a Hadoop cluster

opdts.hadoopfileinput.HadoopFileInputMeta

Hadoop File Output

Big Data

HadoopFileOutputPlugin

Write data to a variety of different text-file types stored on a Hadoop cluster

opdts.hadoopfileoutput.HadoopFileOutputMeta

HBase input

Big Data

HbaseInput

Read from an HBase column family

opdts.hbaseinput.HBaseInputMeta

HBase output

Big Data

HbaseOutput

Write to an HBase column family

opdts.hbaseoutput.HBaseOutputMeta

HBase Row Decoder

Big Data

HBaseRowDecoder

Decodes an incoming key and HBase result object according to a mapping

opdts.hbaserowdecoder.HBaseRowDecoderMeta

HL7 Input

Input

HL7Input

Read data from HL7 data streams.

opdt.hl7.plugins.hl7input

HTTP client

Lookup

HTTP

Call a web service over HTTP by supplying a base URL and allowing parameters to be set dynamically

opdts.http.HTTPMeta

HTTP Post

Lookup

HTTPPOST

Send a web service request over HTTP by supplying a base URL and allowing parameters to be set dynamically

opdts.httppost.HTTPPOSTMeta

IBM Websphere MQ Consumer

Input

MQInput

Receive messages from any IBM Websphere MQ Server

 

IBM Websphere MQ Producer

Output

MQOutput

Send messages to any IBM Websphere MQ Server

 

Identify last row in a stream

Flow

DetectLastRow

Last row will be marked

opdts.detectlastrow.DetectLastRowMeta

If field value is null

Utility

IfNull

Sets a field value to a constant if it is null.

opdts.ifnull.IfNullMeta

Infobright Loader

Bulk loading

InfobrightOutput

Load data to an Infobright database table

opdts.infobrightoutput.InfobrightLoaderMeta

Ingres VectorWise Bulk Loader

Bulk loading

VectorWiseBulkLoader

This step interfaces with the Ingres VectorWise Bulk Loader "COPY TABLE" command.

opdts.ivwloader.IngresVectorwiseLoaderMeta

Injector

Inline

Injector

Injector step that allows rows to be injected into the transformation through the Java API

opdts.injector.InjectorMeta

Insert / Update

Output

InsertUpdate

Update or insert rows in a database based upon keys.

opdts.insertupdate.InsertUpdateMeta

Java Filter

Flow

JavaFilter

Filter rows using Java code

opdts.javafilter.JavaFilterMeta

JMS Consumer

Input

JmsInput

Receive messages from a JMS server

 

JMS Producer

Output

JmsOutput

Send messages to a JMS server

 

Job Executor

Flow

JobExecutor

This step executes a Pentaho Data Integration Job, passes parameters and rows.

opdts.jobexecutor.JobExecutorMeta

Join Rows (cartesian product)

Joins

JoinRows

The output of this step is the cartesian product of the input streams. The number of output rows is the product of the numbers of rows in the input streams.

opdts.joinrows.JoinRowsMeta

Json Input

Input

JsonInput

Extract relevant portions out of JSON structures (file or incoming field) and output rows

opdts.jsoninput.JsonInputMeta

JSON output

Output

JsonOutput

Create a JSON block and output it in a field or a file.

opdts.jsonoutput.JsonOutputMeta

Knowledge Flow

Data Mining

KF

Executes a Knowledge Flow data mining process

org.pentaho.di.kf.KFMeta

LDAP Input

Input

LDAPInput

Read data from LDAP host

opdts.ldapinput.LDAPInputMeta

LDAP Output

Output

LDAPOutput

Perform insert, upsert, update, add or delete operations on records based on their DN (Distinguished Name).

opdts.ldapoutput.LDAPOutputMeta

LDIF Input

Input

LDIFInput

Read data from LDIF files

opdts.ldifinput.LDIFInputMeta

Load file content in memory

Input

LoadFileInput

Load file content in memory

opdts.loadfileinput.LoadFileInputMeta

LucidDB Bulk Loader

Deprecated

 

 

 

LucidDB Streaming Loader

Bulk loading

LucidDBStreamingLoader

Load data into LucidDB by using Remote Rows UDX.

opdts.luciddbstreamingloader.LucidDBStreamingLoaderMeta

Mail

Utility

Mail

Send eMail.

opdts.mail.MailMeta

Mail Validator

Validation

MailValidator

Check if an email address is valid.

opdts.mailvalidator.MailValidatorMeta

Mapping (sub-transformation)

Mapping

Mapping

Run a mapping (sub-transformation), use MappingInput and MappingOutput to specify the fields interface

opdts.mapping.MappingMeta

Mapping input specification

Mapping

MappingInput

Specify the input interface of a mapping

opdts.mappinginput.MappingInputMeta

Mapping output specification

Mapping

MappingOutput

Specify the output interface of a mapping

opdts.mappingoutput.MappingOutputMeta

MapReduce Input

Big Data

HadoopEnterPlugin

Key Value pairs enter here from Hadoop MapReduce

opdts.hadoopenter.HadoopEnterMeta

MapReduce Output

Big Data

HadoopExitPlugin

Key Value pairs exit here and are pushed into Hadoop MapReduce

opdts.hadoopexit.HadoopExitMeta

MaxMind GeoIP Lookup

Lookup

MaxMindGeoIPLookup

Look up an IPv4 address in a MaxMind database and add fields such as geography, ISP, or organization.

com.maxmind.geoip.MaxMindGeoIPLookupMeta

Memory Group by

Statistics

MemoryGroupBy

Builds aggregates in a group by fashion. This step doesn't require sorted input.

opdts.memgroupby.MemoryGroupByMeta

Merge Join

Joins

MergeJoin

Joins two streams on a given key and outputs a joined set. The input streams must be sorted on the join key

opdts.mergejoin.MergeJoinMeta

Merge Rows (diff)

Joins

MergeRows

Merge two streams of rows, sorted on a certain key. The two streams are compared and the identical, changed, deleted and new rows are flagged.

opdts.mergerows.MergeRowsMeta

Metadata structure of stream

Utility

StepMetastructure

This is a step to read the metadata of the incoming stream.

opdts.stepmeta.StepMetastructureMeta

Microsoft Access Input

Input

AccessInput

Read data from a Microsoft Access file

opdts.accessinput.AccessInputMeta

Microsoft Access Output

Output

AccessOutput

Stores records into an MS-Access database table.

opdts.accessoutput.AccessOutputMeta

Microsoft Excel Input

Input

ExcelInput

Read data from Excel and OpenOffice Workbooks (XLS, XLSX, ODS).

opdts.excelinput.ExcelInputMeta

Microsoft Excel Output

Output

ExcelOutput

Stores records into an Excel (XLS) document with formatting information.

opdts.exceloutput.ExcelOutputMeta

Microsoft Excel Writer

Output

TypeExitExcelWriterStep

Writes or appends data to an Excel file

opdts.excelwriter.ExcelWriterStepMeta

Modified Java Script Value

Scripting

ScriptValueMod

This step allows the execution of JavaScript programs (and much more)

opdts.scriptvalues_mod.ScriptValuesMetaMod

Mondrian Input

Input

MondrianInput

Execute and retrieve data using an MDX query against a Pentaho Analysis OLAP server (Mondrian)

opdts.mondrianinput.MondrianInputMeta

MonetDB Agile Mart

Agile

 

 

 

MonetDB Bulk Loader

Bulk loading

MonetDBBulkLoader

Load data into MonetDB by using their bulk load command in streaming mode.

opdts.monetdbbulkloader.MonetDBBulkLoaderMeta

MongoDB Input

Big Data

MongoDbInput

Reads all entries from a MongoDB collection in the specified database.

opdts.mongodbinput.MongoDbInputMeta

MongoDB Output

Big Data

MongoDbOutput

Write to a MongoDB collection.

opdts.mongodboutput.MongoDbOutputMeta

Multiway Merge Join

Joins

MultiwayMergeJoin

Multiway Merge Join

opdts.multimerge.MultiMergeJoinMeta

MySQL Bulk Loader

Bulk loading

MySQLBulkLoader

MySQL bulk loader step, loading data over a named pipe (not available on MS Windows)

opdts.mysqlbulkloader.MySQLBulkLoaderMeta

Null if...

Utility

NullIf

Sets a field value to null if it is equal to a constant value

opdts.nullif.NullIfMeta

Number range

Transform

NumberRange

Create ranges based on numeric field

opdts.numberrange.NumberRangeMeta

OLAP Input

Input

OlapInput

Execute and retrieve data using an MDX query against any XML/A OLAP datasource using olap4j

opdts.olapinput.OlapInputMeta

OpenERP Object Delete

Delete

OpenERPObjectDelete

Deletes data from the OpenERP server using the XMLRPC interface with the 'unlink' function.

opdts.openerp.objectdelete.OpenERPObjectDeleteMeta

OpenERP Object Input

Input

OpenERPObjectInput

Retrieves data from the OpenERP server using the XMLRPC interface with the 'read' function.

opdts.openerp.objectinput.OpenERPObjectInputMeta

OpenERP Object Output

Output

OpenERPObjectOutputImport

Updates data on the OpenERP server using the XMLRPC interface and the 'import' function

opdts.openerp.objectoutput.OpenERPObjectOutputMeta

Oracle Bulk Loader

Bulk loading

OraBulkLoader

Use Oracle Bulk Loader to load data

opdts.orabulkloader.OraBulkLoaderMeta

Output steps metrics

Statistics

StepsMetrics

Return metrics for one or several steps

opdts.stepsmetrics.StepsMetricsMeta

Palo Cell Input

Input

PaloCellInput

Retrieves all cell data from a Palo cube

opdts.palo.cellinput

Palo Cell Output

Output

PaloCellOutput

Updates cell data in a Palo cube

opdts.palo.celloutput

Palo Dimension Input

Input

PaloDimInput

Returns elements from a dimension in a Palo database

opdts.palo.diminput

Palo Dimension Output

Output

PaloDimOutput

Creates/updates dimension elements and element consolidations in a Palo database

opdts.palo.dimoutput

Pentaho Reporting Output

Output

PentahoReportingOutput

Executes an existing report (PRPT)

opdts.pentahoreporting.PentahoReportingOutputMeta

PostgreSQL Bulk Loader

Bulk loading

PGBulkLoader

PostgreSQL Bulk Loader

opdts.pgbulkloader.PGBulkLoaderMeta

Prioritize streams

Flow

PrioritizeStreams

Prioritize streams in an ordered way.

opdts.prioritizestreams.PrioritizeStreamsMeta

Process files

Utility

ProcessFiles

Process one file per row (copy or move or delete). This step only accepts filenames as input.

opdts.processfiles.ProcessFilesMeta

Properties Output

Output

PropertyOutput

Write data to properties file

opdts.propertyoutput.PropertyOutputMeta

Property Input

Input

PropertyInput

Read data (key, value) from properties files.

opdts.propertyinput.PropertyInputMeta

R script executor

Statistics

RScriptExecutor

Executes an R script within a PDI transformation

 

Regex Evaluation

Scripting

RegexEval

Regular expression Evaluation. This step uses a regular expression to evaluate a field. It can also extract new fields out of an existing field with capturing groups.

opdts.regexeval.RegexEvalMeta

Replace in string

Transform

ReplaceString

Replace all occurrences of a word in a string with another word.

opdts.replacestring.ReplaceStringMeta

Reservoir Sampling

Statistics

ReservoirSampling

Samples a fixed number of rows from the incoming stream

opdts.reservoirsampling.ReservoirSamplingMeta

REST Client

Lookup

Rest

Consume RESTful services. Representational State Transfer (REST) is a key design idiom that embraces a stateless client-server architecture in which the web services are viewed as resources and can be identified by their URLs.

opdts.rest.RestMeta

Row denormaliser

Transform

Denormaliser

Denormalises rows by looking up key-value pairs and by assigning them to new fields in the output rows. This method aggregates and needs the input rows to be sorted on the grouping fields

opdts.denormaliser.DenormaliserMeta

Row flattener

Transform

Flattener

Flattens consecutive rows based on the order in which they appear in the input stream

opdts.flattener.FlattenerMeta

Row Normaliser

Transform

Normaliser

De-normalised information can be normalised using this step type.

opdts.normaliser.NormaliserMeta

RSS Input

Input

RssInput

Read RSS feeds

opdts.rssinput.RssInputMeta

RSS Output

Output

RssOutput

Write RSS stream.

opdts.rssoutput.RssOutputMeta

Rule Executor

Scripting

RuleExecutor

Execute a rule against each row (using Drools)

opdts.rules.RulesExecutorMeta

Rule Accumulator

Scripting

RuleAccumulator

Execute a rule against a set of rows (using Drools)

opdts.rules.RulesAccumulatorMeta

Run SSH commands

Utility

SSH

Run SSH commands and return the result.

opdts.ssh.SSHMeta

S3 CSV Input

Input

S3CSVINPUT

S3 CSV Input

opdts.s3csvinput.S3CsvInputMeta

S3 File Output

Output

S3FileOutputPlugin

Exports data to a text file on Amazon Simple Storage Service (S3)

com.pentaho.amazon.s3.S3FileOutputMeta

SAP HANA Bulk Loader

Bulk loading

HanaBulkLoader

Bulk load data into SAP HANA

opdts.hanabulkloader.HanaBulkLoaderMeta

Salesforce Delete

Output

SalesforceDelete

Delete records in Salesforce module.

opdts.salesforcedelete.SalesforceDeleteMeta

Salesforce Input

Input

SalesforceInput

Reads information from Salesforce

opdts.salesforceinput.SalesforceInputMeta

Salesforce Insert

Output

SalesforceInsert

Insert records in Salesforce module.

opdts.salesforceinsert.SalesforceInsertMeta

Salesforce Update

Output

SalesforceUpdate

Update records in Salesforce module.

opdts.salesforceupdate.SalesforceUpdateMeta

Salesforce Upsert

Output

SalesforceUpsert

Insert or update records in Salesforce module.

opdts.salesforceupsert.SalesforceUpsertMeta

Sample rows

Statistics

SampleRows

Filter rows based on the line number.

opdts.samplerows.SampleRowsMeta

SAP Input

Input

SapInput

Read data from SAP ERP, optionally with parameters

opdts.sapinput.SapInputMeta

SAS Input

Input

SASInput

This step reads files in sas7bdat (SAS) native format

opdts.sasinput.SasInputMeta

Script

Experimental

 

 

 

Secret key generator

Cryptography

SecretKeyGenerator

Generate secret keys for algorithms such as DES, AES, TripleDES.

opdts.symmetriccrypto.secretkeygenerator.SecretKeyGeneratorMeta

Select values

Transform

SelectValues

Select or remove fields in a row. Optionally, set the field meta-data: type, length and precision.

opdts.selectvalues.SelectValuesMeta

Send message to Syslog

Utility

SyslogMessage

Send message to Syslog server

opdts.syslog.SyslogMessageMeta

Serialize to file

Output

CubeOutput

Write rows of data to a data cube

opdts.cubeoutput.CubeOutputMeta

Set field value

Transform

SetValueField

Replace the value of a field with the value of another field

opdts.setvaluefield.SetValueFieldMeta

Set field value to a constant

Transform

SetValueConstant

Replace the value of a field with a constant

opdts.setvalueconstant.SetValueConstantMeta

Set files in result

Job

FilesToResult

This step allows you to set filenames in the result of this transformation. Subsequent job entries can then use this information.

opdts.filestoresult.FilesToResultMeta

Set Session Variables

BA Server

SetSessionVariableStep

Allows you to set the value of a session variable

org.pentaho.di.baserver.utils.SetSessionVariableMeta

Set Variables

Job

SetVariable

Set environment variables based on a single input row.

opdts.setvariable.SetVariableMeta

SFTP Put

Experimental

 

 

 

Simple Mapping

Mapping

SimpleMapping

Turn a repetitive, re-usable part of a transformation (a sequence of steps) into a mapping (sub-transformation).

opdts.simplemapping.SimpleMappingMeta

Single Threader

Flow

SingleThreader

Executes a transformation snippet in a single thread. You need a standard mapping or a transformation with an Injector step where data from the parent transformation will arrive in blocks.

opdts.singlethreader.SingleThreaderMeta

Socket reader

Inline

SocketReader

Socket reader. A socket client that connects to a server (Socket Writer step).

opdts.socketreader.SocketReaderMeta

Socket writer

Inline

SocketWriter

Socket writer. A socket server that can send rows of data to a socket reader.

opdts.socketwriter.SocketWriterMeta

Sort rows

Transform

SortRows

Sort rows based upon field values (ascending or descending)

opdts.sort.SortRowsMeta

Sorted Merge

Joins

SortedMerge

Sorted Merge

opdts.sortedmerge.SortedMergeMeta

Split field to rows

Transform

SplitFieldToRows3

Splits a single string field by delimiter and creates a new row for each split term

opdts.splitfieldtorows.SplitFieldToRowsMeta

Split Fields

Transform

FieldSplitter

When you want to split a single field into more than one, use this step type.

opdts.fieldsplitter.FieldSplitterMeta

Splunk Input

Transform

SplunkInput

Reads data from Splunk.

opdts.splunk.SplunkInputMeta

Splunk Output

Transform

SplunkOutput

Writes data to Splunk.

opdts.splunk.SplunkOutputMeta

SQL File Output

Output

SQLFileOutput

Output SQL INSERT statements to file

opdts.sqlfileoutput.SQLFileOutputMeta

Stream lookup

Lookup

StreamLookup

Look up values coming from another stream in the transformation.

opdts.streamlookup.StreamLookupMeta

SSTable Output

Big Data

SSTableOutput

Writes to a filesystem directory as a Cassandra SSTable

opdts.cassandrasstableoutput.SSTableOutputMeta

Streaming XML Input

Deprecated

 

 

 

String operations

Transform

StringOperations

Apply certain operations like trimming, padding and others to string values.

opdts.stringoperations.StringOperationsMeta

Strings cut

Transform

StringCut

Strings cut (substring).

opdts.stringcut.StringCutMeta

Switch / Case

Flow

SwitchCase

Switch a row to a certain target step based on the case value in a field.

opdts.switchcase.SwitchCaseMeta

Symmetric Cryptography

Cryptography

SymmetricCryptoTrans

Encrypt or decrypt a string using symmetric encryption. Available algorithms are DES, AES, TripleDES.

opdts.symmetriccrypto.symmetriccryptotrans.SymmetricCryptoTransMeta

Synchronize after merge

Output

SynchronizeAfterMerge

This step performs insert/update/delete in one go based on the value of a field.

opdts.synchronizeaftermerge.SynchronizeAfterMergeMeta

Table Agile Mart

Agile

 

 

 

Table Compare

Utility

TableCompare

This step compares the data from two tables (provided they have the same layout). It finds differences between the data in the two tables and logs them.

opdts.tablecompare.TableCompareMeta

Table exists

Lookup

TableExists

Check if a table exists on a specified connection

opdts.tableexists.TableExistsMeta

Table input

Input

TableInput

Read information from a database table.

opdts.tableinput.TableInputMeta

Table output

Output

TableOutput

Write information to a database table

opdts.tableoutput.TableOutputMeta

Teradata Fastload Bulk Loader

Bulk loading

TeraFast

The Teradata Fastload Bulk loader

opdts.terafast.TeraFastMeta

Teradata TPT Insert Upsert Bulk Loader

Bulk loading

TeraDataBulkLoader

Bulk loading via TPT using the tbuild command.

 

Text file input

Input

TextFileInput

Read data from a text file in several formats. This data can then be passed on to the next step(s)...

opdts.textfileinput.TextFileInputMeta

Text file output

Output

TextFileOutput

Write rows to a text file.

opdts.textfileoutput.TextFileOutputMeta

Transformation Executor

Flow

 

This step executes a Pentaho Data Integration transformation, sets parameters, and passes rows.

 

Unique rows

Transform

Unique

Remove duplicate rows and leave only unique occurrences. This works only on sorted input. If the input is not sorted, only consecutive duplicate rows are handled correctly.

opdts.uniquerows.UniqueRowsMeta

Unique rows (HashSet)

Transform

UniqueRowsByHashSet

Remove duplicate rows and leave only unique occurrences by using a HashSet.

opdts.uniquerowsbyhashset.UniqueRowsByHashSetMeta

Univariate Statistics

Statistics

UnivariateStats

This step computes some simple stats based on a single input field

opdts.univariatestats.UnivariateStatsMeta

Update

Output

Update

Update data in a database table based upon keys

opdts.update.UpdateMeta

User Defined Java Class

Scripting

UserDefinedJavaClass

This step allows you to program a step using Java code

opdts.userdefinedjavaclass.UserDefinedJavaClassMeta

User Defined Java Expression

Scripting

Janino

Calculate the result of a Java Expression using Janino

opdts.janino.JaninoMeta

Value Mapper

Transform

ValueMapper

Maps values of a certain field from one value to another

opdts.valuemapper.ValueMapperMeta

Vertica Bulk Loader

Bulk loading

VerticaBulkLoader

Bulk loads data into a Vertica table using their high performance COPY feature

opdts.verticabulkload.VerticaBulkLoaderMeta

Web services lookup

Lookup

WebServiceLookup

Look up information using web services (WSDL)

opdts.webservices.WebServiceMeta

Write to log

Utility

WriteToLog

Write data to log

opdts.writetolog.WriteToLogMeta

XBase input

Input

XBaseInput

Reads records from an XBase type of database file (DBF)

opdts.xbaseinput.XBaseInputMeta

XML Input Stream (StAX)

Input

XMLInputStream

This step is capable of processing very large and complex XML files very fast.

opdts.xmlinputstream.XMLInputStreamMeta

XML Input

Deprecated

 

 

 

XML Join

Joins

XMLJoin

Joins a stream of XML-Tags into a target XML string

opdts.xmljoin.XMLJoinMeta

XML Output

Output

XMLOutput

Write data to an XML file

opdts.xmloutput.XMLOutputMeta

XSD Validator

Validation

XSDValidator

Validate XML source (files or streams) against XML Schema Definition.

opdts.xsdvalidator.XsdValidatorMeta

XSL Transformation

Transform

XSLT

Transform XML stream using XSL (eXtensible Stylesheet Language).

opdts.xslt.XsltMeta

Yaml Input

Input

YamlInput

Read a YAML source (file or stream), parse it, convert it to rows and write these to one or more outputs.

opdts.yamlinput.YamlInputMeta

Zip File

Utility

ZipFile

Creates a standard ZIP archive from the data stream fields

opdts.zipfile.ZipFileMeta
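
The "Metadata Java class" column abbreviates the package prefix org.pentaho.di.trans.steps as opdts. As a minimal sketch of how an entry from that column can be expanded to a fully qualified class name, the helper below does the substitution. The class name StepClassNames and the method expand are hypothetical names chosen for illustration and are not part of PDI; entries that already carry a full package name (for example org.pentaho.di.kf.KFMeta) are assumed to pass through unchanged.

```java
public class StepClassNames {
    // Package prefix that the table abbreviates as "opdts".
    static final String OPDTS_PACKAGE = "org.pentaho.di.trans.steps";

    // Marker used in the table in front of the remaining package and class name.
    static final String OPDTS_MARKER = "opdts.";

    // Expands an abbreviated table entry to a fully qualified class name.
    // Entries that do not start with "opdts." are returned unchanged.
    static String expand(String tableEntry) {
        if (tableEntry.startsWith(OPDTS_MARKER)) {
            return OPDTS_PACKAGE + "." + tableEntry.substring(OPDTS_MARKER.length());
        }
        return tableEntry;
    }

    public static void main(String[] args) {
        // Abbreviated entry taken from the Abort row of the table.
        System.out.println(expand("opdts.abort.AbortMeta"));
        // Entry that is already fully qualified in the table.
        System.out.println(expand("org.pentaho.di.kf.KFMeta"));
    }
}
```

The expanded name is only a string; whether the class can actually be loaded depends on the PDI version and on which plugins are installed.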