This page contains the index for the documentation on all the standard steps in Pentaho Data Integration.
We invite everyone to add more details, tips and samples to the step pages.

Name Category ID Description Metadata Java class
opdts = org.pentaho.di.trans.steps
Abort Flow Abort Abort a transformation opdts.abort.AbortMeta
Add a checksum Transform CheckSum Add a checksum column for each input row opdts.checksum.CheckSumMeta
Add constants Transform Constant Add one or more constants to the input rows opdts.constant.ConstantMeta
Add sequence Transform Sequence Get the next value from a sequence opdts.addsequence.AddSequenceMeta
Add value fields changing sequence Transform FieldsChangeSequence Add a sequence that depends on field value changes. Each time the value of at least one field changes, PDI resets the sequence. opdts.fieldschangesequence.FieldsChangeSequenceMeta
Add XML Transform AddXML Encode several fields into an XML fragment opdts.addxml.AddXMLMeta
Aggregate Rows
Analytic Query Statistics AnalyticQuery Execute analytic queries over a sorted dataset (LEAD/LAG/FIRST/LAST) opdts.analyticquery.AnalyticQueryMeta
Append streams Flow Append Append 2 streams in an ordered way opdts.append.AppendMeta
ARFF Output Data Mining Arff Output Writes data in ARFF format to a file opdts.append.arff.ArffOutputMeta
Automatic Documentation Output Output AutoDoc This step automatically generates documentation based on input in the form of a list of transformations and jobs opdts.autodoc.AutoDocMeta
Avro input Input AvroInput Decode binary or Json Avro data from a file or a field opdts.avroinput.AvroInputMeta
Block this step until steps finish Flow BlockUntilStepsFinish Block this step until selected steps finish. opdts.blockuntilstepsfinish.BlockUntilStepsFinishMeta
Blocking Step Flow BlockingStep This step blocks until all incoming rows have been processed. Subsequent steps only receive the last input row to this step. opdts.blockingstep.BlockingStepMeta
Calculator Transform Calculator Create new fields by performing simple calculations opdts.calculator.CalculatorMeta
Call DB Procedure Lookup DBProc Get back information by calling a database procedure. opdts.dbproc.DBProcMeta
Change file encoding Utility ChangeFileEncoding Change file encoding and create a new file opdts.changefileencoding.ChangeFileEncodingMeta
Cassandra input Big Data CassandraInput Read from a Cassandra column family opdts.cassandrainput.CassandraInputMeta
Cassandra output Big Data CassandraOutput Write to a Cassandra column family opdts.cassandraoutput.CassandraOutputMeta
Check if a column exists Lookup ColumnExists Check if a column exists in a table on a specified connection. opdts.columnexists.ColumnExistsMeta
Check if file is locked Lookup FileLocked Check if a file is locked by another process opdts.filelocked.FileLockedMeta
Check if webservice is available Lookup WebServiceAvailable Check if a webservice is available opdts.webserviceavailable.WebServiceAvailableMeta
Clone row Utility CloneRow Clone a row as many times as needed opdts.clonerow.CloneRowMeta
Closure Generator Transform ClosureGenerator This step allows you to generate a closure table using parent-child relationships. opdts.closure.ClosureGeneratorMeta
Combination lookup/update Data Warehouse CombinationLookup Update a junk dimension in a data warehouse. Alternatively, look up information in this dimension. The primary key of a junk dimension consists of all its fields. opdts.combinationlookup.CombinationLookupMeta
Concat Fields Transform ConcatFields The Concat Fields step is used to concatenate multiple fields into one target field. The fields can be separated by a separator and the enclosure logic is completely compatible with the Text File Output step. opdts.concatfields.ConcatFieldsMeta
Copy rows to result Job RowsToResult Use this step to write rows to the executing job. The information will then be passed to the next entry in this job. opdts.rowstoresult.RowsToResultMeta
CouchDB Input Big Data CouchDbInput Retrieves all documents from a given view in a given design document from a given database opdts.couchdbinput.CouchDbInputMeta
Credit card validator Validation CreditCardValidator The Credit card validator step will help you tell: (1) if a credit card number is valid (uses LUHN10 (MOD-10) algorithm) (2) which credit card vendor handles that number (VISA, MasterCard, Diners Club, EnRoute, American Express (AMEX),...) opdts.creditcardvalidator.CreditCardValidatorMeta
CSV file input Input CsvInput Simple CSV file input opdts.csvinput.CsvInputMeta
Data Grid Input DataGrid Enter rows of static data in a grid, usually for testing, reference or demo purposes opdts.datagrid.DataGridMeta
Data Validator Validation Validator Validates passing data based on a set of rules opdts.validator.ValidatorMeta
Database join Lookup DBJoin Execute a database query using stream values as parameters opdts.databasejoin.DatabaseJoinMeta
Database lookup Lookup DBLookup Look up values in a database using field values opdts.databaselookup.DatabaseLookupMeta
De-serialize from file Input CubeInput Read rows of data from a data cube. opdts.cubeinput.CubeInputMeta
Delay row Utility Delay Output each input row after a delay opdts.delay.DelayMeta
Delete Output Delete Delete data in a database table based upon keys opdts.delete.DeleteMeta
Detect empty stream Flow DetectEmptyStream This step outputs one empty row if the input stream is empty (i.e. when the input stream does not contain any rows) opdts.detectemptystream.DetectEmptyStreamMeta
Dimension lookup/update Data Warehouse DimensionLookup Update a slowly changing dimension in a data warehouse. Alternatively, look up information in this dimension. opdts.dimensionlookup.DimensionLookupMeta
Dummy (do nothing) Flow Dummy This step type doesn't do anything. It's useful however when testing things or in certain situations where you want to split streams. opdts.dummytrans.DummyTransMeta
Dynamic SQL row Lookup DynamicSQLRow Execute a dynamic SQL statement built in a previous field opdts.dynamicsqlrow.DynamicSQLRowMeta
Edi to XML Utility TypeExitEdi2XmlStep Converts an Edifact message to XML to simplify data extraction (Available in PDI 4.4, already present in CI trunk builds) opdts.edi2xml.Edi2XmlMeta
ElasticSearch Bulk Insert Bulk loading ElasticSearchBulk Performs bulk inserts into ElasticSearch opdts.elasticsearchbulk.ElasticSearchBulkMeta
Email messages input Input MailInput Read POP3/IMAP server and retrieve messages opdts.mailinput.MailInputMeta
ESRI Shapefile Reader Input ShapeFileReader Reads shape file data from an ESRI shape file and linked DBF file org.pentaho.di.shapefilereader.ShapeFileReaderMeta
ETL Metadata Injection Flow MetaInject This step allows you to inject metadata into an existing transformation prior to execution. This allows for the creation of dynamic and highly flexible data integration solutions. opdts.metainject.MetaInjectMeta
Example plugin Transform DummyPlugin This is an example for a plugin test step be.ibridge.kettle.dummy.DummyPluginMeta
Execute a process Utility ExecProcess Execute a process and return the result opdts.execprocess.ExecProcessMeta
Execute row SQL script Scripting ExecSQLRow Execute SQL script extracted from a field created in a previous step. opdts.execsqlrow.ExecSQLRowMeta
Execute SQL script Scripting ExecSQL Execute an SQL script, optionally parameterized using input rows opdts.sql.ExecSQLMeta
File exists Lookup FileExists Check if a file exists opdts.fileexists.FileExistsMeta
Filter Rows Flow FilterRows Filter rows using simple equations opdts.filterrows.FilterRowsMeta
Fixed file input Input FixedInput Fixed file input opdts.fixedinput.FixedInputMeta
Formula Scripting Formula Calculate a formula using Pentaho's libformula opdts.formula.FormulaMeta
Fuzzy match Lookup FuzzyMatch Find approximate matches to a string using matching algorithms. Reads a field from a main stream and outputs the approximate value from the lookup stream. opdts.fuzzymatch.FuzzyMatchMeta
Generate random credit card numbers Input RandomCCNumberGenerator Generate random valid (Luhn check) credit card numbers opdts.randomccnumber.RandomCCNumberGeneratorMeta
Generate random value Input RandomValue Generate random value opdts.randomvalue.RandomValueMeta
Generate Rows Input RowGenerator Generate a number of empty or equal rows. opdts.rowgenerator.RowGeneratorMeta
Get data from XML Input getXMLData Get data from XML file by using XPath. This step also allows you to parse XML defined in a previous field. opdts.getxmldata.GetXMLDataMeta
Get File Names Input GetFileNames Get file names from the operating system and send them to the next step. opdts.getfilenames.GetFileNamesMeta
Get files from result Job FilesFromResult This step allows you to read filenames used or generated in a previous entry in a job. opdts.filesfromresult.FilesFromResultMeta
Get Files Rows Count Input GetFilesRowsCount Get Files Rows Count opdts.getfilesrowscount.GetFilesRowsCountMeta
Get ID from slave server Transform GetSlaveSequence Retrieves unique IDs in blocks from a slave server. The referenced sequence needs to be configured on the slave server in the XML configuration file. opdts.getslavesequence.GetSlaveSequenceMeta
Get previous row fields
Get repository names Input GetRepositoryNames Lists detailed information about transformations and/or jobs in a repository opdts.getrepositorynames.GetRepositoryNamesMeta
Get rows from result Job RowsFromResult This allows you to read rows from a previous entry in a job. opdts.rowsfromresult.RowsFromResultMeta
Get SubFolder names Input GetSubFolders Read a parent folder and return all subfolders opdts.getsubfolders.GetSubFoldersMeta
Get System Info Input SystemInfo Get information from the system like system date, arguments, etc. opdts.systemdata.SystemDataMeta
Get table names Input GetTableNames Get table names from database connection and send them to the next step opdts.gettablenames.GetTableNamesMeta
Get Variables Job GetVariable Determine the values of certain (environment or Kettle) variables and put them in field values. opdts.getvariable.GetVariableMeta
Google Analytics Input TypeExitGoogleAnalyticsInputStep Fetches data from a Google Analytics account opdts.googleanalytics.GaInputStepMeta
Google Docs Input
Greenplum Bulk Loader Bulk loading GPBulkLoader Greenplum Bulk Loader opdts.gpbulkloader.GPBulkLoaderMeta
Greenplum Load Bulk loading Greenplum Load
Group by Statistics GroupBy Builds aggregates in a group by fashion. This works only on a sorted input. If the input is not sorted, only consecutive duplicate rows are handled correctly. opdts.groupby.GroupByMeta
GZIP CSV Input Input ParallelGzipCsvInput Parallel GZIP CSV file input reader opdts.parallelgzipcsv.ParGzipCsvInputMeta
Hadoop File Input Big Data HadoopFileInputPlugin Read data from a variety of different text-file types stored on a Hadoop cluster opdts.hadoopfileinput.HadoopFileInputMeta
Hadoop File Output Big Data HadoopFileOutputPlugin Write data to a variety of different text-file types stored on a Hadoop cluster opdts.hadoopfileoutput.HadoopFileOutputMeta
HBase input Big Data HbaseInput Read from an HBase column family opdts.hbaseinput.HBaseInputMeta
HBase output Big Data HbaseOutput Write to an HBase column family opdts.hbaseoutput.HBaseOutputMeta
HBase Row Decoder Big Data HBaseRowDecoder Decodes an incoming key and HBase result object according to a mapping opdts.hbaserowdecoder.HBaseRowDecoderMeta
HL7 Input HL7Input Read data from HL7 data streams. opdt.hl7.plugins.hl7input
HTTP client Lookup HTTP Call a web service over HTTP by supplying a base URL and allowing parameters to be set dynamically opdts.http.HTTPMeta
HTTP Post Lookup HTTPPOST Call a web service over HTTP POST by supplying a base URL and allowing parameters to be set dynamically opdts.httppost.HTTPPOSTMeta
IBM Websphere MQ Consumer Input MQInput Receive messages from any IBM Websphere MQ Server
IBM Websphere MQ Producer Output MQOutput Send messages to any IBM Websphere MQ Server
Identify last row in a stream Flow DetectLastRow Last row will be marked opdts.detectlastrow.DetectLastRowMeta
If field value is null Utility IfNull Sets a field value to a constant if it is null. opdts.ifnull.IfNullMeta
Infobright Loader Bulk loading InfobrightOutput Load data to an Infobright database table opdts.infobrightoutput.InfobrightLoaderMeta
Ingres VectorWise Bulk Loader Bulk loading VectorWiseBulkLoader This step interfaces with the Ingres VectorWise Bulk Loader "COPY TABLE" command. opdts.ivwloader.IngresVectorwiseLoaderMeta
Injector Inline Injector Injector step that allows rows to be injected into the transformation through the Java API opdts.injector.InjectorMeta
Insert / Update Output InsertUpdate Update or insert rows in a database based upon keys. opdts.insertupdate.InsertUpdateMeta
Java Filter Flow JavaFilter Filter rows using Java code opdts.javafilter.JavaFilterMeta
JMS Consumer JmsInput Receive messages from a JMS server
JMS Producer Send messages to a JMS server
Job Executor This step executes a Pentaho Data Integration Job, passes parameters and rows.
Join Rows (cartesian product) Joins JoinRows The output of this step is the cartesian product of the input streams. The number of output rows is the product of the numbers of rows in the input streams. opdts.joinrows.JoinRowsMeta
JSON Input Input JsonInput Extract relevant portions out of JSON structures (file or incoming field) and output rows opdts.jsoninput.JsonInputMeta
JSON output Output JsonOutput Create a JSON block and output it in a field or a file. opdts.jsonoutput.JsonOutputMeta
Knowledge Flow Data Mining KF Executes a Knowledge Flow data mining process org.pentaho.di.kf.KFMeta
LDAP Input Input LDAPInput Read data from LDAP host opdts.ldapinput.LDAPInputMeta
LDAP Output Output LDAPOutput Perform Insert, upsert, update, add or delete operations on records based on their DN (Distinguished Name). opdts.ldapoutput.LDAPOutputMeta
LDIF Input Input LDIFInput Read data from LDIF files opdts.ldifinput.LDIFInputMeta
Load file content in memory Input LoadFileInput Load file content in memory opdts.loadfileinput.LoadFileInputMeta
LucidDB Bulk Loader
LucidDB Streaming Loader Bulk loading LucidDBStreamingLoader Load data into LucidDB by using Remote Rows UDX. opdts.luciddbstreamingloader.LucidDBStreamingLoaderMeta
Mail Utility Mail Send eMail. opdts.mail.MailMeta
Mail Validator Validation MailValidator Check if an email address is valid. opdts.mailvalidator.MailValidatorMeta
Mapping (sub-transformation) Mapping Mapping Run a mapping (sub-transformation), use MappingInput and MappingOutput to specify the fields interface opdts.mapping.MappingMeta
Mapping input specification Mapping MappingInput Specify the input interface of a mapping opdts.mappinginput.MappingInputMeta
Mapping output specification Mapping MappingOutput Specify the output interface of a mapping opdts.mappingoutput.MappingOutputMeta
MapReduce Input Big Data HadoopEnterPlugin Key Value pairs enter here from Hadoop MapReduce opdts.hadoopenter.HadoopEnterMeta
MapReduce Output Big Data HadoopExitPlugin Key Value pairs exit here and are pushed into Hadoop MapReduce opdts.hadoopexit.HadoopExitMeta
MaxMind GeoIP Lookup Lookup MaxMindGeoIPLookup Lookup an IPv4 address in a MaxMind database and add fields such as geography, ISP, or organization. com.maxmind.geoip.MaxMindGeoIPLookupMeta
Memory Group by Statistics MemoryGroupBy Builds aggregates in a group by fashion. This step doesn't require sorted input. opdts.memgroupby.MemoryGroupByMeta
Merge Join Joins MergeJoin Joins two streams on a given key and outputs a joined set. The input streams must be sorted on the join key opdts.mergejoin.MergeJoinMeta
Merge Rows (diff) Joins MergeRows Merge two streams of rows, sorted on a certain key. The two streams are compared and the identical, changed, deleted and new rows are flagged. opdts.mergerows.MergeRowsMeta
Metadata structure of stream Utility StepMetastructure This is a step to read the metadata of the incoming stream. opdts.stepmeta.StepMetastructureMeta
Microsoft Access Input Input AccessInput Read data from a Microsoft Access file opdts.accessinput.AccessInputMeta
Microsoft Access Output Output AccessOutput Stores records into an MS-Access database table. opdts.accessoutput.AccessOutputMeta
Microsoft Excel Input Input ExcelInput Read data from Excel and OpenOffice Workbooks (XLS, XLSX, ODS). opdts.excelinput.ExcelInputMeta
Microsoft Excel Output Output ExcelOutput Stores records into an Excel (XLS) document with formatting information. opdts.exceloutput.ExcelOutputMeta
Microsoft Excel Writer Output TypeExitExcelWriterStep Writes or appends data to an Excel file opdts.excelwriter.ExcelWriterStepMeta
Modified Java Script Value Scripting ScriptValueMod This step allows the execution of JavaScript programs (and much more)
Mondrian Input Input MondrianInput Execute and retrieve data using an MDX query against a Pentaho Analysis OLAP server (Mondrian) opdts.mondrianinput.MondrianInputMeta
MonetDB Agile Mart
MonetDB Bulk Loader Bulk loading MonetDBBulkLoader Load data into MonetDB by using their bulk load command in streaming mode. opdts.monetdbbulkloader.MonetDBBulkLoaderMeta
MongoDB Input Big Data MongoDbInput Reads all entries from a MongoDB collection in the specified database. opdts.mongodbinput.MongoDbInputMeta
MongoDB Output Big Data MongoDbOutput Write to a MongoDB collection. opdts.mongodboutput.MongoDbOutputMeta
Multiway Merge Join Joins MultiwayMergeJoin Multiway Merge Join opdts.multimerge.MultiMergeJoinMeta
MySQL Bulk Loader Bulk loading MySQLBulkLoader MySQL bulk loader step, loading data over a named pipe (not available on MS Windows) opdts.mysqlbulkloader.MySQLBulkLoaderMeta
Null if... Utility NullIf Sets a field value to null if it is equal to a constant value opdts.nullif.NullIfMeta
Number range Transform NumberRange Create ranges based on numeric field opdts.numberrange.NumberRangeMeta
OLAP Input Input OlapInput Execute and retrieve data using an MDX query against any XML/A OLAP datasource using olap4j opdts.olapinput.OlapInputMeta
OpenERP Object Delete OpenERPObjectDelete Deletes data from the OpenERP server using the XMLRPC interface with the 'unlink' function.
OpenERP Object Input Input OpenERPObjectInput Retrieves data from the OpenERP server using the XMLRPC interface with the 'read' function. opdts.openerp.objectinput.OpenERPObjectInputMeta
OpenERP Object Output OpenERPObjectOutputImport Updates data on the OpenERP server using the XMLRPC interface and the 'import' function
Oracle Bulk Loader Bulk loading OraBulkLoader Use Oracle Bulk Loader to load data opdts.orabulkloader.OraBulkLoaderMeta
Output steps metrics Statistics StepsMetrics Return metrics for one or several steps opdts.stepsmetrics.StepsMetricsMeta
Palo Cell Input Input PaloCellInput Retrieves all cell data from a Palo cube
Palo Cell Output PaloCellOutput Updates cell data in a Palo cube
Palo Dimension Input PaloDimInput Returns elements from a dimension in a Palo database
Palo Dimension Output PaloDimOutput Creates/updates dimension elements and element consolidations in a Palo database
Pentaho Reporting Output Output PentahoReportingOutput Executes an existing report (PRPT) opdts.pentahoreporting.PentahoReportingOutputMeta
PostgreSQL Bulk Loader Bulk loading PGBulkLoader PostgreSQL Bulk Loader opdts.pgbulkloader.PGBulkLoaderMeta
Prioritize streams Flow PrioritizeStreams Prioritize streams in an ordered way. opdts.prioritizestreams.PrioritizeStreamsMeta
Process files Utility ProcessFiles Process one file per row (copy, move or delete). This step only accepts file names as input. opdts.processfiles.ProcessFilesMeta
Properties Output Output PropertyOutput Write data to properties file opdts.propertyoutput.PropertyOutputMeta
Property Input Input PropertyInput Read data (key, value) from properties files. opdts.propertyinput.PropertyInputMeta
R script executor Statistics RScriptExecutor Executes an R script within a PDI transformation
Regex Evaluation Scripting RegexEval Regular expression Evaluation. This step uses a regular expression to evaluate a field. It can also extract new fields out of an existing field with capturing groups. opdts.regexeval.RegexEvalMeta
Replace in string Transform ReplaceString Replace all occurrences of a word in a string with another word. opdts.replacestring.ReplaceStringMeta
Reservoir Sampling Statistics ReservoirSampling Transform Samples a fixed number of rows from the incoming stream opdts.reservoirsampling.ReservoirSamplingMeta
REST Client Lookup Rest Consume RESTful services. REpresentational State Transfer (REST) is a key design idiom that embraces a stateless client-server architecture in which the web services are viewed as resources and can be identified by their URLs
Row denormaliser Transform Denormaliser Denormalises rows by looking up key-value pairs and by assigning them to new fields in the output rows. This method aggregates and needs the input rows to be sorted on the grouping fields opdts.denormaliser.DenormaliserMeta
Row flattener Transform Flattener Flattens consecutive rows based on the order in which they appear in the input stream opdts.flattener.FlattenerMeta
Row Normaliser Transform Normaliser De-normalised information can be normalised using this step type. opdts.normaliser.NormaliserMeta
RSS Input Input RssInput Read RSS feeds opdts.rssinput.RssInputMeta
RSS Output Output RssOutput Write RSS stream. opdts.rssoutput.RssOutputMeta
Rule Executor Scripting RuleExecutor Execute a rule against each row (using Drools) opdts.rules.RulesExecutorMeta
Rule Accumulator Scripting RuleAccumulator Execute a rule against a set of rows (using Drools) opdts.rules.RulesAccumulatorMeta
Run SSH commands Utility SSH Run SSH commands and return the result. opdts.ssh.SSHMeta
S3 CSV Input Input S3CSVINPUT S3 CSV Input opdts.s3csvinput.S3CsvInputMeta
S3 File Output Output S3FileOutputPlugin Exports data to a text file on an Amazon Simple Storage Service (S3)
Salesforce Delete Output SalesforceDelete Delete records in Salesforce module. opdts.salesforcedelete.SalesforceDeleteMeta
Salesforce Input Input SalesforceInput Reads information from SalesForce
Salesforce Insert Output SalesforceInsert Insert records in Salesforce module. opdts.salesforceinsert.SalesforceInsertMeta
Salesforce Update Output SalesforceUpdate Update records in Salesforce module. opdts.salesforceupdate.SalesforceUpdateMeta
Salesforce Upsert Output SalesforceUpsert Insert or update records in Salesforce module. opdts.salesforceupsert.SalesforceUpsertMeta
Sample rows Statistics SampleRows Filter rows based on the line number. opdts.samplerows.SampleRowsMeta
SAP Input Input SapInput Read data from SAP ERP, optionally with parameters opdts.sapinput.SapInputMeta
SAS Input Input SASInput This step reads files in sas7bdat (SAS) native format
Secret key generator Experimental SecretKeyGenerator Generate a secret key for algorithms such as DES, AES, TripleDES. opdts.symmetriccrypto.secretkeygenerator.SecretKeyGeneratorMeta
Select values Transform SelectValues Select or remove fields in a row. Optionally, set the field meta-data: type, length and precision. opdts.selectvalues.SelectValuesMeta
Send message to Syslog Utility SyslogMessage Send message to Syslog server opdts.syslog.SyslogMessageMeta
Serialize to file Output CubeOutput Write rows of data to a data cube opdts.cubeoutput.CubeOutputMeta
Set field value Transform SetValueField Replace value of a field with another value field opdts.setvaluefield.SetValueFieldMeta
Set field value to a constant Transform SetValueConstant Replace value of a field to a constant opdts.setvalueconstant.SetValueConstantMeta
Set files in result Job FilesToResult This step allows you to set filenames in the result of this transformation. Subsequent job entries can then use this information. opdts.filestoresult.FilesToResultMeta
Set Variables Job SetVariable Set environment variables based on a single input row. opdts.setvariable.SetVariableMeta
Simple Mapping SimpleMapping Turn a repetitive, re-usable part of a transformation (a sequence of steps) into a mapping (sub-transformation). opdts.simplemapping.SimpleMapping
Single Threader Flow SingleThreader Executes a transformation snippet in a single thread. You need a standard mapping or a transformation with an Injector step where data from the parent transformation will arrive in blocks. opdts.singlethreader.SingleThreaderMeta
Socket reader Inline SocketReader Socket reader. A socket client that connects to a server (Socket Writer step). opdts.socketreader.SocketReaderMeta
Socket writer Inline SocketWriter Socket writer. A socket server that can send rows of data to a socket reader. opdts.socketwriter.SocketWriterMeta
Sort rows Transform SortRows Sort rows based upon field values (ascending or descending) opdts.sort.SortRowsMeta
Sorted Merge Joins SortedMerge Sorted Merge opdts.sortedmerge.SortedMergeMeta
Split field to rows Transform SplitFieldToRows3 Splits a single string field by delimiter and creates a new row for each split term opdts.splitfieldtorows.SplitFieldToRowsMeta
Split Fields Transform FieldSplitter When you want to split a single field into more than one, use this step type. opdts.fieldsplitter.FieldSplitterMeta
Splunk Input Transform SplunkInput Reads data from Splunk. opdts.splunk.SplunkInputMeta
Splunk Output Transform SplunkOutput Writes data to Splunk. opdts.splunk.SplunkOutputMeta
SQL File Output Output SQLFileOutput Output SQL INSERT statements to file opdts.sqlfileoutput.SQLFileOutputMeta
Stream lookup Lookup StreamLookup Look up values coming from another stream in the transformation. opdts.streamlookup.StreamLookupMeta
SSTable Output Big Data SSTableOutput Writes to a filesystem directory as a Cassandra SSTable opdts.cassandrasstableoutput.SSTableOutputMeta
Streaming XML Input
String operations Transform StringOperations Apply certain operations like trimming, padding and others to string value. opdts.stringoperations.StringOperationsMeta
Strings cut Transform StringCut Strings cut (substring). opdts.stringcut.StringCutMeta
Switch / Case Flow SwitchCase Switch a row to a certain target step based on the case value in a field. opdts.switchcase.SwitchCaseMeta
Symmetric Cryptography Experimental SymmetricCryptoTrans Encrypt or decrypt a string using symmetric encryption. Available algorithms are DES, AES, TripleDES. opdts.symmetriccrypto.symmetriccryptotrans.SymmetricCryptoTransMeta
Synchronize after merge Output SynchronizeAfterMerge This step performs insert/update/delete in one go based on the value of a field. opdts.synchronizeaftermerge.SynchronizeAfterMergeMeta
Table Agile Mart
Table Compare TableCompare This step compares the data from two tables (provided they have the same layout). It finds differences between the data in the two tables and logs them. opdts.tablecompare.TableCompareMeta
Table exists Lookup TableExists Check if a table exists on a specified connection opdts.tableexists.TableExistsMeta
Table input Input TableInput Read information from a database table. opdts.tableinput.TableInputMeta
Table output Output TableOutput Write information to a database table opdts.tableoutput.TableOutputMeta
Teradata Fastload Bulk Loader Bulk loading TeraFast The Teradata Fastload Bulk loader opdts.terafast.TeraFastMeta
Teradata TPT Insert Upsert Bulk Loader Bulk loading TeraDataBulkLoader Bulk loading via TPT using the tbuild command.  
Text file input Input TextFileInput Read data from a text file in several formats. This data can then be passed on to the next step(s)... opdts.textfileinput.TextFileInputMeta
Text file output Output TextFileOutput Write rows to a text file. opdts.textfileoutput.TextFileOutputMeta
Transformation Executor
Transformation Executor
Unique rows Transform Unique Remove duplicate rows and leave only unique occurrences. This works only on a sorted input. If the input is not sorted, only consecutive duplicate rows are handled correctly. opdts.uniquerows.UniqueRowsMeta
Unique rows (HashSet) Transform UniqueRowsByHashSet Remove duplicate rows and leave only unique occurrences by using a HashSet. opdts.uniquerowsbyhashset.UniqueRowsByHashSetMeta
Univariate Statistics Statistics UnivariateStats This step computes some simple stats based on a single input field opdts.univariatestats.UnivariateStatsMeta
Update Output Update Update data in a database table based upon keys opdts.update.UpdateMeta
User Defined Java Class Scripting UserDefinedJavaClass This step allows you to program a step using Java code opdts.userdefinedjavaclass.UserDefinedJavaClassMeta
User Defined Java Expression Scripting Janino Calculate the result of a Java Expression using Janino opdts.janino.JaninoMeta
Value Mapper Transform ValueMapper Maps values of a certain field from one value to another opdts.valuemapper.ValueMapperMeta
Vertica Bulk Loader Bulk loading VerticaBulkLoader Bulk loads data into a Vertica table using their high performance COPY feature
Web services lookup Lookup WebServiceLookup Look up information using web services (WSDL) opdts.webservices.WebServiceMeta
Write to log Utility WriteToLog Write data to log opdts.writetolog.WriteToLogMeta
XBase input Input XBaseInput Reads records from an XBase type of database file (DBF) opdts.xbaseinput.XBaseInputMeta
XML Input Stream (StAX) Input XMLInputStream This step is capable of processing very large and complex XML files very fast. opdts.xmlinputstream.XMLInputStreamMeta
XML Input
XML Join Joins XMLJoin Joins a stream of XML-Tags into a target XML string opdts.xmljoin.XMLJoinMeta
XML Output Output XMLOutput Write data to an XML file opdts.xmloutput.XMLOutputMeta
XSD Validator Validation XSDValidator Validate XML source (files or streams) against XML Schema Definition. opdts.xsdvalidator.XsdValidatorMeta
XSL Transformation Transform XSLT Transform XML stream using XSL (eXtensible Stylesheet Language). opdts.xslt.XsltMeta
Yaml Input Input YamlInput Read a YAML source (file or stream), parse it, convert it to rows, and write these to one or more outputs. opdts.yamlinput.YamlInputMeta
Zip File ZipFile Creates a standard ZIP archive from the data stream fields
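
The "Metadata Java class" column uses the abbreviation opdts for org.pentaho.di.trans.steps. As a minimal sketch of how to read that column, the Python snippet below expands an abbreviated entry into a full class name. The helper name expand_class_name is hypothetical and is not part of PDI; entries that do not start with "opdts." are assumed to be fully qualified already and are returned unchanged.

```python
# Prefix abbreviation defined in the table header:
# opdts = org.pentaho.di.trans.steps
OPDTS_PREFIX = "org.pentaho.di.trans.steps"


def expand_class_name(abbreviated: str) -> str:
    """Expand a table entry such as 'opdts.abort.AbortMeta'.

    Hypothetical helper for reading this index, not a PDI API.
    Entries without the 'opdts.' prefix are returned as-is.
    """
    if abbreviated.startswith("opdts."):
        # Keep the leading dot of the remainder, replace only the abbreviation.
        return OPDTS_PREFIX + abbreviated[len("opdts"):]
    return abbreviated


if __name__ == "__main__":
    print(expand_class_name("opdts.abort.AbortMeta"))
    print(expand_class_name("org.pentaho.di.kf.KFMeta"))
```

For example, the Abort entry opdts.abort.AbortMeta reads as org.pentaho.di.trans.steps.abort.AbortMeta.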

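The Group by and Unique rows entries above note that they work only on sorted input, and that on unsorted input only consecutive duplicates are handled correctly. The snippet below is an illustration of that caveat in plain Python using itertools.groupby, which also compares each row only with its immediate predecessor. It is not PDI code, and the function name unique_consecutive is hypothetical.

```python
from itertools import groupby


def unique_consecutive(rows):
    """Drop a row only when it equals the row immediately before it.

    Illustrative stand-in for a step that relies on sorted input;
    this is not how PDI implements the step.
    """
    return [key for key, _group in groupby(rows)]


if __name__ == "__main__":
    unsorted_rows = ["b", "a", "b", "b", "a"]

    # On unsorted input, non-adjacent duplicates survive.
    print(unique_consecutive(unsorted_rows))

    # Sorting first places all duplicates next to each other.
    print(unique_consecutive(sorted(unsorted_rows)))
```

In a transformation, the equivalent precaution is to place a Sort rows step before the step that requires sorted input, or to use the Memory Group by or Unique rows (HashSet) variants, which the table describes as not needing sorted input.
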
This documentation is maintained by the Pentaho community, and members are encouraged to create new pages in the appropriate spaces, or edit existing pages that need to be corrected or updated.

Please do not leave comments on Wiki pages asking for help. They will be deleted. Use the forums instead.
