Introduction
This page contains the index for the documentation on all the standard steps in Pentaho Data Integration.
We invite everyone to add more details, tips and samples to the step pages.
| Name | Category | ID | Description | Metadata Java class (opdts = org.pentaho.di.trans.steps) |
|---|---|---|---|---|
| Abort | Flow | Abort | Abort a transformation | opdts.abort.AbortMeta |
| Add a checksum | Transform | CheckSum | Add a checksum column for each input row | opdts.checksum.CheckSumMeta |
| Add constants | Transform | Constant | Add one or more constants to the input rows | opdts.constant.ConstantMeta |
| Add sequence | Transform | Sequence | Get the next value from a sequence | opdts.addsequence.AddSequenceMeta |
| Add value fields changing sequence | Transform | FieldsChangeSequence | Add a sequence that depends on field value changes. Each time the value of at least one field changes, PDI resets the sequence. | opdts.fieldschangesequence.FieldsChangeSequenceMeta |
| Add XML | Transform | AddXML | Encode several fields into an XML fragment | opdts.addxml.AddXMLMeta |
| Analytic Query | Statistics | AnalyticQuery | Execute analytic queries over a sorted dataset (LEAD/LAG/FIRST/LAST) | opdts.analyticquery.AnalyticQueryMeta |
| Append streams | Flow | Append | Append 2 streams in an ordered way | opdts.append.AppendMeta |
| Automatic Documentation Output | Output | AutoDoc | This step automatically generates documentation based on input in the form of a list of transformations and jobs | opdts.autodoc.AutoDocMeta |
| Avro input | Input | AvroInput | Decode binary or JSON Avro data from a file or a field | opdts.avroinput.AvroInputMeta |
| Block this step until steps finish | Flow | BlockUntilStepsFinish | Block this step until selected steps finish. | opdts.blockuntilstepsfinish.BlockUntilStepsFinishMeta |
| Blocking Step | Flow | BlockingStep | This step blocks until all incoming rows have been processed. Subsequent steps only receive the last input row to this step. | opdts.blockingstep.BlockingStepMeta |
| Calculator | Transform | Calculator | Create new fields by performing simple calculations | opdts.calculator.CalculatorMeta |
| Call DB Procedure | Lookup | DBProc | Get back information by calling a database procedure. | opdts.dbproc.DBProcMeta |
| Change file encoding | Utility | ChangeFileEncoding | Change file encoding and create a new file | opdts.changefileencoding.ChangeFileEncodingMeta |
| Cassandra input | Input | CassandraInput | Read from a Cassandra column family | opdts.cassandrainput.CassandraInputMeta |
| Cassandra output | Output | CassandraOutput | Write to a Cassandra column family | opdts.cassandraoutput.CassandraOutputMeta |
| Check if a column exists | Lookup | ColumnExists | Check if a column exists in a table on a specified connection. | opdts.columnexists.ColumnExistsMeta |
| Check if file is locked | Lookup | FileLocked | Check if a file is locked by another process | opdts.filelocked.FileLockedMeta |
| Check if webservice is available | Lookup | WebServiceAvailable | Check if a webservice is available | opdts.webserviceavailable.WebServiceAvailableMeta |
| Clone row | Utility | CloneRow | Clone a row as many times as needed | opdts.clonerow.CloneRowMeta |
| Closure Generator | Transform | ClosureGenerator | This step allows you to generate a closure table using parent-child relationships. | opdts.closure.ClosureGeneratorMeta |
| Combination lookup/update | Data Warehouse | CombinationLookup | Update a junk dimension in a data warehouse. Alternatively, look up information in this dimension. The primary key of a junk dimension consists of all its fields. | opdts.combinationlookup.CombinationLookupMeta |
| Copy rows to result | Job | RowsToResult | Use this step to write rows to the executing job. The information will then be passed to the next entry in this job. | opdts.rowstoresult.RowsToResultMeta |
| Credit card validator | Validation | CreditCardValidator | The Credit card validator step will help you tell: (1) if a credit card number is valid (uses LUHN10 (MOD-10) algorithm) (2) which credit card vendor handles that number (VISA, MasterCard, Diners Club, EnRoute, American Express (AMEX),...) | opdts.creditcardvalidator.CreditCardValidatorMeta |
| CSV file input | Input | CsvInput | Simple CSV file input | opdts.csvinput.CsvInputMeta |
| Data Grid | Input | DataGrid | Enter rows of static data in a grid, usually for testing, reference or demo purpose | opdts.datagrid.DataGridMeta |
| Data Validator | Validation | Validator | Validates passing data based on a set of rules | opdts.validator.ValidatorMeta |
| Database join | Lookup | DBJoin | Execute a database query using stream values as parameters | opdts.databasejoin.DatabaseJoinMeta |
| Database lookup | Lookup | DBLookup | Look up values in a database using field values | opdts.databaselookup.DatabaseLookupMeta |
| De-serialize from file | Input | CubeInput | Read rows of data from a data cube. | opdts.cubeinput.CubeInputMeta |
| Delay row | Utility | Delay | Output each input row after a delay | opdts.delay.DelayMeta |
| Delete | Output | Delete | Delete data in a database table based upon keys | opdts.delete.DeleteMeta |
| Detect empty stream | Flow | DetectEmptyStream | This step outputs one empty row if the input stream is empty (i.e., when the input stream does not contain any rows) | opdts.detectemptystream.DetectEmptyStreamMeta |
| Dimension lookup/update | Data Warehouse | DimensionLookup | Update a slowly changing dimension in a data warehouse. Alternatively, look up information in this dimension. | opdts.dimensionlookup.DimensionLookupMeta |
| Dummy (do nothing) | Flow | Dummy | This step type doesn't do anything. It's useful however when testing things or in certain situations where you want to split streams. | opdts.dummytrans.DummyTransMeta |
| Dynamic SQL row | Lookup | DynamicSQLRow | Execute a dynamic SQL statement built in a previous field | opdts.dynamicsqlrow.DynamicSQLRowMeta |
| Edi to XML | Utility | TypeExitEdi2XmlStep | Converts an Edifact message to XML to simplify data extraction (Available in PDI 4.4, already present in CI trunk builds) | opdts.edi2xml.Edi2XmlMeta |
| ElasticSearch Bulk Insert | Bulk loading | ElasticSearchBulk | Performs bulk inserts into ElasticSearch | opdts.elasticsearchbulk.ElasticSearchBulkMeta |
| Email messages input | Input | MailInput | Read POP3/IMAP server and retrieve messages | opdts.mailinput.MailInputMeta |
| ESRI Shapefile Reader | Input | ShapeFileReader | Reads shape file data from an ESRI shape file and linked DBF file | org.pentaho.di.shapefilereader.ShapeFileReaderMeta |
| ETL Metadata Injection | Flow | MetaInject | This step allows you to inject metadata into an existing transformation prior to execution. This allows for the creation of dynamic and highly flexible data integration solutions. | opdts.metainject.MetaInjectMeta |
| Example plugin | Transform | DummyPlugin | This is an example for a plugin test step | be.ibridge.kettle.dummy.DummyPluginMeta |
| Execute a process | Utility | ExecProcess | Execute a process and return the result | opdts.execprocess.ExecProcessMeta |
| Execute row SQL script | Scripting | ExecSQLRow | Execute SQL script extracted from a field created in a previous step. | opdts.execsqlrow.ExecSQLRowMeta |
| Execute SQL script | Scripting | ExecSQL | Execute an SQL script, optionally parameterized using input rows | opdts.sql.ExecSQLMeta |
| File exists | Lookup | FileExists | Check if a file exists | opdts.fileexists.FileExistsMeta |
| Filter rows | Flow | FilterRows | Filter rows using simple equations | opdts.filterrows.FilterRowsMeta |
| Fixed file input | Input | FixedInput | Fixed file input | opdts.fixedinput.FixedInputMeta |
| Formula | Scripting | Formula | Calculate a formula using Pentaho's libformula | opdts.formula.FormulaMeta |
| Fuzzy match | Lookup | FuzzyMatch | Find approximate matches to a string using matching algorithms. Reads a field from the main stream and outputs the approximate value from the lookup stream. | opdts.fuzzymatch.FuzzyMatchMeta |
| Generate random credit card numbers | Input | RandomCCNumberGenerator | Generate random valid (Luhn check) credit card numbers | opdts.randomccnumber.RandomCCNumberGeneratorMeta |
| Generate random value | Input | RandomValue | Generate random value | opdts.randomvalue.RandomValueMeta |
| Generate Rows | Input | RowGenerator | Generate a number of empty or equal rows. | opdts.rowgenerator.RowGeneratorMeta |
| Get data from XML | Input | getXMLData | Get data from XML file by using XPath. This step also allows you to parse XML defined in a previous field. | opdts.getxmldata.GetXMLDataMeta |
| Get File Names | Input | GetFileNames | Get file names from the operating system and send them to the next step. | opdts.getfilenames.GetFileNamesMeta |
| Get files from result | Job | FilesFromResult | This step allows you to read filenames used or generated in a previous entry in a job. | opdts.filesfromresult.FilesFromResultMeta |
| Get Files Rows Count | Input | GetFilesRowsCount | Get Files Rows Count | opdts.getfilesrowscount.GetFilesRowsCountMeta |
| Get ID from slave server | Transform | GetSlaveSequence | Retrieves unique IDs in blocks from a slave server. The referenced sequence needs to be configured on the slave server in the XML configuration file. | opdts.getslavesequence.GetSlaveSequenceMeta |
| Get repository names | Input | GetRepositoryNames | Lists detailed information about transformations and/or jobs in a repository | opdts.getrepositorynames.GetRepositoryNamesMeta |
| Get rows from result | Job | RowsFromResult | This allows you to read rows from a previous entry in a job. | opdts.rowsfromresult.RowsFromResultMeta |
| Get SubFolder names | Input | GetSubFolders | Read a parent folder and return all subfolders | opdts.getsubfolders.GetSubFoldersMeta |
| Get System Info | Input | SystemInfo | Get information from the system like system date, arguments, etc. | opdts.systemdata.SystemDataMeta |
| Get table names | Input | GetTableNames | Get table names from database connection and send them to the next step | opdts.gettablenames.GetTableNamesMeta |
| Get Variables | Job | GetVariable | Determine the values of certain (environment or Kettle) variables and put them in field values. | opdts.getvariable.GetVariableMeta |
| Google Analytics | Input | TypeExitGoogleAnalyticsInputStep | Fetches data from a Google Analytics account | opdts.googleanalytics.GaInputStepMeta |
| Greenplum Bulk Loader | Bulk loading | GPBulkLoader | Greenplum Bulk Loader | opdts.gpbulkloader.GPBulkLoaderMeta |
| Group by | Statistics | GroupBy | Builds aggregates in a group by fashion. This works only on a sorted input. If the input is not sorted, only double consecutive rows are handled correctly. | opdts.groupby.GroupByMeta |
| GZIP CSV Input | Input | ParallelGzipCsvInput | Parallel GZIP CSV file input reader | opdts.parallelgzipcsv.ParGzipCsvInputMeta |
| HBase input | Input | HbaseInput | Read from an HBase column family | opdts.hbaseinput.HBaseInputMeta |
| HBase output | Output | HbaseOutput | Write to an HBase column family | opdts.hbaseoutput.HBaseOutputMeta |
| HTTP client | Lookup | HTTP | Call a web service over HTTP by supplying a base URL and allowing parameters to be set dynamically | opdts.http.HTTPMeta |
| HTTP Post | Lookup | HTTPPOST | Send a web service request over HTTP by supplying a base URL and allowing parameters to be set dynamically | opdts.httppost.HTTPPOSTMeta |
| Identify last row in a stream | Flow | DetectLastRow | Last row will be marked | opdts.detectlastrow.DetectLastRowMeta |
| If field value is null | Utility | IfNull | Sets a field value to a constant if it is null. | opdts.ifnull.IfNullMeta |
| Infobright Loader | Bulk loading | InfobrightOutput | Load data to an Infobright database table | opdts.infobrightoutput.InfobrightLoaderMeta |
| Ingres VectorWise Bulk Loader | Bulk loading | VectorWiseBulkLoader | This step interfaces with the Ingres VectorWise Bulk Loader "COPY TABLE" command. | opdts.ivwloader.IngresVectorwiseLoaderMeta |
| Injector | Inline | Injector | Allows rows to be injected into the transformation through the Java API | opdts.injector.InjectorMeta |
| Insert / Update | Output | InsertUpdate | Update or insert rows in a database based upon keys. | opdts.insertupdate.InsertUpdateMeta |
| Java Filter | Flow | JavaFilter | Filter rows using Java code | opdts.javafilter.JavaFilterMeta |
| Job Executor | Flow | JobExecutor | This step executes a Pentaho Data Integration Job, passes parameters and rows. | opdts.jobexecutor.JobExecutorMeta |
| Join Rows (cartesian product) | Joins | JoinRows | The output of this step is the cartesian product of the input streams. The number of rows is the multiplication of the number of rows in the input streams. | opdts.joinrows.JoinRowsMeta |
| Json Input | Input | JsonInput | Extract relevant portions out of JSON structures (file or incoming field) and output rows | opdts.jsoninput.JsonInputMeta |
| Json output | Output | JsonOutput | Create a JSON block and output it in a field or a file. | opdts.jsonoutput.JsonOutputMeta |
| LDAP Input | Input | LDAPInput | Read data from LDAP host | opdts.ldapinput.LDAPInputMeta |
| LDAP Output | Output | LDAPOutput | Perform Insert, upsert, update, add or delete operations on records based on their DN (Distinguished Name). | opdts.ldapoutput.LDAPOutputMeta |
| LDIF Input | Input | LDIFInput | Read data from LDIF files | opdts.ldifinput.LDIFInputMeta |
| Load file content in memory | Input | LoadFileInput | Load file content in memory | opdts.loadfileinput.LoadFileInputMeta |
| LucidDB Streaming Loader | Bulk loading | LucidDBStreamingLoader | Load data into LucidDB by using Remote Rows UDX. | opdts.luciddbstreamingloader.LucidDBStreamingLoaderMeta |
| Mail | Utility | | Send email. | opdts.mail.MailMeta |
| Mail Validator | Validation | MailValidator | Check if an email address is valid. | opdts.mailvalidator.MailValidatorMeta |
| Mapping (sub-transformation) | Mapping | Mapping | Run a mapping (sub-transformation), use MappingInput and MappingOutput to specify the fields interface | opdts.mapping.MappingMeta |
| Mapping input specification | Mapping | MappingInput | Specify the input interface of a mapping | opdts.mappinginput.MappingInputMeta |
| Mapping output specification | Mapping | MappingOutput | Specify the output interface of a mapping | opdts.mappingoutput.MappingOutputMeta |
| MaxMind GeoIP Lookup | Lookup | MaxMindGeoIPLookup | Lookup an IPv4 address in a MaxMind database and add fields such as geography, ISP, or organization. | com.maxmind.geoip.MaxMindGeoIPLookupMeta |
| Memory Group by | Statistics | MemoryGroupBy | Builds aggregates in a group by fashion. This step doesn't require sorted input. | opdts.memgroupby.MemoryGroupByMeta |
| Merge Join | Joins | MergeJoin | Joins two streams on a given key and outputs a joined set. The input streams must be sorted on the join key | opdts.mergejoin.MergeJoinMeta |
| Merge Rows (diff) | Joins | MergeRows | Merge two streams of rows, sorted on a certain key. The two streams are compared and the equals, changed, deleted and new rows are flagged. | opdts.mergerows.MergeRowsMeta |
| Metadata structure of stream | Utility | StepMetastructure | This is a step to read the metadata of the incoming stream. | opdts.stepmeta.StepMetastructureMeta |
| Microsoft Access Input | Input | AccessInput | Read data from a Microsoft Access file | opdts.accessinput.AccessInputMeta |
| Microsoft Access Output | Output | AccessOutput | Stores records into an MS-Access database table. | opdts.accessoutput.AccessOutputMeta |
| Microsoft Excel Input | Input | ExcelInput | Read data from Excel and OpenOffice Workbooks (XLS, XLSX, ODS). | opdts.excelinput.ExcelInputMeta |
| Microsoft Excel Output | Output | ExcelOutput | Stores records into an Excel (XLS) document with formatting information. | opdts.exceloutput.ExcelOutputMeta |
| Microsoft Excel Writer | Output | TypeExitExcelWriterStep | Writes or appends data to an Excel file | opdts.excelwriter.ExcelWriterStepMeta |
| Modified Java Script Value | Scripting | ScriptValueMod | This step allows the execution of JavaScript programs (and much more) | opdts.scriptvalues_mod.ScriptValuesMetaMod |
| Mondrian Input | Input | MondrianInput | Execute and retrieve data using an MDX query against a Pentaho Analysis OLAP server (Mondrian) | opdts.mondrianinput.MondrianInputMeta |
| MonetDB Bulk Loader | Bulk loading | MonetDBBulkLoader | Load data into MonetDB by using their bulk load command in streaming mode. | opdts.monetdbbulkloader.MonetDBBulkLoaderMeta |
| MongoDB Input | Input | MongoDbInput | Reads all entries from a MongoDB collection in the specified database. | opdts.mongodbinput.MongoDbInputMeta |
| MongoDb output | Output | MongoDbOutput | Write to a MongoDB collection. | opdts.mongodboutput.MongoDbOutputMeta |
| Multiway Merge Join | Experimental | MultiwayMergeJoin | Multiway Merge Join | opdts.multimerge.MultiMergeJoinMeta |
| MySQL Bulk Loader | Bulk loading | MySQLBulkLoader | MySQL bulk loader step, loading data over a named pipe (not available on MS Windows) | opdts.mysqlbulkloader.MySQLBulkLoaderMeta |
| Null if... | Utility | NullIf | Sets a field value to null if it is equal to a constant value | opdts.nullif.NullIfMeta |
| Number range | Transform | NumberRange | Create ranges based on numeric field | opdts.numberrange.NumberRangeMeta |
| OLAP Input | Input | OlapInput | Execute and retrieve data using an MDX query against any XML/A OLAP datasource using olap4j | opdts.olapinput.OlapInputMeta |
| OpenERP Object Delete | Delete | OpenERPObjectDelete | Deletes data from the OpenERP server using the XMLRPC interface with the 'unlink' function. | opdts.openerp.objectdelete.OpenERPObjectDeleteMeta |
| OpenERP Object Input | Input | OpenERPObjectInput | Retrieves data from the OpenERP server using the XMLRPC interface with the 'read' function. | opdts.openerp.objectinput.OpenERPObjectInputMeta |
| OpenERP Object Output | Output | OpenERPObjectOutputImport | Updates data on the OpenERP server using the XMLRPC interface and the 'import' function | opdts.openerp.objectoutput.OpenERPObjectOutputMeta |
| Oracle Bulk Loader | Bulk loading | OraBulkLoader | Use Oracle Bulk Loader to load data | opdts.orabulkloader.OraBulkLoaderMeta |
| Output steps metrics | Statistics | StepsMetrics | Return metrics for one or several steps | opdts.stepsmetrics.StepsMetricsMeta |
| Palo Cell Input | Input | PaloCellInput | Retrieves all cell data from a Palo cube | opdts.palo.cellinput |
| Palo Cell Output | Output | PaloCellOutput | Updates cell data in a Palo cube | opdts.palo.celloutput |
| Palo Dimension Input | Input | PaloDimInput | Returns elements from a dimension in a Palo database | opdts.palo.diminput |
| Palo Dimension Output | Output | PaloDimOutput | Creates/updates dimension elements and element consolidations in a Palo database | opdts.palo.dimoutput |
| Pentaho Reporting Output | Output | PentahoReportingOutput | Executes an existing report (PRPT) | opdts.pentahoreporting.PentahoReportingOutputMeta |
| PostgreSQL Bulk Loader | Bulk loading | PGBulkLoader | PostgreSQL Bulk Loader | opdts.pgbulkloader.PGBulkLoaderMeta |
| Prioritize streams | Flow | PrioritizeStreams | Prioritize streams in an ordered way. | opdts.prioritizestreams.PrioritizeStreamsMeta |
| Process files | Utility | ProcessFiles | Process one file per row (copy or move or delete). This step accepts only filenames as input. | opdts.processfiles.ProcessFilesMeta |
| Properties Output | Output | PropertyOutput | Write data to properties file | opdts.propertyoutput.PropertyOutputMeta |
| Property Input | Input | PropertyInput | Read data (key, value) from properties files. | opdts.propertyinput.PropertyInputMeta |
| Regex Evaluation | Scripting | RegexEval | Regular expression Evaluation. This step uses a regular expression to evaluate a field. It can also extract new fields out of an existing field with capturing groups. | opdts.regexeval.RegexEvalMeta |
| Replace in string | Transform | ReplaceString | Replace all occurrences of a word in a string with another word. | opdts.replacestring.ReplaceStringMeta |
| Reservoir Sampling | Statistics | ReservoirSampling | Samples a fixed number of rows from the incoming stream | opdts.reservoirsampling.ReservoirSamplingMeta |
| REST Client | Lookup | Rest | Consume RESTful services. REpresentational State Transfer (REST) is a key design idiom that embraces a stateless client-server architecture in which the web services are viewed as resources and can be identified by their URLs | opdts.rest.RestMeta |
| Row denormaliser | Transform | Denormaliser | Denormalises rows by looking up key-value pairs and by assigning them to new fields in the output rows. This method aggregates and needs the input rows to be sorted on the grouping fields | opdts.denormaliser.DenormaliserMeta |
| Row flattener | Transform | Flattener | Flattens consecutive rows based on the order in which they appear in the input stream | opdts.flattener.FlattenerMeta |
| Row Normaliser | Transform | Normaliser | De-normalised information can be normalised using this step type. | opdts.normaliser.NormaliserMeta |
| RSS Input | Input | RssInput | Read RSS feeds | opdts.rssinput.RssInputMeta |
| RSS Output | Output | RssOutput | Write RSS stream. | opdts.rssoutput.RssOutputMeta |
| Rule Executor | Experimental | RuleExecutor | Execute a rule against each row | opdts.rules.RulesExecutorMeta |
| Rule Accumulator | Experimental | RuleAccumulator | Execute a rule against a set of rows | opdts.rules.RulesAccumulatorMeta |
| Run SSH commands | Utility | SSH | Run SSH commands and returns result. | opdts.ssh.SSHMeta |
| S3 CSV Input | Input | S3CSVINPUT | S3 CSV Input | opdts.s3csvinput.S3CsvInputMeta |
| S3 File Output | Output | S3FileOutputPlugin | Create files in an S3 location | com.pentaho.amazon.s3.S3FileOutputMeta |
| Salesforce Delete | Output | SalesforceDelete | Delete records in Salesforce module. | opdts.salesforcedelete.SalesforceDeleteMeta |
| Salesforce Input | Input | SalesforceInput | Reads information from SalesForce | opdts.salesforceinput.SalesforceInputMeta |
| Salesforce Insert | Output | SalesforceInsert | Insert records in Salesforce module. | opdts.salesforceinsert.SalesforceInsertMeta |
| Salesforce Update | Output | SalesforceUpdate | Update records in Salesforce module. | opdts.salesforceupdate.SalesforceUpdateMeta |
| Salesforce Upsert | Output | SalesforceUpsert | Insert or update records in Salesforce module. | opdts.salesforceupsert.SalesforceUpsertMeta |
| Sample rows | Statistics | SampleRows | Filter rows based on the line number. | opdts.samplerows.SampleRowsMeta |
| SAP Input | Input | SapInput | Read data from SAP ERP, optionally with parameters | opdts.sapinput.SapInputMeta |
| SAS Input | Input | SASInput | This step reads files in sas7bdat (SAS) native format | opdts.sasinput.SasInputMeta |
| Secret key generator | Experimental | SecretKeyGenerator | Generate a secret key for algorithms such as DES, AES, TripleDES. | opdts.symmetriccrypto.secretkeygenerator.SecretKeyGeneratorMeta |
| Select values | Transform | SelectValues | Select or remove fields in a row. Optionally, set the field meta-data: type, length and precision. | opdts.selectvalues.SelectValuesMeta |
| Send message to Syslog | Utility | SyslogMessage | Send message to Syslog server | opdts.syslog.SyslogMessageMeta |
| Serialize to file | Output | CubeOutput | Write rows of data to a data cube | opdts.cubeoutput.CubeOutputMeta |
| Set field value | Transform | SetValueField | Set value of a field with another value field | opdts.setvaluefield.SetValueFieldMeta |
| Set field value to a constant | Transform | SetValueConstant | Set value of a field to a constant | opdts.setvalueconstant.SetValueConstantMeta |
| Set files in result | Job | FilesToResult | This step allows you to set filenames in the result of this transformation. Subsequent job entries can then use this information. | opdts.filestoresult.FilesToResultMeta |
| Set Variables | Job | SetVariable | Set environment variables based on a single input row. | opdts.setvariable.SetVariableMeta |
| Single Threader | Flow | SingleThreader | Executes a transformation snippet in a single thread. You need a standard mapping or a transformation with an Injector step where data from the parent transformation will arrive in blocks. | opdts.singlethreader.SingleThreaderMeta |
| Socket reader | Inline | SocketReader | Socket reader. A socket client that connects to a server (Socket Writer step). | opdts.socketreader.SocketReaderMeta |
| Socket writer | Inline | SocketWriter | Socket writer. A socket server that can send rows of data to a socket reader. | opdts.socketwriter.SocketWriterMeta |
| Sort rows | Transform | SortRows | Sort rows based upon field values (ascending or descending) | opdts.sort.SortRowsMeta |
| Sorted Merge | Joins | SortedMerge | Sorted Merge | opdts.sortedmerge.SortedMergeMeta |
| Split field to rows | Transform | SplitFieldToRows3 | Splits a single string field by delimiter and creates a new row for each split term | opdts.splitfieldtorows.SplitFieldToRowsMeta |
| Split Fields | Transform | FieldSplitter | When you want to split a single field into more than one, use this step type. | opdts.fieldsplitter.FieldSplitterMeta |
| SQL File Output | Output | SQLFileOutput | Output SQL INSERT statements to file | opdts.sqlfileoutput.SQLFileOutputMeta |
| Stream lookup | Lookup | StreamLookup | Look up values coming from another stream in the transformation. | opdts.streamlookup.StreamLookupMeta |
| String operations | Transform | StringOperations | Apply certain operations like trimming, padding and others to string value. | opdts.stringoperations.StringOperationsMeta |
| Strings cut | Transform | StringCut | Strings cut (substring). | opdts.stringcut.StringCutMeta |
| Switch / Case | Flow | SwitchCase | Switch a row to a certain target step based on the case value in a field. | opdts.switchcase.SwitchCaseMeta |
| Symmetric Cryptography | Experimental | SymmetricCryptoTrans | Encrypt or decrypt a string using symmetric encryption. Available algorithms are DES, AES, TripleDES. | opdts.symmetriccrypto.symmetriccryptotrans.SymmetricCryptoTransMeta |
| Synchronize after merge | Output | SynchronizeAfterMerge | This step performs insert/update/delete in one go based on the value of a field. | opdts.synchronizeaftermerge.SynchronizeAfterMergeMeta |
| Table exists | Lookup | TableExists | Check if a table exists on a specified connection | opdts.tableexists.TableExistsMeta |
| Table input | Input | TableInput | Read information from a database table. | opdts.tableinput.TableInputMeta |
| Table output | Output | TableOutput | Write information to a database table | opdts.tableoutput.TableOutputMeta |
| Teradata Fastload Bulk Loader | Bulk loading | TeraFast | The Teradata Fastload Bulk loader | opdts.terafast.TeraFastMeta |
| Text file input | Input | TextFileInput | Read data from a text file in several formats. This data can then be passed on to the next step(s)... | opdts.textfileinput.TextFileInputMeta |
| Text file output | Output | TextFileOutput | Write rows to a text file. | opdts.textfileoutput.TextFileOutputMeta |
| Unique rows | Transform | Unique | Remove double rows and leave only unique occurrences. This works only on a sorted input. If the input is not sorted, only double consecutive rows are handled correctly. | opdts.uniquerows.UniqueRowsMeta |
| Unique rows (HashSet) | Transform | UniqueRowsByHashSet | Remove double rows and leave only unique occurrences by using a HashSet. | opdts.uniquerowsbyhashset.UniqueRowsByHashSetMeta |
| Univariate Statistics | Statistics | UnivariateStats | This step computes some simple stats based on a single input field | opdts.univariatestats.UnivariateStatsMeta |
| Update | Output | Update | Update data in a database table based upon keys | opdts.update.UpdateMeta |
| User Defined Java Class | Scripting | UserDefinedJavaClass | This step allows you to program a step using Java code | opdts.userdefinedjavaclass.UserDefinedJavaClassMeta |
| User Defined Java Expression | Scripting | Janino | Calculate the result of a Java Expression using Janino | opdts.janino.JaninoMeta |
| Value Mapper | Transform | ValueMapper | Maps values of a certain field from one value to another | opdts.valuemapper.ValueMapperMeta |
| Web services lookup | Lookup | WebServiceLookup | Look up information using web services (WSDL) | opdts.webservices.WebServiceMeta |
| Write to log | Utility | WriteToLog | Write data to log | opdts.writetolog.WriteToLogMeta |
| XBase input | Input | XBaseInput | Reads records from an XBase type of database file (DBF) | opdts.xbaseinput.XBaseInputMeta |
| XML Input Stream (StAX) | Input | XMLInputStream | This step is capable of processing very large and complex XML files very fast. | opdts.xmlinputstream.XMLInputStreamMeta |
| XML Join | Joins | XMLJoin | Joins a stream of XML-Tags into a target XML string | opdts.xmljoin.XMLJoinMeta |
| XML Output | Output | XMLOutput | Write data to an XML file | opdts.xmloutput.XMLOutputMeta |
| XSD Validator | Validation | XSDValidator | Validate XML source (files or streams) against XML Schema Definition. | opdts.xsdvalidator.XsdValidatorMeta |
| XSL Transformation | Transform | XSLT | Transform XML stream using XSL (eXtensible Stylesheet Language). | opdts.xslt.XsltMeta |
| Yaml Input | Input | YamlInput | Read YAML sources (file or stream), parse them, convert them to rows, and write these to one or more outputs. | opdts.yamlinput.YamlInputMeta |
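
The last column of the table abbreviates the package `org.pentaho.di.trans.steps` as `opdts`. As a minimal sketch of how an abbreviated entry maps back to a fully qualified class name, the following helper expands the prefix and leaves already-qualified names (such as `com.maxmind.geoip.MaxMindGeoIPLookupMeta`) untouched. The class `OpdtsExpander` and its `expand` method are hypothetical names used only for illustration; they are not part of Pentaho Data Integration.

```java
// Illustrative helper only: OpdtsExpander is a hypothetical name, not a PDI class.
final class OpdtsExpander {
    // Abbreviation used in the table's "Metadata Java class" column.
    static final String PREFIX = "opdts.";
    // Package that the abbreviation stands for, per the table header.
    static final String BASE_PACKAGE = "org.pentaho.di.trans.steps.";

    // Expands a table entry to a fully qualified class name.
    // Entries that do not start with "opdts." are returned unchanged.
    static String expand(String tableEntry) {
        if (tableEntry.startsWith(PREFIX)) {
            return BASE_PACKAGE + tableEntry.substring(PREFIX.length());
        }
        return tableEntry;
    }

    public static void main(String[] args) {
        // Prints: org.pentaho.di.trans.steps.abort.AbortMeta
        System.out.println(expand("opdts.abort.AbortMeta"));
        // Prints: com.maxmind.geoip.MaxMindGeoIPLookupMeta
        System.out.println(expand("com.maxmind.geoip.MaxMindGeoIPLookupMeta"));
    }
}
```

A few entries in the table (for example the Palo steps) list only a package rather than a class; the sketch expands those the same way and does not attempt to guess the missing class name.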
Page: Abort
Page: Access Input
Page: Access Output
Page: Add a checksum
Page: Add Constants
Page: Add sequence
Page: Add value fields changing sequence
Page: Add XML
Page: Aggregate Rows
Page: Analytic Query
Page: Append
Page: Append streams
Page: Automatic Documentation Output
Page: Blocking step
Page: Block this step until steps finish
Page: Calculator
Page: Call DB Procedure
Page: Cassandra Output
Page: Change file encoding
Page: Check if a column exists
Page: Check if file is locked
Page: Check if webservice is available
Page: Clone row
Page: Closure Generator
Page: Combination lookup-update
Page: Copy rows to result
Page: Credit card validator
Page: CSV Input
Page: Database Join
Page: Database lookup
Page: Data Grid
Page: Data Validator
Page: Delay row
Page: Delete
Page: De-serialize from file
Page: Detect empty stream
Page: Dimension Lookup-Update
Page: Dummy (do nothing)
Page: Edi to XML
Page: ETL Metadata Injection
Page: Excel Input (XLS, XLSX) including OpenOffice Workbooks (ODS)
Page: Excel Input Step
Page: Excel Output
Page: Execute SQL script
Page: Field Splitter
Page: File exists
Page: Filter rows
Page: Fixed File Input
Page: Flattener
Page: Generate random credit card numbers
Page: Generate Random Value
Page: Generate Rows
Page: Get Data From XML
Page: Get File Names
Page: Get files from result
Page: Get Files Rows Count
Page: Get ID from Slave Server
Page: Get repository names
Page: Get rows from result
Page: Get System Info
Page: Get Variable
Page: Greenplum Load
Page: Group By
Page: HBase Input
Page: HBase Output
Page: HL7 Input
Page: HTTP Client
Page: Infobright Loader
Page: Ingres VectorWise Bulk Loader
Page: Injector
Page: Insert - Update
Page: JavaScript Values
Page: Job Executor
Page: Join Rows (Cartesian product)
Page: LDAP Input
Page: LDIF Input
Page: Mail_transformation
Page: Mail Validator
Page: Mapping
Page: Mapping Input
Page: Mapping Output
Page: Merge Join
Page: Merge rows
Page: Metadata Structure
Page: Metadata Structure of Stream
Page: Microsoft Excel Writer
Page: Modified Java Script Value
Page: Mondrian Input
Page: MongoDB Input
Page: MongoDb output
Page: Null If
Page: OLAP Input
Page: OpenERP Object Delete
Page: OpenERP Object Input
Page: OpenERP Object Output
Page: Oracle Bulk Loader
Page: Palo Cell Input
Page: Palo Cell Output
Page: Palo Dimension Input
Page: Palo Dimension Output
Page: Pentaho Reporting Output
Page: PostgreSQL Bulk Loader
Page: Property Input
Page: Property Output
Page: Regex Evaluation
Page: Row De-normalizer
Page: Row Normalizer
Page: RSS Input
Page: Rule Accumulator
Page: Rule Executor
Page: Run SSH commands
Page: SalesForce Input
Page: Sample rows
Page: SAP Input
Page: SAS Input
Page: Select Values
Page: Serialize to file
Page: Set files in result
Page: Set Variable
Page: Single Threader
Page: Socket reader
Page: Socket writer
Page: Sorted Merge
Page: Sort rows
Page: Split field to rows
Page: SQL File Output
Page: Streaming XML Input
Page: Stream Lookup
Page: Switch-Case
Page: Table Exists
Page: Table Input
Page: Table Output
Page: Teradata Fastload Bulk Loader
Page: Text File Input
Page: Text File Output
Page: Unique Rows
Page: Update
Page: Value Mapper
Page: Web services lookup
Page: XBase Input
Page: XML Add
Page: XML Input
Page: XML Input Stream (StAX)
Page: XML Join
Page: XML Output
Page: XSD Validator
Page: XSL Transformation