Hitachi Vantara Pentaho Community Wiki
Skip to end of metadata
Go to start of metadata

Description

The Merge Join step performs a classic merge join between data sets with data coming from two different input steps. Join options include INNER, LEFT OUTER, RIGHT OUTER, and FULL OUTER.

Note: In this step rows are expected in to be sorted on the specified key fields. When using the Sort step, this works fine. When you sorted the data outside of PDI, you may run into issues with the internal case sensitive/insensitive flag. Further information can be found on PDI-11440.

Note:  If the key fields have the same name (e.g. ID), a second key field called $key_1 (e.g. ID_1) will be created in the result.

Options

The following table describes the options available for the Merge Join step:

Option

Description

Step name

Name of the step; this name has to be unique in a single transformation

First Step

Specify the first input step to the merge join (left)

Second Step

Specify the second input step to the merge join (right)

Join Type

Select from the available types of joins:

  • FULL OUTER: all rows from both sources will be included in the result, with empty values for non-matching keys in both data streams
  • LEFT OUTER: all rows from the first source will be in the result, with empty values for non-matching keys in the second data stream
  • RIGHT OUTER: all rows from the second source will be in the result, with empty values for non-matching keys in the first data stream
  • INNER JOIN: only rows having the same key in both sources will be included in the result

Keys for 1st step

Specify the key fields on which the incoming data is sorted; click Get key fields to retrieve a list of fields from the specified step.

Keys for 2nd step

Specify the key fields on which the incoming data is sorted; click Get key fields to retrieve a list of fields from the specified step.

To find out how to partition this step see this blog from Dan Keeley

Metadata Injection Support (7.x and later)

All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.

  • No labels