Hitachi Vantara Pentaho Community Wiki


Description

This step lets you look forward and backward across rows relative to the current row. Common use cases include:

  • Calculate the "time between orders" by ordering rows by order date, and LAGing 1 row back to get previous order time.
  • Calculate the "duration" of a web page view by LEADing 1 row ahead and determining how many seconds the user was on this page.
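The two use cases above can be sketched outside of the step in plain Python. This is only an illustration of what LAG and LEAD mean, not the step's actual implementation; the helper names lag and lead and the sample order times are made up for the example.

```python
from datetime import datetime


def lag(values, n=1):
    """For each row, return the value n rows back, or None if there is none."""
    return [values[i - n] if i - n >= 0 else None for i in range(len(values))]


def lead(values, n=1):
    """For each row, return the value n rows ahead, or None if there is none."""
    return [values[i + n] if i + n < len(values) else None for i in range(len(values))]


# Hypothetical order times, already sorted by order date.
order_times = [
    datetime(2024, 1, 1, 9, 0),
    datetime(2024, 1, 3, 9, 0),
    datetime(2024, 1, 4, 15, 0),
]

# "Time between orders": LAG 1 row back, then subtract.
prev_times = lag(order_times, 1)
time_between = [
    (cur - prev) if prev is not None else None
    for cur, prev in zip(order_times, prev_times)
]
```

The first row has no previous order, so its time_between entry is None, which matches the null the step emits when the offset falls outside the available rows.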

Options

Option

Description

Step name

The name of this step as it appears in the transformation workspace. Note: This name must be unique within a single transformation.

Group fields table

Specify the fields you want to group by. Click Get Fields to add all fields from the input stream(s). The step does no sorting of its own, so the incoming rows must already be sorted by the group fields (for example CUSTOMER_ID) and, within each group, by the field that defines the row order (for example ORDER_DATE).

Analytic Functions table

Specify the analytic functions to compute.

New Field Name

The name of the new field to add to the stream (for example PREV_ORDER_DATE)

Subject

The existing field whose value is retrieved (for example ORDER_DATE)

Type

Set the type of analytic function:
Lead - Go forward N rows and get the value of Subject
Lag - Go backward N rows and get the value of Subject

N

The number of rows to offset (backwards or forwards)

Examples

The following sample transformations are included in the distribution:

Code Block
samples/transformations/Analytic Query - Lead One Example.ktr
samples/transformations/Analytic Query - Random Value Example.ktr

Group field examples

Specifying a group is not mandatory, but it is useful in some cases. If you define a group (made up of one or more fields), the lead and lag operations are performed only within each group. For example, suppose you have this:

Code Block
X   , Y
--------
aaa , 1
aaa , 2
aaa , 3
bbb , 4
bbb , 5
bbb , 6

Suppose you want to create a field named Z that holds the Y value from the previous row (Type Lag, N = 1).

If you only care about the Y field, you don't need to group, and you get the following result:

Code Block
X   , Y , Z
------------
aaa , 1 , <null>
aaa , 2 , 1
aaa , 3 , 2
bbb , 4 , 3
bbb , 5 , 4
bbb , 6 , 5

But if you don't want to mix the values for aaa and bbb, you can group by the X field, and you will have this:

Code Block
X   , Y , Z
------------
aaa , 1 , <null>
aaa , 2 , 1
aaa , 3 , 2
bbb , 4 , <null>
bbb , 5 , 4
bbb , 6 , 5

Thus, by grouping (provided the input is sorted according to your grouping), you can be assured that lead or lag operations will not return row values outside of the defined group.
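The grouped and ungrouped results above can be reproduced with a short Python sketch. It assumes that a new group starts whenever the group field values change between consecutive rows, which is consistent with the requirement that the input be sorted, but it is not taken from the step's source code. The function name lag_with_groups and its parameters are hypothetical.

```python
def lag_with_groups(rows, group_fields, subject, new_field, n=1):
    """Add new_field holding the subject value n rows back within the same group.

    Assumes rows are already sorted by group_fields; no sorting is done here.
    An empty group_fields tuple means no grouping at all.
    """
    result = []
    history = []          # subject values seen so far in the current group
    current_key = None    # group key of the previous row
    for row in rows:
        key = tuple(row[f] for f in group_fields)
        if key != current_key:
            # Group boundary: forget values from the previous group.
            current_key = key
            history = []
        value = history[-n] if len(history) >= n else None
        result.append({**row, new_field: value})
        history.append(row[subject])
    return result


rows = [
    {"X": "aaa", "Y": 1}, {"X": "aaa", "Y": 2}, {"X": "aaa", "Y": 3},
    {"X": "bbb", "Y": 4}, {"X": "bbb", "Y": 5}, {"X": "bbb", "Y": 6},
]

ungrouped = lag_with_groups(rows, (), "Y", "Z")
grouped = lag_with_groups(rows, ("X",), "Y", "Z")
```

Here ungrouped carries 3 into the first bbb row, while grouped yields None there. Under this sketch's assumption, unsorted input, where aaa and bbb rows are interleaved, would reset the history at every change of X, which is why the input must be sorted beforehand.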