Added by Matt Casters, last edited by Daniel Einspanjer on Nov 13, 2008  (view change) show comment

Labels:

Enter labels to add to this page:
Wait Image 
Looking for a label? Just start typing.

Description

The Row Normalizer step normalizes data back from pivoted tables. For example, below is a sample table of product sales data:

Month Product A Product B Product C
2003/01 10 5 17
2003/02 12 7 19
... ... ... ...

The Row Normalizer step converts the data into the format below
so that it is easier to update your fact table:

Month Product sales
2003/01 A 10
2003/01 B 5
2003/01 C 17
2003/02 A 12
2003/02 B 7
2003/02 C 19
... ... ...

Options

The following options are available for the Row Normalizer Step:

Option Description
Step name Name of the step; this name has to be unique in a single transformation.
Typefield The name of the type field (product in the example above)
Fields table A list of the fields you want to normalize; you must set the following properties for each selected field:
  • Fieldname: Name of the fields to normalize (Product A ? C in the example).
  • Type: Give a string to classify the field (A, B or C in our example).
  • New field: You can give one or more fields where the new value should transferred to (sales in our example).
Get Fields Click to retrieve a list of all fields coming in on the stream(s).

Normalizing multiple rows in a single step

The example below illustrates using the Row Normalizer step to normalize more than one row at a time starting with the following data format:

DATE PR1_NR PR_SL PR2_NR PR2_SL PR3_NR PR3_SL
20030101 5 100 10 250 4 150
... ... ... ... ... ... ...

You can convert the data to a table similar to the one shown below:

DATE Type Product Sales Product Number
20030101 Product1 100 5
20030101 Product2 250 10
20030101 Product3 150 4
... ... ... ...

Below is the setup you use to create the table:

Some things to keep in mind when using this transformation

  • The input stream may have more than the variables used in transformation - in the example shown it has DATE column that is not part of transformation.  Any columns in input stream not part of normalizer will be passed through to the next step as a part of the row.  See above how date wasn't part of row normalizer transformation, yet it is first field in row.  Imagine all other columns not part of transformation, but part of input showing up as fields in the row.
  • Do not try to force extra columns through as separate 'types' in the row normalizer transformation - it's not needed and will cause the step to crash
Comment: Posted by Joe Kambeitz at Apr 06, 2010 18:44