Added by Matt Casters, last edited by Marina Baker on Mar 02, 2010  (view change)

Labels:

Enter labels to add to this page:
Wait Image 
Looking for a label? Just start typing.

Description

This step allows you to calculate values over a defined group of fields. Examples of common use cases are:

  • calculate the average sales per product
  • get the number of yellow shirts that we have in stock

Options

The following table provides a description of the options available for the Group By step:

Option Description
Step name Name of the step; this name has to be unique in a single transformation
Include all rows? Enable if you want all rows in the output, not just the aggregation; to differentiate between the two types of rows in the output, a flag is required in the output. You must specify the name of the flag field in that case (the type is boolean).
Temporary files directory The directory in which the temporary files are stored if needed; the default is the standard temporary directory for the system
TMP-file prefix Specify the file prefix used when naming temporary files
Add line number, restart in each group Enable to add a line number that restarts at 1 in each group
Line number field name Enable to add a line number that restarts at 1 in each group
Always give back a row
If you enable this option, the Group By step will always give back a result row, even if there is no input row. 
This can be useful if you want to count the number of rows.  Without this option you would never get a count of zero (0).
Group fields table Specify the fields over which you want to group. Click Get Fields to add all fields from the input stream(s).
Aggregates table Specify the fields that must be aggregated, the method and the name of the resulting new field.
Here are the available aggregation methods :
  • Sum
  • Average (Mean)
  • Minimum
  • Maximum
  • Number of values (N)
  • Concatenate strings separated by , (comma)
  • First non-null value
  • Last non-null value
  • First value (including null)
  • Last value (including null)
  • Cumulative sum (all rows option only!)
  • Cumulative average (all rows option only!)
  • Standard deviation
  • Concatenate strings separated by <Value>: specify the separator in the Value column

Examples

These are the examples that are available in our distribution:

samples/transformations/Group By - Calculate standard deviation.ktr
samples/transformations/Group by - include all rows and calculations .ktr
samples/transformations/Group By - include all rows without a grouping.ktr