Description
This step allows you to calculate values over a defined group of fields. Examples of common use cases are:
- calculate the average sales per product
- get the number of yellow shirts that we have in stock
Options
The following table provides a description of the options available for the Group By step:
| Option | Description |
|---|---|
| Step name | Name of the step; this name has to be unique in a single transformation |
| Include all rows? | Enable if you want all rows in the output, not just the aggregation; to differentiate between the two types of rows in the output, a flag is required in the output. You must specify the name of the flag field in that case (the type is boolean). |
| Temporary files directory | The directory in which the temporary files are stored if needed; the default is the standard temporary directory for the system |
| TMP-file prefix | Specify the file prefix used when naming temporary files |
| Add line number, restart in each group | Enable to add a line number that restarts at 1 in each group |
| Line number field name | Enable to add a line number that restarts at 1 in each group |
| Always give back a row |
If you enable this option, the Group By step will always give back a result row, even if there is no input row. This can be useful if you want to count the number of rows. Without this option you would never get a count of zero (0). |
| Group fields table | Specify the fields over which you want to group. Click Get Fields to add all fields from the input stream(s). |
| Aggregates table | Specify the fields that must be aggregated, the method and the name of the resulting new field. Here are the available aggregation methods :
|
Examples
These are the examples that are available in our distribution:
samples/transformations/Group By - Calculate standard deviation.ktr samples/transformations/Group by - include all rows and calculations .ktr samples/transformations/Group By - include all rows without a grouping.ktr