The Sort rows step sorts rows based on the fields you specify and on whether they should be sorted in ascending or descending order.
- Kettle sorts rows using temporary files when the number of rows exceeds the specified sort size (default: 1 million rows). If you get an out-of-memory exception (OOME), lower this size or increase the available memory.
- When you use multiple copies of the step in parallel (on the local JVM with "Change number of copies to start" or in a clustered environment using Carte), the sorted blocks need to be merged to ensure the proper sort sequence. You can do this by adding the Sorted Merge step afterwards (on the local JVM without multiple copies to start, or in the cluster on the master).
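
The two notes above describe the same general technique: sort blocks that fit in memory, spill them to temporary files, then merge the sorted blocks. The following Python sketch illustrates that idea only; it is not Kettle's implementation. The function name `external_sort` and its `sort_size` parameter are hypothetical, and for simplicity the sketch always spills to disk, even when all rows would fit in memory.

```python
import heapq
import pickle
import tempfile
from itertools import islice


def external_sort(rows, sort_size, key=lambda row: row):
    """Yield rows in sorted order, holding at most sort_size rows in memory.

    Hypothetical helper for illustration; not part of Kettle.
    """
    rows = iter(rows)
    spill_files = []

    # Phase 1: sort one block of at most sort_size rows at a time
    # and write each sorted block to its own temporary file.
    while True:
        block = sorted(islice(rows, sort_size), key=key)
        if not block:
            break
        spill = tempfile.TemporaryFile()
        for row in block:
            pickle.dump(row, spill)
        spill.seek(0)
        spill_files.append(spill)

    def read_back(spill):
        # Stream rows back from one temporary file until it is exhausted.
        try:
            while True:
                yield pickle.load(spill)
        except EOFError:
            spill.close()

    # Phase 2: k-way merge of the sorted blocks. This is the same job a
    # merge step performs on the outputs of several parallel sorts.
    yield from heapq.merge(*(read_back(f) for f in spill_files), key=key)


print(list(external_sort([5, 3, 9, 1, 7, 2, 8], sort_size=3)))
# → [1, 2, 3, 5, 7, 8, 9]
```

A smaller `sort_size` lowers peak memory use at the cost of more temporary files and more I/O, which is the trade-off behind lowering the sort size after an OOME.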
The following table describes the options associated with the Sort rows step:
| Option | Description |
| --- | --- |
| Step name | Name of the step; this name must be unique within a single transformation. |
| Sort directory | The directory in which temporary files are stored when they are needed; the default is the system's standard temporary directory. |
| TMP-file prefix | Choose an easily recognized prefix so you can identify the files when they appear in the temporary directory. |
| Sort size (rows in memory) | The more rows you keep in memory, the faster the sort, because fewer temporary files are needed and less I/O is generated. |
| Free memory threshold (in %) | If the sort algorithm finds that less free memory is available than the indicated percentage, it starts paging data to disk. |
| Compress TMP Files | Compresses temporary files when they are needed to complete the sort. |
| Only pass unique rows? | Enable to pass only unique rows to the output stream(s). |
| Fields table | Specify the fields to sort by and the direction (ascending/descending) for each. Optionally, specify whether the sort is case sensitive. |
| Get Fields | Click to retrieve a list of all fields coming in on the stream(s). |
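
To make the interaction of sort fields, direction, case sensitivity, and unique rows concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the step's actual code: the function `sort_rows` and its `(name, ascending, case_sensitive)` tuples are hypothetical, and the sketch assumes that uniqueness is judged on the sort keys only, keeping the first row of each group.

```python
from itertools import groupby


def sort_rows(rows, fields, unique=False):
    """Sort dict rows by several fields.

    fields is a list of (name, ascending, case_sensitive) tuples,
    most significant field first. Hypothetical helper for illustration.
    """
    def field_key(name, case_sensitive):
        def key(row):
            value = row[name]
            # Case-insensitive comparison: fold strings to lower case.
            if isinstance(value, str) and not case_sensitive:
                return value.lower()
            return value
        return key

    result = list(rows)
    # Python's sort is stable, so sorting by the least significant field
    # first and the most significant field last gives a multi-field sort
    # with an independent direction per field.
    for name, ascending, case_sensitive in reversed(fields):
        result.sort(key=field_key(name, case_sensitive), reverse=not ascending)

    if unique:
        # Assumption: rows are duplicates when all sort keys match.
        def full_key(row):
            return tuple(field_key(n, cs)(row) for n, _, cs in fields)
        result = [next(group) for _, group in groupby(result, key=full_key)]
    return result


rows = [
    {"name": "bob", "age": 30},
    {"name": "Alice", "age": 25},
    {"name": "alice", "age": 40},
    {"name": "Carl", "age": 25},
]
# Name ascending (case-insensitive), then age descending.
ordered = sort_rows(rows, [("name", True, False), ("age", False, True)])
print([(r["name"], r["age"]) for r in ordered])
# → [('alice', 40), ('Alice', 25), ('bob', 30), ('Carl', 25)]
```

Because duplicates end up adjacent after sorting, the unique pass only needs to compare each row with its neighbor, which is why this option fits naturally into a sort step.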
Metadata Injection Support
All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.