Added by Matt Casters, last edited by Jens Bleuel on Jun 24, 2009

General Description

The Text file output step is used to export data to text file format. It is commonly used to generate comma-separated values (CSV) files that can be read by spreadsheet applications. It is also possible to generate fixed-width files by setting lengths on the fields in the Fields tab.

File tab

The File tab is where you define basic properties about the file being created, such as:

Option Description
Step name Name of the step. Note: This name has to be unique in a single transformation.
Filename This field specifies the filename and location of the output text file.
Run this as a command instead? Check this to "pipe" the results into the command or script you specify.
Extension Adds a period and the extension to the end of the filename (for example, .txt).
Accept file name from field? Checking this option allows you to specify the filename(s) in a field in the input stream.
File name field When the previous option is enabled, you can specify the field that will contain the filename(s) at runtime.
Include stepnr in filename If you run the step in multiple copies (see Launching several copies of a step), the copy number is included in the filename, before the extension (for example, _0).
Include partition nr in filename? Includes the data partition number in the filename.
Include date in filename Includes the system date in the filename (for example, _20041231).
Include time in filename Includes the system time in the filename (for example, _235959).
Show filename(s) This option shows a list of the files that will be generated.
Note: This is a simulation; the actual list depends on, among other things, the number of rows that will go into each file.
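
As a rough illustration of how the filename options above combine, the following Python sketch assembles a name from a base filename, the optional suffixes, and the extension. It is not Kettle code: the function build_filename and its parameters are hypothetical, and the order in which the suffixes are appended is an assumption. Use Show filename(s) to see the names the step will actually generate.

```python
from datetime import datetime


def build_filename(base, extension="", copy_nr=None, partition_nr=None,
                   add_date=False, add_time=False, when=None):
    """Assemble an output filename from File tab style options (illustrative only)."""
    when = when or datetime.now()
    name = base
    # Assumed suffix order: date, time, copy number, partition number.
    if add_date:
        name += "_" + when.strftime("%Y%m%d")   # e.g. _20041231
    if add_time:
        name += "_" + when.strftime("%H%M%S")   # e.g. _235959
    if copy_nr is not None:
        name += "_" + str(copy_nr)              # e.g. _0
    if partition_nr is not None:
        name += "_" + str(partition_nr)
    if extension:
        name += "." + extension                 # period plus extension
    return name


example = build_filename("/tmp/out", "txt", copy_nr=0, add_date=True,
                         add_time=True, when=datetime(2004, 12, 31, 23, 59, 59))
print(example)
```

With the values shown, every suffix is placed before the extension, which matches the description of the copy number option above.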

Content tab

The Content tab contains the following options for describing the content being written:

Option Description
Append Check this to append lines to the end of the specified file.
Separator Specify the character that separates the fields in a single line of text. Typically this is ; or a tab.
Enclosure A pair of strings can enclose some fields. This allows separator characters in fields. The enclosure string is optional.
Force the enclosure around fields? This option forces all fields to be enclosed with the character specified in the Enclosure property above.
Header Enable this option if you want the text file to have a header row. (First line in the file).
Footer Enable this option if you want the text file to have a footer row. (Last line in the file).
Format This can be either DOS or UNIX. UNIX files have lines separated by line feeds. DOS files have lines separated by carriage returns and line feeds.
Encoding Specify the text file encoding to use. Leave blank to use the default encoding on your system. To use Unicode, specify UTF-8 or UTF-16. On first use, Spoon will search your system for available encodings.
Compression Allows you to specify the type of compression (.zip or .gzip) to use when compressing the output. Note: At the moment, only one file is placed in a single archive.
Right pad fields Add spaces to the end of the fields (or remove characters at the end) until they have the specified length.
Fast data dump (no formatting) Improves the performance when dumping large amounts of data to a text file by not including any formatting information.
Split every ... rows If this number N is larger than zero, split the resulting text file into multiple parts of N rows each.
Add Ending line of file Allows you to specify an alternate ending row to the output file.
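
To make the interplay of Separator, Enclosure, Force the enclosure around fields? and Format more concrete, here is a minimal Python sketch that turns one row into one line of text. It is not the step's implementation: format_line and its parameters are hypothetical names, and the rule used here (enclose a field only when it contains the separator, unless enclosure is forced) is an assumption about when the enclosure is applied. Enclosure characters inside a field are not escaped in this sketch.

```python
def format_line(values, separator=";", enclosure='"', force_enclosure=False,
                file_format="DOS"):
    """Join one row into a line of text (illustrative only)."""
    # DOS lines end in carriage return + line feed, UNIX lines in a line feed.
    newline = "\r\n" if file_format == "DOS" else "\n"
    fields = []
    for value in values:
        text = "" if value is None else str(value)
        # Assumed rule: enclose when forced, or when the separator occurs in the field.
        needs_enclosure = force_enclosure or (separator in text)
        if enclosure and needs_enclosure:
            text = enclosure + text + enclosure
        fields.append(text)
    return separator.join(fields) + newline


header = format_line(["id", "name"], file_format="UNIX")
row = format_line([1, "Smith; John"], file_format="UNIX")
print(repr(header), repr(row))
```

A header row can be produced the same way by passing the field names instead of the values, as in the usage lines above.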

Fields tab

The Fields tab is where you define properties for the fields being exported. The table below describes each of the options for configuring the field properties:

Option Description
Name The name of the field.
Type The type of the field: String, Date or Number.
Format The format mask to convert with. See Number Formats for a complete description of format symbols.
Length The length option depends on the field type as follows:
  • Number - total number of significant figures in a number
  • String - total length of the string
  • Date - length of the printed output of the string (e.g. 4 only gives back the year)
Precision The precision option depends on the field type as follows:
  • Number - number of digits after the decimal point
  • String - unused
  • Date - unused
Currency The symbol used to represent currencies, like $10,000.00 or €5.000,00
Decimal The decimal point can be a "." (10,000.00) or "," (5.000,00)
Group The grouping symbol can be a "," (10,000.00) or "." (5.000,00)
Trim type The trimming method to apply on the string. Trimming only works when there is no field length given. (see feature request PDI-2486)
Null If the value of the field is null, insert this string into the text file
Get Click to retrieve the list of fields from the input stream(s)
Minimal width Alter the options in the Fields tab in such a way that the resulting width of lines in the text file is minimal. So instead of writing 0000001, we write 1, etc. String fields will no longer be padded to their specified length.
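
The effect of Length, Null, Right pad fields and Minimal width on a single value can be sketched in a few lines of Python. This is only an approximation under stated assumptions: render_field and its parameters are hypothetical, format masks are ignored, and it is assumed that the null string is padded like any other value.

```python
def render_field(value, length=None, null_string="", right_pad=False,
                 minimal_width=False):
    """Render one field as text (illustrative only, format masks ignored)."""
    # Null: substitute the configured string for a missing value.
    text = null_string if value is None else str(value)
    # Minimal width, or no length / no padding requested: write the value as-is.
    if minimal_width or length is None or not right_pad:
        return text
    # Right pad fields: cut off extra characters, then add spaces up to the length.
    return text[:length].ljust(length)


# Three fixed-width columns of 5, 3 and 4 characters.
line = "".join([
    render_field("abc", length=5, right_pad=True),
    render_field("abcdefg", length=3, right_pad=True),
    render_field(None, length=4, null_string="-", right_pad=True),
])
print(repr(line))
```

Concatenating fields rendered this way, without a separator, gives the kind of fixed-width line mentioned in the general description.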

Is it possible to split rows based on a field? I understand Kettle provides the ability to split rows every N rows; however, I mostly need the ability to split rows by a field. The reason is that most of the time I am working with rows that are related to each other, and I can't split these related rows across multiple files.

Since there is so much data, I can't use the switch transformation, as it requires creating a target for each group, and there can be around 200 to 2000 groups.

This feature would be great. I am willing to develop it if someone can point me in the right direction.

Comment: Posted by Haider Naqvi at Jul 26, 2008 13:19