Added by Matt Casters, last edited by Jens Bleuel on Jun 24, 2011

This step is deprecated. Please use the Get Data From XML or XML Input Stream (StAX) steps.

Description

The purpose of this step is to provide value parsing. The step is based on a SAX parser to provide better performance with larger files. It is very similar to the XML Input step; the only differences are in the Content and Fields tabs. The following sections describe in detail the properties and settings available for the Streaming XML Input step.

File Tab

Option Description
Step name Name of the step. This name has to be unique in a single transformation.
File or directory This field specifies the location and/or name of the input XML file.

Note: Press the "Add" button to add the file/directory/wildcard combination to the list of selected files (grid) below.

Regular expression Specify the regular expression you want to use to select the files in the directory specified in the previous option.
Selected Files This table contains a list of selected files (or wildcard selections) along with a property specifying whether each file is required. If a file is required and it isn't found, an error is generated. Otherwise, the filename is simply skipped.
Show filename(s)... Displays a list of all files that will be loaded based on the currently selected file definitions.
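
The file selection described above can be sketched in a few lines of Python. This is only an illustration of the behaviour in the table, not the step's own code: the helper name `select_files` and the `(directory, regex, required)` tuple layout are hypothetical, and the sketch assumes that the regular expression has to match the whole filename.

```python
import os
import re


def select_files(entries):
    """Expand (directory, regex, required) entries into a sorted list of paths.

    Hypothetical helper that mimics the "Selected Files" grid.
    """
    selected = []
    for directory, pattern, required in entries:
        matches = []
        if os.path.isdir(directory):
            regex = re.compile(pattern)
            # Assumption: the pattern must match the complete filename.
            matches = sorted(
                os.path.join(directory, name)
                for name in os.listdir(directory)
                if regex.fullmatch(name)
            )
        if not matches and required:
            # A required entry that yields no file is an error.
            raise FileNotFoundError(
                f"required input not found: {directory} ({pattern})"
            )
        # A non-required entry without matches is simply skipped.
        selected.extend(matches)
    return selected
```

Calling `select_files([("/data/in", r".*\.xml", True)])` would return every `.xml` file in the (hypothetical) directory `/data/in`, or raise an error if none is found.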

Content

Option Description
Include filename in output & fieldname Check this option if you want to have the name of the XML file to which the row belongs in the output stream. You can specify the name of the field where the filename will end up in.
Rownum in output & fieldname Check this option if you want to have a row number (starts at 1) in the output stream. You can specify the name where the integer will end up in.
Limit You can specify the maximum number of rows to read here.
Location Specify the path by way of elements to the repeating part of the XML file. The element column is used to specify the element and position as follows:
  • A: still specifies an attribute.
  • Ep: specifies an element defined by position (equivalent to E in the original XML Input step).
  • Ea: specifies an element defined by an attribute, which allows value parsing.
    Example:
    Ep=element/1          the first element called "element"
    Ea=element/att:val    the element called "element" that has an attribute called "att" with the value "val"
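
To make the Location syntax concrete, here is a minimal Python sketch that splits one entry into its parts. The grammar is inferred only from the examples on this page, so the step itself may accept more forms; the names `LocationEntry` and `parse_location_entry` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LocationEntry:
    kind: str                          # "A", "Ep" or "Ea"
    name: str                          # element or attribute name
    position: Optional[int] = None     # used by "A" and "Ep"
    attribute: Optional[str] = None    # used by "Ea"
    value: Optional[str] = None        # used by "Ea"


def parse_location_entry(text):
    """Parse one entry such as 'Ep=element/1' or 'Ea=element/att:val'."""
    kind, _, rest = text.strip().partition("=")
    if kind not in ("A", "Ep", "Ea") or not rest:
        raise ValueError(f"unsupported location entry: {text!r}")
    name, _, selector = rest.partition("/")
    if kind == "Ea":
        # Element identified by an attribute value: name/attribute:value
        attribute, sep, value = selector.partition(":")
        if not sep:
            raise ValueError(f"expected attribute:value in {text!r}")
        return LocationEntry(kind, name, attribute=attribute, value=value)
    # Element or attribute identified by position: name/position
    position = int(selector) if selector else None
    return LocationEntry(kind, name, position=position)
```

For instance, `parse_location_entry("Ea=element/att:val")` yields an entry with `name="element"`, `attribute="att"` and `value="val"`.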
    

Fields

Option Description
Name Name of the field
Type Type of the field can be either String, Date or Number
Format See Number Formats for a complete description of format symbols.
Length For Number: Total number of significant figures in a number;
For String: total length of string;
For Date: length of printed output of the string (e.g. 4 only gives back the year).
Precision For Number: Number of floating point digits;
For String, Date, Boolean: unused;
Currency Used to interpret numbers like $10,000.00 or €5.000,00
Decimal A decimal point can be a "." (10,000.00) or "," (5.000,00)
Group A grouping symbol can be a comma "," (10,000.00) or a dot "." (5.000,00)
Trim type Type of trimming to apply to this field (left, right, both) before processing
Null if Treat this value as NULL
Repeat Y/N: If the corresponding value in this row is empty: repeat the one from the last time it was not empty
Position The position of the XML element or attribute. You use the following syntax to specify the position of an element:
The first element called "element": E=element/1
The first attribute called "attribute": A=attribute/1
The first attribute called "attribute" in the second "element" tag: E=element/2, A=attribute/1

Note: You can auto-generate all the possible positions in the XML file supplied by using the "Get Fields" button.
Note: Support was added for XML documents where all the information is stored in the Repeating (or Root) element. The special R= locator was added to allow you to grab this information. The "Get fields" button finds this information if it's present.
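
The interplay of the Currency, Decimal and Group settings in the Fields table can be illustrated with a small Python sketch. It is not how the step parses numbers internally (that is driven by the Format mask); the function `parse_number` and its parameter names are hypothetical and only show how the three symbols change the interpretation of the same digits.

```python
from decimal import Decimal


def parse_number(text, decimal_symbol=".", grouping_symbol=",", currency_symbol=""):
    """Interpret a formatted number using the given symbols (illustrative only)."""
    cleaned = text.strip()
    if currency_symbol:
        cleaned = cleaned.replace(currency_symbol, "")
    if grouping_symbol:
        # Grouping symbols carry no value, so they are dropped first.
        cleaned = cleaned.replace(grouping_symbol, "")
    # Normalise the decimal symbol to "." before conversion.
    cleaned = cleaned.replace(decimal_symbol, ".")
    return Decimal(cleaned.strip())


us_amount = parse_number("$10,000.00", decimal_symbol=".", grouping_symbol=",", currency_symbol="$")
eu_amount = parse_number("€5.000,00", decimal_symbol=",", grouping_symbol=".", currency_symbol="€")
```

With the wrong pair of symbols the same text would be read very differently (for example "5.000,00" with "." as the decimal symbol), which is why Decimal and Group have to be set together.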

Streaming XML Example

Consider the following XML:
 
Suppose that we are interested in cars. We must specify the location of the repeating element like this:

Now let's look at the fields. We have different "property" elements that are differentiated by their "name" attribute, so we will end up with the fields "brand", "type" and "power" according to the "name" attribute.

For this, we must specify the association between "property" and "name" in the first grid.

Click Get Fields to retrieve the right fields including properties.

Let us now try leaving the new grid empty.

You can see that in this case the step works like the original XML Input step and retrieves fields by their position. Here it is better to use value parsing, because you get the right field names and missing elements will not corrupt the results (for example, a missing <property name="power"> </property> in some rows).
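
The idea behind value parsing can be sketched with Python's standard SAX parser. The sample document below is hypothetical: it is built only from the element and attribute names mentioned above ("car" as the assumed repeating element, "property" and "name"), and the values are invented. The handler and the `read_cars` function are illustrative and are not the step's implementation.

```python
import xml.sax

# Hypothetical sample: the second car has no "power" property.
SAMPLE = b"""<cars>
  <car>
    <property name="brand">Acme</property>
    <property name="type">Roadster</property>
    <property name="power">90</property>
  </car>
  <car>
    <property name="brand">Bolt</property>
    <property name="type">Hatch</property>
  </car>
</cars>"""


class CarHandler(xml.sax.ContentHandler):
    """Collects one dict per <car>, keyed by the "name" attribute."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._field = None
        self._text = []

    def startElement(self, name, attrs):
        if name == "car":
            self._row = {}
        elif name == "property" and self._row is not None:
            self._field = attrs.get("name")
            self._text = []

    def characters(self, content):
        # Text may arrive in several chunks, so it is accumulated.
        if self._field is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "property" and self._field is not None:
            self._row[self._field] = "".join(self._text).strip()
            self._field = None
        elif name == "car":
            self.rows.append(self._row)
            self._row = None


def read_cars(xml_bytes, fields=("brand", "type", "power")):
    """Return one tuple per car; a missing property becomes None."""
    handler = CarHandler()
    xml.sax.parseString(xml_bytes, handler)
    return [tuple(row.get(field) for field in fields) for row in handler.rows]
```

Because each value is looked up by its "name" attribute rather than by its position, the second row of `read_cars(SAMPLE)` simply carries `None` in the "power" column and the other columns stay aligned.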