Hitachi Vantara Pentaho Community Wiki

This step is deprecated. Please use the Get Data From XML or XML Input Stream (StAX) steps.

Description

This step reads information stored in XML files. The following sections describe the interface for defining the files you want to read, the repeating part of the XML data, and the fields to retrieve.

Note: You specify each field by the path to its element or attribute, together with conversion masks, data types and other metadata.

File Tab

The File tab is where you define the location of the XML files from which you want to read. The table below contains the available options:

Option

Description

Step name

Name of the step; the name has to be unique in a single transformation

File or directory

Specifies the location and/or name of the input XML file

Note: Click Add to add the file/directory/wildcard combination to the list of selected files (grid) below.

Regular expression

Specifies the regular expression you want to use to select the files in the directory specified in the previous option

Selected files

This table contains a list of selected files (or wildcard selections) and a property specifying whether each file is required. If a required file is not found, an error is generated; otherwise, the filename is skipped.

Show Filename(s)

This option shows a list of the files that will be read. Note: This is a simulation; the actual list can differ when the step runs.
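The file selection described above can be sketched in Python as follows. This is only an illustration of the logic, not the step's actual implementation: the `select_files` helper and its `(path, regex, required)` tuples are hypothetical names, and the sketch assumes that the regular expression must match the whole file name and that a required entry with no match is an error.

```python
import os
import re


def select_files(entries):
    """Resolve (path, regex, required) entries into a list of file paths.

    Hypothetical helper: when `regex` is empty, `path` names a single file;
    otherwise `path` is a directory whose file names are matched against `regex`.
    """
    selected = []
    for path, regex, required in entries:
        if regex:
            pattern = re.compile(regex)
            names = sorted(os.listdir(path)) if os.path.isdir(path) else []
            # Assumption: the expression has to match the complete file name.
            matches = [os.path.join(path, n) for n in names if pattern.fullmatch(n)]
        else:
            matches = [path] if os.path.isfile(path) else []
        if not matches and required:
            raise FileNotFoundError(f"required file not found: {path} {regex}")
        selected.extend(matches)  # a missing optional entry contributes nothing
    return selected
```

Calling `select_files` with a directory and the expression `.*\.xml` returns every XML file in that directory, which corresponds to what Show Filename(s) would list for that entry.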

Content

The content tab contains the following options for describing the content being read:

Option

Description

Include filename in output & fieldname

Enable if you want to have the name of the XML file to which the row belongs in the output stream. You can specify the name of the field where the file name will end up.

Rownum in output & fieldname

Enable if you want to have a row number (starting at 1) in the output stream. You can specify the name of the field where the integer will end up.

Limit

Specifies the maximum number of rows to read (optional)

Nr of header rows to skip

Specifies the number of rows to skip, from the start of an XML document, before starting to process.

Location

Specifies the path, as a sequence of element names, to the repeating part of the XML file. For example, if you are reading rows from this XML file:

<Rows>
    <Row>
        <Field1>...</Field1> ...
    </Row>
    ...
</Rows>

Then you set the location to Rows, Row.

Note: You can also set the root (Rows) as a repeating element location. The output will then contain 1 (one) row.
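To illustrate how a location such as Rows, Row maps to output rows, here is a minimal Python sketch using the standard `xml.etree.ElementTree` module. The `read_rows` function, the sample document and the field values are assumptions made for this example; the step itself does not work this way internally.

```python
import xml.etree.ElementTree as ET

# Sample document in the shape shown above; the values are made up.
XML = """<Rows>
    <Row><Field1>a</Field1><Field2>1</Field2></Row>
    <Row><Field1>b</Field1><Field2>2</Field2></Row>
</Rows>"""


def read_rows(xml_text, location):
    """Return one dict per element found at `location`.

    `location` lists element names from the root down, e.g. ["Rows", "Row"].
    """
    root = ET.fromstring(xml_text)
    if not location or root.tag != location[0]:
        return []
    if len(location) == 1:
        repeating = [root]  # the root itself is the repeating element
    else:
        repeating = root.findall("/".join(location[1:]))
    # Each direct child of a repeating element becomes a field of that row.
    return [{child.tag: child.text for child in element} for element in repeating]


rows = read_rows(XML, ["Rows", "Row"])
```

With the location `["Rows", "Row"]` the sketch produces one row per `Row` element. With `["Rows"]` it produces a single row, matching the note above about using the root as the repeating element.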

Fields

The Fields tab allows you to define properties for the location and format of the fields being read from the XML document. The table below describes each of the options for configuring the field properties:

Option

Description

Name

The name of the field

Type

Type of the field can be either String, Date or Number.

Format

The format mask to convert with. See Number Formats for a complete description of format specifiers.

Length

The length option depends on the field type as follows:

  • Number - Total number of significant figures in a number
  • String - Total length of the string
  • Date - Length of the printed output of the string (e.g. 4 only gives back the year)

Precision

The precision option depends on the field type as follows:

  • Number - Number of floating point digits
  • String - unused
  • Date - unused

Currency

Symbol used to represent currencies like $10,000.00 or €5.000,00

Decimal

A decimal point can be a "." (10,000.00) or "," (5.000,00)

Group

A grouping can be a "," (10,000.00) or "." (5.000,00)

Trim

The trimming method to apply to the string found in the XML

Repeat

Enable if you want to repeat empty values with the corresponding value from the previous row.

Position

The position of the XML element or attribute. You use the following syntax to specify the position of an element, for example:

  • The first element called "element": E=element/1
  • The first attribute called "attribute": A=attribute/1
  • The first attribute called "attribute" in the second "element" tag: E=element/2, A=attribute/1

Note: Click Get Fields to auto-generate all the possible positions in the XML file.

Note: Pentaho has added support for XML documents where all the information is stored in the Repeating (or Root) element. The special R= locator was added to allow you to grab this information. Click Get Fields to find this information if it is available.
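The position syntax can be read as a short path that is followed from the repeating element. The Python sketch below shows one possible interpretation; `parse_position` and `resolve` are hypothetical names, the handling of the attribute index is an assumption, and the R= locator is deliberately left out because its exact behavior is not described here.

```python
import xml.etree.ElementTree as ET


def parse_position(position):
    """Split "E=element/2, A=attribute/1" into [("E", "element", 2), ("A", "attribute", 1)]."""
    parts = []
    for token in position.split(","):
        kind, _, rest = token.strip().partition("=")
        name, _, index = rest.rpartition("/")
        parts.append((kind, name, int(index)))
    return parts


def resolve(row, position):
    """Follow a position string from the repeating element `row`; return text or None."""
    node = row
    for kind, name, index in parse_position(position):
        if kind == "E":
            matches = node.findall(name)
            if index < 1 or index > len(matches):
                return None
            node = matches[index - 1]  # positions are 1-based
        elif kind == "A":
            # Assumption: attribute names are unique per element, so the index is ignored.
            return node.get(name)
        else:
            raise ValueError(f"unsupported locator: {kind}")
    return node.text
```

For a row such as `<Row><element id="x">first</element><element id="y">second</element></Row>`, the position `E=element/2, A=id/1` selects the `id` attribute of the second `element` tag under this interpretation.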

FAQ

Can you change the XML input step to process my file?

Q: I have to process an XML file which currently can't be processed by KETTLE, e.g. there's one optional field which depends on the value of an element and that should also be included as a field in a row, ... Can you build this functionality into the XML Input step?

A: First of all it would depend what functionality you need. If the functionality is generally useful it can be built in. If it would only be useful for you it wouldn't make sense to build it in.

As alternative solutions, consider processing the XML file via a JavaScript step or, if what is required is very complex, writing your own PDI step which you maintain yourself (outside of the PDI distribution).
