Added by Matt Casters, last edited by Jens Bleuel on Nov 20, 2012

Description

This step reads files in sas7bdat format created by SAS software (SAS Institute, Inc.).

There are 2 known data types read from these files: Number and String.

Since: PDI version 4.4.0-M1, October 12th, 2012.

This step relies heavily on the Sassy Reader project library by eobjects, written by Kasper Sørensen.

4.4.0 release note: Unfortunately, we found an issue (PDI-8875) with this step too late to include a fix in 4.4.0. The step throws a "java.lang.NoClassDefFoundError" because the ../libext/eobjects path is missing from launcher/launcher.properties. The workaround is to change the libraries line in launcher.properties to:
libraries=../test:../test/libext:../lib:../libext:../libext/commons:../libext/elasticsearch:../libext/feeds:../libext/google:../libext/hl7:../libext/JDBC:../libext/jersey:../libext/jfree:../libext/mondrian:../libext/pentaho:../libext/poi:../libext/reporting:../libext/rules:../libext/salesforce:../libext/spring:../libext/web:../libext/webservices:../libswt:../libext/eobjects
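If you prefer to patch the file with a script instead of editing it by hand, the workaround could be sketched roughly as follows. This is only an illustration, not part of PDI: the function name `add_library_path` is hypothetical, and the sketch assumes the file contains a single `libraries=` line with colon-separated entries, as shown above.

```python
def add_library_path(properties_text, extra="../libext/eobjects"):
    """Return the properties text with `extra` appended to the libraries= line.

    Hypothetical helper: assumes colon-separated entries on a single
    'libraries=' line. Lines without that prefix are left untouched.
    """
    prefix = "libraries="
    out = []
    for line in properties_text.splitlines():
        if line.startswith(prefix):
            paths = line[len(prefix):].split(":")
            # Only append when the entry is not already present.
            if extra not in paths:
                paths.append(extra)
            line = prefix + ":".join(paths)
        out.append(line)
    result = "\n".join(out)
    # Keep a trailing newline if the input had one.
    if properties_text.endswith("\n"):
        result += "\n"
    return result


if __name__ == "__main__":
    sample = "libraries=../lib:../libext\n"
    print(add_library_path(sample), end="")
```

Running the function twice yields the same text, so it is safe to apply to a file that has already been fixed. Back up launcher.properties before overwriting it.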

Options


There are 2 main options: 

  • Field in the input to use as filename: Select the input field that will contain the filename at runtime.  For example, you can use a "Get file names" step to populate this field.  IMPORTANT: only local files are supported at this time. Do not use VFS-style file specifications.
  • The selected fields from the files: Use the "Get Fields" button to populate this data grid.  Although the sas7bdat file format only contains Number and String data, you can specify any desired data type and PDI will convert the values for you.  Not all fields need to be specified, and you can re-order fields and give them new names. Fields are selected based on the name column.
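The behaviour of the field grid (select by name, re-order, rename, convert) can be pictured with a small sketch. This is not how PDI implements the step; `select_fields` and the sample column names are hypothetical and only illustrate the idea that the source name drives the lookup while the grid decides output order, output name and type.

```python
def select_fields(row, selection):
    """Apply a field selection to one row read from a file.

    row:       dict mapping the column name in the file to its value
               (a float for Number columns, a str for String columns).
    selection: list of (source_name, output_name, converter) tuples,
               in the desired output order.
    Hypothetical illustration only, not PDI code.
    """
    result = {}
    for source_name, output_name, convert in selection:
        value = row[source_name]  # lookup is by the name in the file
        result[output_name] = None if value is None else convert(value)
    return result


if __name__ == "__main__":
    row = {"ID": 7.0, "NAME": "Ann", "SCORE": 3.5}
    # SCORE is simply not selected; NAME comes first and is renamed.
    selection = [("NAME", "customer", str), ("ID", "customer_id", int)]
    print(select_fields(row, selection))
```

In this sketch the unselected SCORE column is dropped, and the Number column ID is converted to an integer on output.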

Limitations

  • All the files read by a single copy of a step need to have the same format: they need to contain the same number of columns and the columns need to have identical names and data types.
  • For numeric data that contains the value NaN (Not a Number), we convert the value to null (empty).
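Both limitations can be illustrated with a short sketch. The helpers `same_layout` and `nan_to_none` are hypothetical names used only for illustration; they mirror the rules described above and are not taken from the step's source code.

```python
import math


def same_layout(layouts):
    """Return True if every file layout has the same column names and types.

    layouts: one list of (column_name, type_name) tuples per file.
    Column order is ignored here, since the text above only requires
    identical names, types and column count (an assumption of this sketch).
    """
    return all(dict(layout) == dict(layouts[0]) for layout in layouts[1:])


def nan_to_none(value):
    """Map a numeric NaN to None (null); return any other value unchanged."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


if __name__ == "__main__":
    a = [("ID", "Number"), ("NAME", "String")]
    b = [("ID", "Number"), ("NAME", "String")]
    c = [("ID", "String"), ("NAME", "String")]
    print(same_layout([a, b]), same_layout([a, c]))
    print(nan_to_none(float("nan")), nan_to_none(1.5))
```

A check along the lines of `same_layout` could be run over a batch of files before handing them to a single step copy, so that mismatched files are caught early.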