Hitachi Vantara Pentaho Community Wiki
Child pages
  • Ingres VectorWise Bulk Loader

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.
Comment: Migration of unmigrated content due to installation of a new plugin

Ingres VectorWise Bulk Loader

The company is now named Actian. The product is now named Vector.  So this step should now be named the Actian Vector Bulk Loader. The interface has not changed. So this step is still working with all versions. There is also a clustered version Vector in Hadoop with the same interface (product name Vortex). This loader was tested with Actian Vector H Linux Version VH 4.0.1 and Kettle - Spoon GA Release - 5.3.0.0-213 on 30.March, 2015. What is said below about earlier versions still applies.

This page describes the gotcha's with respect to the VectorWise bulk loader step in PDI 4.2.0.

Load Alternatives

There are two different ways to load data, with a COPY TABLE SQL command, or with the tool vwload. Per default the sql command is used. To use the vwload tool, which is faster than the COPY TABLE alternative, remove any given 'Path to sql command', and check the box 'Use vwload as command, instead of sql'. 

Operating systems

The VW bulk loader only runs under POSIX operating systems that support named pipes.  This includes every modern operating system besides Windows.

Authorization

Bulk loading needs to happen while your transformation runs as the "ingres" user, or at least the user with which you connect and with which you have permissions to bulk load the data.  The bulk load command needs to have permissions to read from the FIFO file

Data format

The text format that is supported by the bulk loader is very specific.  As such, you need to make sure to set the data to the correct format before you pass it to the IVW bulk loader step. 

  • Don't pass integers or numerical data with leading zeroes.  Change the conversion mask from 0000000;-00000000 to something like 0;-0
  • Make sure to pass only dates or date-time data formatted in the following ways:
    • TIMESTAMP fields:  yyyy-MM-dd HH:mm:ss
    • DATE fields:  yyyy-MM-dd

If you are reading data with a "CSV Input" step and you're using "lazy conversion" make sure to convert the columns with a problem to normal storage type before you change the representation format.

Parallel bulk loading

...into a single table is sadly not support by Ingres VectorWise at this time (v1.5.0 was tested).  However, it is possible to bulk load in parallel into multiple tables.

So if you're doing parallel loading by specifying multiple step copies, make sure to load into a customers table, you could load into 2 temporary tables customer1_0, customer1_1.

The table name then becomes:

Code Block
customers_${Internal.Step.Unique.Number}

The FIFO named pipe filename becomes: 

Code Block
customers_${Internal.Step.Unique.Number}

After loading you can copy the data into the customers table like this:

Code Block
insert into customers select * from customers_0;
delete from customers_0;

insert into customers select * from customers_1;
delete from customers_1;
...

On systems with a lot of CPU cores you should see a performance benefit.

Error handling options

  • Checkbox "Use Continue On Error Option" affects only sql mode (number of max errors is ignored)
  • Field "MaxErrorsNumber" affects only vwload (checkbox ignored),
    specifying "0" allows the user to check all input rows and write errors to the given log file (error file field). (PDI-6512 fixed a situation with two errors for one error row.)