The company is now named Actian and the product is now named Vector, so this step should now be called the Actian Vector Bulk Loader. The interface has not changed, so this step still works with all versions. There is also a clustered version, Vector in Hadoop, with the same interface (product name Vortex). This loader was tested with Actian Vector H Linux Version VH 4.0.1 and Kettle - Spoon GA Release - 220.127.116.11-213 on 30 March 2015. What is said below about earlier versions still applies.
This page describes the gotchas of the VectorWise bulk loader step in PDI 4.2.0.
There are two different ways to load data: with a COPY TABLE SQL command or with the vwload tool. By default the SQL command is used. To use the vwload tool, which is faster than the COPY TABLE alternative, remove any given 'Path to sql command' and check the box 'Use vwload as command, instead of sql'.
The VW bulk loader only runs on POSIX operating systems that support named pipes. This includes every modern operating system except Windows.
Bulk loading needs to happen while your transformation runs as the "ingres" user, or at least as a user that is allowed both to connect and to bulk load the data. The bulk load command also needs permission to read from the FIFO file.
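Since the step hands rows to the loader through a named pipe, a quick way to check that your platform and user can actually create and use FIFOs is a small test, for example in Python (the file name below is arbitrary):

```python
import os
import stat
import tempfile

# Create a FIFO (named pipe), the mechanism the bulk loader step relies on.
# os.mkfifo raises OSError on platforms without FIFO support (e.g. Windows).
fifo_path = os.path.join(tempfile.mkdtemp(), "load_customers.fifo")
os.mkfifo(fifo_path)

# stat reports it as a FIFO, not a regular file.
assert stat.S_ISFIFO(os.stat(fifo_path).st_mode)
print("FIFO created at", fifo_path)
```

If this fails with a permission error rather than an unsupported-operation error, the problem is the user account, not the operating system.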
The text format that is supported by the bulk loader is very specific. As such, make sure to convert the data to the correct format before you pass it to the VectorWise bulk loader step.
If you are reading data with a "CSV Input" step and you are using "lazy conversion", make sure to convert the problematic columns to the normal storage type before you change the representation format.
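In Kettle itself you would do this conversion with a step such as "Select values". As an illustration of the idea, here is a sketch in Python that rewrites a date field into one fixed text representation before the rows reach the loader; the field name and the target format "YYYY-MM-DD" are only examples, check what your table definition actually expects:

```python
from datetime import datetime

def normalize_row(row):
    """Force the (hypothetical) 'birth_date' field into one fixed text format."""
    # Parse whatever representation the source used ...
    parsed = datetime.strptime(row["birth_date"], "%d/%m/%Y")
    # ... and emit the single representation the loader is configured for.
    row["birth_date"] = parsed.strftime("%Y-%m-%d")
    return row

row = {"id": "1", "birth_date": "31/03/2015"}
print(normalize_row(row))  # {'id': '1', 'birth_date': '2015-03-31'}
```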
...into a single table is sadly not supported by Ingres VectorWise at this time (v1.5.0 was tested). However, it is possible to bulk load into multiple tables in parallel.
So if you're doing parallel loading by specifying multiple step copies, don't load straight into the customers table; with 2 step copies you could load into 2 temporary tables, customers_0 and customers_1.
The table name then gets the step copy number appended, so the first copy loads customers_0, the second customers_1, and so on. The FIFO named pipe filename is suffixed with the copy number in the same way, so that each step copy writes to its own pipe.
After loading you can copy the data into the customers table like this:
insert into customers select * from customers_0;
delete from customers_0;
insert into customers select * from customers_1;
delete from customers_1;
...
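With more step copies this merge script grows mechanically, so it can be generated rather than written by hand. A small sketch, using the same table-naming convention as above:

```python
def merge_statements(base_table, copies):
    """Generate the SQL that folds the per-copy tables back into the target table."""
    stmts = []
    for i in range(copies):
        stmts.append(f"insert into {base_table} select * from {base_table}_{i};")
        stmts.append(f"delete from {base_table}_{i};")
    return stmts

print("\n".join(merge_statements("customers", 2)))
# insert into customers select * from customers_0;
# delete from customers_0;
# insert into customers select * from customers_1;
# delete from customers_1;
```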
On systems with many CPU cores you should see a performance benefit from loading in parallel.