Added by Matt Casters, last edited by Jens Bleuel on Jan 21, 2010


Introduction

The PostgreSQL bulk loader is an experimental step that streams data from inside Kettle to the psql command using "COPY ... FROM STDIN" to load it into the database.
This way of loading data offers the best of both worlds: the performance of a bulk load and the flexibility of a Pentaho Data Integration transformation.
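
Under the hood, the step launches psql and streams the rows to its standard input, as the log excerpt in the comments below also shows. A minimal manual equivalent looks roughly like this (the host, database, table and file names are hypothetical):

cat /tmp/rows.csv | psql -h db_host -p 5432 -U kettle_user my_database \
  -c "COPY my_schema.my_table FROM STDIN WITH CSV DELIMITER AS ';'"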

Make sure to check out the "Set up authentication" section below!

Note: This step does not work with a JNDI defined connection, only JDBC is supported.

Options

Step name: Name of the step. Note: this name has to be unique in a single transformation.

Connection: Name of the database connection on which the target table resides. Note: the password of this database connection is not used; see the "Set up authentication" section below. Since PDI-1901 was fixed in 3.2.3, the username of the connection is used and passed to psql with the -U parameter; before that fix, the account of the logged-in user was used instead.

Target schema: The name of the schema of the table to write data to. This is important for data sources that allow table names with dots ('.') in them.

Target table: Name of the target table.

psql path: Full path to the psql utility. If psql is on the path of the executing application, you can simply leave this set to psql (see the example below).

Load action: Insert or Truncate. Insert appends the rows to the target table; Truncate first truncates the table and then inserts.
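
If you are not sure what to fill in for "psql path", you can look up the full path of the utility on the machine that runs the transformation, for example (the output shown is just an illustration; your path may differ):

which psql
/usr/bin/psql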
   
   

Set up authentication


"psql" doesn't allow you to specify the password.  Here is a part of the connection options:

 Connection options:
  -h HOSTNAME     database server host or socket directory (default: "/var/run/postgresql")
  -p PORT         database server port (default: "5432")
  -U NAME         database user name (default: "matt"; since PDI 3.2.3 the username
                  of the connection is used instead, see PDI-1901)
  -W              prompt for password (should happen automatically)

As you can see, there is no way to specify a password for the database: psql will always prompt for one on the console no matter what.

To overcome this you need to set up trusted authentication on the PostgreSQL server.

To make this happen, change the pg_hba.conf file (on my box this is /etc/postgresql/8.2/main/pg_hba.conf) and add a line like this:

host    all         all         192.168.1.0/24        trust

This basically means that everyone from the 192.168.1.0 network (mask 255.255.255.0) can log into postgres on all databases with any username.  If you are running Kettle on the same server, change it to localhost:

host    all         all         127.0.0.1/32        trust

This is much safer, of course. Make sure you don't invite any strangers onto your PostgreSQL database!
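
After reloading the server configuration (see the tip below), you can verify that trusted authentication is in place by running psql by hand from the Kettle machine; it should connect without asking for a password. A quick sketch, with hypothetical user and database names:

psql -h 127.0.0.1 -p 5432 -U kettle_user my_database -c "SELECT version();"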

TIP! Make sure to restart your database server after you make this change.

It shouldn't be necessary to fully restart the database after pg_hba.conf changes. Reload usually does it, too.

/etc/init.d/postgresql reload

Comment: Posted by Balázs Bárány at Aug 03, 2009 03:29

For authentication, you can also specify the password in the .pgpass configuration file (or pgpass.conf on Windows).
An entry may look like this:
my_server_name:5432:*:my_user_name:my_secret_password

No need to risk opening your whole database server with passwordless login as advised.

The file has to have 600 permissions on Unix.
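
A minimal sketch of setting this up on Unix (the server, user and password are placeholders):

echo 'my_server_name:5432:*:my_user_name:my_secret_password' >> ~/.pgpass
chmod 600 ~/.pgpass    # psql ignores the file if its permissions are wider than 600
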
Here is the documentation:

pgpass doc

Comment: Posted by Marc Cousin at Dec 15, 2009 06:41

What is the path of psql on a Unix system?

Thanks

Comment: Posted by Lens Messaoud Lama at Feb 24, 2014 10:52

Hi,

Could you provide me with any information related to using the PostgreSQL bulk loader? I created a simple transformation:

Text File Input -> PostgreSQL Bulk Loader. When I run the transformation, I get this error message:

2014/03/21 13:14:18 - Transformation 3 - Dispatching started for transformation [Transformation 3] 
2014/03/21 13:14:18 - Text file input.0 - Opening file: file:///tmp/file.txt 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Executing command: /usr/bin/psql -U XXX -h XXX -p 5432 XXX

2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Launching command: COPY lead_tool.profcopy ( profile_identifier, title, fan_count, is_in_analytics, url_key, profile_countries, profile_tags, profile_type ) FROM STDIN WITH CSV DELIMITER AS ';' QUOTE AS ''; 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - ERROR {0} /usr/lib/postgresql/9.1/bin/psql: relocation error: /usr/lib/postgresql/9.1/bin/psql: symbol SSL_CIPHER_get_name, version OPENSSL_1.0.0 not defined in file libssl.so.1.0.0 with link time reference 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - ERROR (version 5.0.1, build 1 from 2013-10-30_19-53-32 by buildguy) : Error in step 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - ERROR (version 5.0.1, build 1 from 2013-10-30_19-53-32 by buildguy) : org.pentaho.di.core.exception.KettleException: 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Error serializing rows of data to the psql command 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Broken pipe 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at org.pentaho.di.trans.steps.pgbulkloader.PGBulkLoader.writeRowToPostgres(PGBulkLoader.java:450) 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at org.pentaho.di.trans.steps.pgbulkloader.PGBulkLoader.processRow(PGBulkLoader.java:316) 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:60) 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.lang.Thread.run(Unknown Source) 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Caused by: java.io.IOException: Broken pipe 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.io.FileOutputStream.writeBytes(Native Method) 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.io.FileOutputStream.write(Unknown Source) 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.io.BufferedOutputStream.flushBuffer(Unknown Source) 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.io.BufferedOutputStream.write(Unknown Source) 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.io.FilterOutputStream.write(Unknown Source) 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at org.pentaho.di.trans.steps.pgbulkloader.PGBulkLoader.writeRowToPostgres(PGBulkLoader.java:361) 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - ... 3 more 
2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Finished processing (I=0, O=710, R=711, W=710, U=0, E=1) 
2014/03/21 13:14:18 - Text file input.0 - Finished processing (I=1396, O=0, R=0, W=1396, U=0, E=0) 
2014/03/21 13:14:18 - Transformation 3 - Transformation 3 
2014/03/21 13:14:18 - Transformation 3 - Transformation 3

Comment: Posted by Patrik Lenart at Mar 21, 2014 08:44

It looks like the pgpass option doesn't work. It works from the terminal but doesn't work from PDI.

Comment: Posted by Patrik Lenart at Mar 24, 2014 08:52

The solution for the OpenSSL problem with the psql client is to use the psql client located inside the Pentaho install directory.

Comment: Posted by Patrik Lenart at Mar 25, 2014 06:03