Access Keys:
Skip to content (Access Key - 0)

Introduction

The PostgreSQL bulk loader is an experimental step in which we will to stream data from inside Kettle to the psql command using "COPY DATA FROM STDIN" into the database.
This way of loading data offers the best of both worlds : the performance of a bulk load and the flexibility of a Pentaho Data Integration transformation.

Make sure to check out the "Set up authentication" section below!

Note: This step does not work with a JNDI defined connection, only JDBC is supported.

Options

Option Description
Step name Name of the step.

Note: This name has to be unique in a single transformation.

Connection Name of the database connection on which the target table resides.

Note: The password of this database connection is not used, see below in the "Set up authentication" section! Since PDI-1901 is fixed in 3.2.3, the username of the connection is used and added to the -U parameter, otherwise the logged in user acount would be taken.
Target schema The name of the Schema for the table to write data to. This is important for data sources that allow for table names with dots '.' in it.
Target table Name of the target table.
psql path Full path to the psql utility. If psql is in the path of the executing application you can leave it to simply psql.
Load action Insert, Truncate. Insert inserts, truncate first truncates the table.
   
   

Set up authentication


"psql" doesn't allow you to specify the password.  Here is a part of the connection options:

 Connection options:
  -h HOSTNAME     database server host or socket directory (default: "/var/run/postgresql")
  -p PORT         database server port (default: "5432")
  -U NAME         database user name (default: "matt" - if you are not Matt:
                  Since PDI 3.2.3 the username of the connection is taken, see PDI-1901.)
  -W              prompt for password (should happen automatically)

As you can see there is no way to specify a password for the database.  It will always prompt for a password on the console no matter what.

To overcome this you need to set up trusted authentication on the PostgreSQL server.

To make this happen, change the pg_hba.conf file (on my box this is /etc/postgresql/8.2/main/pg_hba.conf) and add a line like this:

host    all         all         192.168.1.0/24        trust

This basically means that everyone from the 192.168.1.0 network (mask 255.255.255.0) can log into postgres on all databases with any username.  If you are running Kettle on the same server, change it to localhost:

host    all         all         127.0.0.1/32        trust

 This is much safer of-course.  Make sure you don't invite any strangers onto your PostgreSQL database!

 TIP! Make sure to restart your database server after you made this change


  1. Aug 03, 2009

    Balázs Bárány says:

    It shouldn't be necessary to fully restart the database after pg_hba.conf change...

    It shouldn't be necessary to fully restart the database after pg_hba.conf changes. Reload usually does it, too.

    /etc/init.d/postgresql reload

  2. Dec 15, 2009

    Marc Cousin says:

    For authentication, you can also specify it in .pgpass (or pgpass.conf for windo...

    For authentication, you can also specify it in .pgpass (or pgpass.conf for windows) configuration file.
    An entry may look like this :
    my_server_name:5432:*:my_user_name:my_secret_password.

    No need to risk opening your whole database server with passwordless login as advised.

    The file has to be 600 on Unix.
    Here is the doc :

    pgpass doc

  3. Feb 24

    Lens Messaoud Lama says:

    What is the path of psql if that is on Unix system ? Thanks

    What is the path of psql if that is on Unix system ?

    Thanks

  4. Mar 21

    Patrik Lenart says:

    Hi, Could you provide me with any information related to use of PostgreSQL bulk...

    Hi,

    Could you provide me with any information related to use of PostgreSQL bulk loader? I created the simple transformation:

    Text File Input -> PostgreSQL Bulk Loader. When I run transformation I got error message:

    2014/03/21 13:14:18 - Transformation 3 - Dispatching started for transformation [Transformation 3] 
    2014/03/21 13:14:18 - Text file input.0 - Opening file: file:///tmp/file.txt 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Executing command: /usr/bin/psql -U XXX -h XXX -p 5432 XXX

    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Launching command: COPY lead_tool.profcopy ( profile_identifier, title, fan_count, is_in_analytics, url_key, profile_countries, profile_tags, profile_type ) FROM STDIN WITH CSV DELIMITER AS ';' QUOTE AS ''; 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - ERROR {0} /usr/lib/postgresql/9.1/bin/psql: relocation error: /usr/lib/postgresql/9.1/bin/psql: symbol SSL_CIPHER_get_name, version OPENSSL_1.0.0 not defined in file libssl.so.1.0.0 with link time reference 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - ERROR (version 5.0.1, build 1 from 2013-10-30_19-53-32 by buildguy) : Error in step 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - ERROR (version 5.0.1, build 1 from 2013-10-30_19-53-32 by buildguy) : org.pentaho.di.core.exception.KettleException: 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Error serializing rows of data to the psql command 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Broken pipe 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at org.pentaho.di.trans.steps.pgbulkloader.PGBulkLoader.writeRowToPostgres(PGBulkLoader.java:450) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at org.pentaho.di.trans.steps.pgbulkloader.PGBulkLoader.processRow(PGBulkLoader.java:316) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:60) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.lang.Thread.run(Unknown Source) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Caused by: java.io.IOException: Broken pipe 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.io.FileOutputStream.writeBytes(Native Method) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.io.FileOutputStream.write(Unknown Source) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.io.BufferedOutputStream.flushBuffer(Unknown Source) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.io.BufferedOutputStream.write(Unknown Source) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.io.FilterOutputStream.write(Unknown Source) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at org.pentaho.di.trans.steps.pgbulkloader.PGBulkLoader.writeRowToPostgres(PGBulkLoader.java:361) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - ... 3 more 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Finished processing (I=0, O=710, R=711, W=710, U=0, E=1) 
    2014/03/21 13:14:18 - Text file input.0 - Finished processing (I=1396, O=0, R=0, W=1396, U=0, E=0) 
    2014/03/21 13:14:18 - Transformation 3 - Transformation 3 
    2014/03/21 13:14:18 - Transformation 3 - Transformation 3

  5. Mar 24

    Patrik Lenart says:

    It looks like option with pgpass doesnt work. It works from terminal but doesnt ...

    It looks like option with pgpass doesnt work. It works from terminal but doesnt work from PDI.

  6. Mar 25

    Patrik Lenart says:

    solution for the Openssl problem with psql client is to use psql client located ...

    solution for the Openssl problem with psql client is to use psql client located inside pentaho install directory.

This documentation is maintained by the Pentaho community, and members are encouraged to create new pages in the appropriate spaces, or edit existing pages that need to be corrected or updated.

Please do not leave comments on Wiki pages asking for help. They will be deleted. Use the forums instead.

Adaptavist Theme Builder (4.2.0) Powered by Atlassian Confluence 3.3.3, the Enterprise Wiki