Access Keys:
Skip to content (Access Key - 0)

Introduction

The PostgreSQL bulk loader is an experimental step in which we will to stream data from inside Kettle to the psql command using "COPY DATA FROM STDIN" into the database.
This way of loading data offers the best of both worlds : the performance of a bulk load and the flexibility of a Pentaho Data Integration transformation.

Make sure to check out the "Set up authentication" section below!

Note: This step does not work with a JNDI defined connection, only JDBC is supported.

Note: This step does not support timestamps at the moment (5.3). Timestamps should be converted to Date before this step. Using timestamps results in null-values in the table.

Options

Option Description
Step name Name of the step.

Note: This name has to be unique in a single transformation.

Connection Name of the database connection on which the target table resides.

Note: The password of this database connection is not used, see below in the "Set up authentication" section! Since PDI-1901 is fixed in 3.2.3, the username of the connection is used and added to the -U parameter, otherwise the logged in user acount would be taken.
Target schema The name of the Schema for the table to write data to. This is important for data sources that allow for table names with dots '.' in it.
Target table Name of the target table.
psql path Full path to the psql utility.
Load action Insert, Truncate. Insert inserts, truncate first truncates the table.
   
Fields to load
This table contains a list of fields to load data from, properties include:
  • Table field: Table field to be loaded in the PostgreSQL table;
  • Stream field: Field to be taken from the incoming rows;
  • Date mask: Either "Pass through, "Date" or "DateTime", determines how date/timestamps will be loaded in PostgreSQL.

Set up authentication


"psql" doesn't allow you to specify the password.  Here is a part of the connection options:

 Connection options:
  -h HOSTNAME     database server host or socket directory (default: "/var/run/postgresql")
  -p PORT         database server port (default: "5432")
  -U NAME         database user name (default: "matt" - if you are not Matt:
                  Since PDI 3.2.3 the username of the connection is taken, see PDI-1901.)
  -W              prompt for password (should happen automatically)

As you can see there is no way to specify a password for the database.  It will always prompt for a password on the console no matter what.

To overcome this you need to set up trusted authentication on the PostgreSQL server.

To make this happen, change the pg_hba.conf file (on my box this is /etc/postgresql/8.2/main/pg_hba.conf) and add a line like this:

host    all         all         192.168.1.0/24        trust

This basically means that everyone from the 192.168.1.0 network (mask 255.255.255.0) can log into postgres on all databases with any username.  If you are running Kettle on the same server, change it to localhost:

host    all         all         127.0.0.1/32        trust

 This is much safer of-course.  Make sure you don't invite any strangers onto your PostgreSQL database!

 TIP! Make sure to restart your database server after you made this change


  1. Aug 03, 2009

    Balázs Bárány says:

    It shouldn't be necessary to fully restart the database after pg_hba.conf change...

    It shouldn't be necessary to fully restart the database after pg_hba.conf changes. Reload usually does it, too.

    /etc/init.d/postgresql reload

  2. Dec 15, 2009

    Marc Cousin says:

    For authentication, you can also specify it in .pgpass (or pgpass.conf for windo...

    For authentication, you can also specify it in .pgpass (or pgpass.conf for windows) configuration file.
    An entry may look like this :
    my_server_name:5432:*:my_user_name:my_secret_password.

    No need to risk opening your whole database server with passwordless login as advised.

    The file has to be 600 on Unix.
    Here is the doc :

    pgpass doc

  3. Feb 24, 2014

    Lens Messaoud Lama says:

    What is the path of psql if that is on Unix system ? Thanks

    What is the path of psql if that is on Unix system ?

    Thanks

  4. Mar 21, 2014

    Patrik Lenart says:

    Hi, Could you provide me with any information related to use of PostgreSQL bulk...

    Hi,

    Could you provide me with any information related to use of PostgreSQL bulk loader? I created the simple transformation:

    Text File Input -> PostgreSQL Bulk Loader. When I run transformation I got error message:

    2014/03/21 13:14:18 - Transformation 3 - Dispatching started for transformation [Transformation 3] 
    2014/03/21 13:14:18 - Text file input.0 - Opening file: file:///tmp/file.txt 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Executing command: /usr/bin/psql -U XXX -h XXX -p 5432 XXX

    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Launching command: COPY lead_tool.profcopy ( profile_identifier, title, fan_count, is_in_analytics, url_key, profile_countries, profile_tags, profile_type ) FROM STDIN WITH CSV DELIMITER AS ';' QUOTE AS ''; 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - ERROR {0} /usr/lib/postgresql/9.1/bin/psql: relocation error: /usr/lib/postgresql/9.1/bin/psql: symbol SSL_CIPHER_get_name, version OPENSSL_1.0.0 not defined in file libssl.so.1.0.0 with link time reference 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - ERROR (version 5.0.1, build 1 from 2013-10-30_19-53-32 by buildguy) : Error in step 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - ERROR (version 5.0.1, build 1 from 2013-10-30_19-53-32 by buildguy) : org.pentaho.di.core.exception.KettleException: 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Error serializing rows of data to the psql command 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Broken pipe 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at org.pentaho.di.trans.steps.pgbulkloader.PGBulkLoader.writeRowToPostgres(PGBulkLoader.java:450) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at org.pentaho.di.trans.steps.pgbulkloader.PGBulkLoader.processRow(PGBulkLoader.java:316) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at org.pentaho.di.trans.step.RunThread.run(RunThread.java:60) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.lang.Thread.run(Unknown Source) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Caused by: java.io.IOException: Broken pipe 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.io.FileOutputStream.writeBytes(Native Method) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.io.FileOutputStream.write(Unknown Source) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.io.BufferedOutputStream.flushBuffer(Unknown Source) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.io.BufferedOutputStream.write(Unknown Source) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at java.io.FilterOutputStream.write(Unknown Source) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - at org.pentaho.di.trans.steps.pgbulkloader.PGBulkLoader.writeRowToPostgres(PGBulkLoader.java:361) 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - ... 3 more 
    2014/03/21 13:14:18 - PostgreSQL Bulk Loader.0 - Finished processing (I=0, O=710, R=711, W=710, U=0, E=1) 
    2014/03/21 13:14:18 - Text file input.0 - Finished processing (I=1396, O=0, R=0, W=1396, U=0, E=0) 
    2014/03/21 13:14:18 - Transformation 3 - Transformation 3 
    2014/03/21 13:14:18 - Transformation 3 - Transformation 3

  5. Mar 24, 2014

    Patrik Lenart says:

    It looks like option with pgpass doesnt work. It works from terminal but doesnt ...

    It looks like option with pgpass doesnt work. It works from terminal but doesnt work from PDI.

  6. Mar 25, 2014

    Patrik Lenart says:

    solution for the Openssl problem with psql client is to use psql client located ...

    solution for the Openssl problem with psql client is to use psql client located inside pentaho install directory.

  7. Jun 04

    Steven C. Buttgereit says:

    .pgpass works well with PDI.  PDI is only calling the command line psql; ps...

    .pgpass works well with PDI.  PDI is only calling the command line psql; psql is responsible (via the lower level libpq) for actually making use of the .pgpass file.  As such there's no reason why .pgpass wouldn't work because of PDI.

    There are a couple of things you need to make sure about when you use .pgpass method and you should read the docs (linked again: http://www.postgresql.org/docs/9.4/static/libpq-pgpass.html) before you move forward with this approach.  Whatever OS user is actually calling the PDI process is the user for whom .pgpass must be created.  The .pgpass file should be in the home directory of that user or in the directory pointed to by the PGPASSFILE environment variable.  The permissions must be correct (Windows excepted): the idea is that if anyone can read the .pgpass file other than the user which will use it, the credentials are not secure and therefore not to be trusted; you should be sure the user itself is secure.  On Linux this is read/write only by the owner, not group or world (600 as mentioned earlier).  Naturally the file ownership must be correct as well.

    If your security needs require you to take a defense in depth position, DO NOT use the method described in the documentation.  The approach documented here will work and is the easiest to get working,  but you'll be compromising security measures that could otherwise be useful.  You don't want to just use 'all' for database or 'all' for users, be specific.  Also I would recommend if you're using 'trust' to only trust whitelisted IP addresses rather than entire subnets.  Remember trust means no database authentication so use by intent and not just because some document on the internet suggested it.

    Finally, while .pgpass is an OK method in many circumstances (it's still just an MD5 hash), there are other/better authentication methods for PostgreSQL all of which should work for the same reason that .pgpass should work described above.  For documentation on the other options: (http://www.postgresql.org/docs/9.4/static/auth-methods.html).

This documentation is maintained by the Pentaho community, and members are encouraged to create new pages in the appropriate spaces, or edit existing pages that need to be corrected or updated.

Please do not leave comments on Wiki pages asking for help. They will be deleted. Use the forums instead.

Adaptavist Theme Builder (4.2.0) Powered by Atlassian Confluence 3.3.3, the Enterprise Wiki