Introduction
The PostgreSQL bulk loader is an experimental step that streams data from inside Kettle to the psql command, which loads it into the database with "COPY ... FROM STDIN".
This way of loading data offers the best of both worlds: the performance of a bulk load and the flexibility of a Pentaho Data Integration transformation.
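Conceptually, the step turns each incoming row into a line of COPY input and writes that line to psql's standard input. The Python sketch below illustrates the idea using PostgreSQL's default COPY text format: fields are tab-separated, \N marks NULL, and tab, newline, carriage return and backslash are backslash-escaped. It is an illustration under that assumption, not Kettle's implementation, which may use a different format. The names format_copy_row and stream_rows are hypothetical.

```python
import subprocess
from typing import Iterable, Optional, Sequence

# Characters that must be backslash-escaped in PostgreSQL's COPY text format.
_ESCAPES = {"\\": "\\\\", "\t": "\\t", "\n": "\\n", "\r": "\\r"}


def format_copy_row(values: Sequence[Optional[object]]) -> str:
    """Encode one row as a line of COPY text format (hypothetical helper)."""
    fields = []
    for value in values:
        if value is None:
            fields.append("\\N")  # default NULL marker in text format
        else:
            fields.append("".join(_ESCAPES.get(ch, ch) for ch in str(value)))
    return "\t".join(fields) + "\n"


def stream_rows(psql_cmd: Sequence[str], copy_sql: str,
                rows: Iterable[Sequence[Optional[object]]]) -> int:
    """Run psql with the COPY statement and feed the rows on its stdin.

    Returns psql's exit code. Assumes psql can authenticate without
    prompting for a password.
    """
    # psql takes the COPY statement from -c and reads the data from stdin;
    # closing stdin signals the end of the data.
    proc = subprocess.Popen([*psql_cmd, "-c", copy_sql], stdin=subprocess.PIPE)
    assert proc.stdin is not None
    for row in rows:
        proc.stdin.write(format_copy_row(row).encode("utf-8"))
    proc.stdin.close()
    return proc.wait()
```

For example, stream_rows(["psql", "-d", "dw"], "COPY sales FROM STDIN", [(1, "north"), (2, None)]) would load two rows into a hypothetical sales table, provided psql can log in without a password prompt.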
Make sure to check out the "Set up authentication" section below!
Note: This step does not work with a JNDI-defined connection; only JDBC connections are supported.
Options
| Option | Description |
|---|---|
| Step name | Name of the step. |
| Connection | Name of the database connection on which the target table resides. Note: The password of this database connection is not used; see the "Set up authentication" section below! Since PDI 3.2.3 (where PDI-1901 was fixed), the username of the connection is passed to the -U parameter; in earlier versions, the account of the logged-in user was used. |
| Target schema | Name of the schema of the table to write data to. This is important for data sources that allow table names containing dots ('.'). |
| Target table | Name of the target table. |
| psql path | Full path to the psql utility. If psql is on the PATH of the executing application, you can simply enter psql. |
| Load action | Insert or Truncate. Insert appends the rows to the table; Truncate first truncates the table and then loads the rows. |
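The options above map fairly directly onto a psql invocation and the SQL it runs. The following Python sketch shows one plausible mapping. It is not Kettle's code, and the function names (quote_ident, qualified_table, build_copy_statements, build_psql_command) are hypothetical. It assumes that the Truncate action issues a separate TRUNCATE before the COPY. It also quotes the schema and the table separately, so a dot inside a table name is not read as a schema separator.

```python
from typing import List, Optional


def quote_ident(name: str) -> str:
    """Quote an SQL identifier: wrap it in double quotes and double any
    embedded double quotes. Quoted names are case-sensitive in PostgreSQL,
    so they must match the stored name exactly."""
    return '"' + name.replace('"', '""') + '"'


def qualified_table(schema: Optional[str], table: str) -> str:
    # Quote each part on its own so "my.table" stays one identifier
    # instead of being split into schema "my" and table "table".
    if schema:
        return f"{quote_ident(schema)}.{quote_ident(table)}"
    return quote_ident(table)


def build_copy_statements(schema: Optional[str], table: str,
                          load_action: str) -> List[str]:
    """Return the SQL statements for the chosen load action, in order."""
    target = qualified_table(schema, table)
    copy_sql = f"COPY {target} FROM STDIN"
    if load_action == "Insert":
        return [copy_sql]
    if load_action == "Truncate":
        # Assumption: the table is emptied before the rows are streamed in.
        return [f"TRUNCATE TABLE {target}", copy_sql]
    raise ValueError(f"unknown load action: {load_action!r}")


def build_psql_command(psql_path: str, host: str, port: int,
                       user: str, database: str) -> List[str]:
    """Assemble the psql argument list. There is no password argument:
    authentication has to come from pg_hba.conf (trust) or a .pgpass file."""
    return [psql_path or "psql", "-h", host, "-p", str(port),
            "-U", user, "-d", database]


print(build_psql_command("psql", "dbhost", 5432, "etl", "dw"))
print(build_copy_statements("staging", "my.table", "Truncate"))
```

An empty psql path falls back to plain psql, mirroring the option's "simply psql" default.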
Set up authentication
psql does not provide an option for specifying the password. Here is an excerpt of its connection options:

    Connection options:
    -h HOSTNAME  database server host or socket directory (default: "/var/run/postgresql")
    -p PORT      database server port (default: "5432")
    -U NAME      database user name (default: "matt")
    -W           prompt for password (should happen automatically)

The default user name is the operating-system account that runs psql ("matt" on the author's machine). Since PDI 3.2.3, the username of the Kettle connection is passed to -U instead; see PDI-1901.
As you can see, there is no way to pass a password for the database. If the server asks for one, psql prompts for it on the console, and an unattended transformation cannot answer that prompt.
One way to overcome this is to set up trust authentication on the PostgreSQL server.
To make this happen, change the pg_hba.conf file (on my box this is /etc/postgresql/8.2/main/pg_hba.conf) and add a line like this:
host all all 192.168.1.0/24 trust
This means that anyone on the 192.168.1.0 network (mask 255.255.255.0) can log into PostgreSQL on all databases, with any username and without a password. If you are running Kettle on the same server as the database, restrict the rule to localhost instead:
host all all 127.0.0.1/32 trust
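To see which clients a rule such as 192.168.1.0/24 actually admits, you can use Python's standard ipaddress module to evaluate the CIDR notation used in the address column of pg_hba.conf. This only illustrates the address arithmetic, and rule_matches is a hypothetical name. PostgreSQL itself also matches each rule on connection type, database and user.

```python
import ipaddress


def rule_matches(cidr: str, client_ip: str) -> bool:
    """Return True if client_ip falls inside a pg_hba.conf address range."""
    return ipaddress.ip_address(client_ip) in ipaddress.ip_network(cidr)


lan = ipaddress.ip_network("192.168.1.0/24")
print(lan.netmask)        # 255.255.255.0
print(lan.num_addresses)  # 256 addresses are trusted by the first rule
print(rule_matches("192.168.1.0/24", "192.168.1.57"))  # True
print(rule_matches("127.0.0.1/32", "192.168.1.57"))    # False: only localhost
```

The /32 rule covers exactly one address, which is why it is so much narrower than the /24 network rule.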
This is, of course, much safer. Make sure you don't invite any strangers onto your PostgreSQL database!
TIP! Make sure to restart (or reload) your database server after making this change.
Alternatively, you can store the password in a .pgpass file (pgpass.conf on Windows), which psql reads automatically.
An entry has the form hostname:port:database:username:password and may look like this:
my_server_name:5432:*:my_user_name:my_secret_password
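This way, you don't need to risk opening your whole database server to passwordless logins, as advised above.
On Unix, the file must have permissions 0600 (for example, chmod 600 ~/.pgpass). If the permissions are less strict, psql ignores the file.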
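Below is a minimal Python sketch for creating such an entry. It assumes a Unix-like system and a .pgpass path you pass in; the helper names are hypothetical. It escapes ':' and '\' inside fields with a backslash, as the password-file format requires. It also creates the file with mode 0600 so that psql does not ignore it.

```python
import os
import stat
from pathlib import Path


def pgpass_escape(field: str) -> str:
    # Escape backslashes first, then ':' (the field separator).
    return field.replace("\\", "\\\\").replace(":", "\\:")


def pgpass_line(host: str, port: str, database: str, user: str,
                password: str) -> str:
    """Build one hostname:port:database:username:password entry.
    A bare '*' is left untouched so it still acts as a wildcard."""
    return ":".join(pgpass_escape(f) for f in (host, port, database, user, password))


def add_pgpass_entry(path: Path, line: str) -> None:
    """Append an entry, keeping the file readable/writable by its owner only."""
    # Creating the file with mode 0o600 avoids a moment where it is world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    with os.fdopen(fd, "a", encoding="utf-8") as f:
        f.write(line + "\n")
    # Also tighten a pre-existing file; psql ignores it if group/others have access.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

Typical use would be add_pgpass_entry(Path.home() / ".pgpass", pgpass_line("my_server_name", "5432", "*", "my_user_name", "my_secret_password")), run as the same user that runs Kettle.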
See the PostgreSQL documentation on the password file for details.
A full restart of the database isn't necessary after pg_hba.conf changes; a reload is enough:
/etc/init.d/postgresql reload