
Introduction

Version 3.0 of Pentaho Data Integration allows you to execute a locally defined job on a (remote) slave server. This page explains how to set this up and what to look out for.

Basics

In a typical ETL setup, you might come across the need to fire off a job on a remote server somewhere.  Version 3.0 of PDI is the first one to allow this.
Here is how you do it: define a Slave Server in your job and, in the job entry settings, select that slave server as the one the job should run on.
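For illustration, here is a minimal sketch of the moving parts; the host, port and credentials shown are examples only:

# On the remote machine, start Carte (the lightweight slave server shipped with PDI):
sh carte.sh 10.40.30.208 8081

# In Spoon, define a Slave Server with matching details, for example:
#   Server name : remote-slave
#   Hostname    : 10.40.30.208
#   Port        : 8081
#   Username    : cluster   (must match an entry in pwd/kettle.pwd on the slave)
#   Password    : cluster
# Then select "remote-slave" as the slave server when launching the job from Spoon.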

Possible problems

Referenced objects are not found

One of the first problems you will encounter is that ONLY the job itself is sent to the slave server for execution, not the referenced files, transformations, mappings, sub-jobs, etc.

SOLUTION 1: specify the file paths as they exist on the remote system.  The drawback is that it can be tricky to test your jobs and transformations locally.

SOLUTION 2: use a shared drive that is mapped both locally and remotely so that you can make changes to the transformations and jobs easily. The drawback is that it is usually a hassle to set this up and to make sure the referenced paths are exactly the same on the remote system (drive letters, mount paths, etc.).  If you do use this approach, we make it easy for you: the referenced path of the local job file is sent to the slave server for reference.
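As a hedged illustration of this approach (server and path names are examples), the essential point is that the share must resolve to the same textual path on both machines:

# On the slave (Linux), mount the share under the very path the jobs reference:
mount -t cifs //fileserver/etl /mnt/etl-share -o username=kettle,password=secret

# Both Spoon and Carte can then reference e.g. /mnt/etl-share/jobs/MainJob.kjb.
# If the local machine sees the share under a different path (a Windows drive
# letter, say), the paths sent to the slave will not resolve.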

SOLUTION 3: put all the referenced files into a zip file, let's call it "bigjob.zip".  Reference all files relative to one another using variables like ${Internal.Job.Filename.Directory} or ${Internal.Transformation.Filename.Directory}.

That way you just have to transfer the file to the remote server (FTP, SFTP, etc).  You can specify the location of the root job like this: 

zip:file:///tmp/bigjob.zip!/MainJob.kjb

Note that the zip file can be located anywhere, even on a web server.  The URL for the root job then simply changes to:

zip:http://www.foo.com/bar/bigjob.zip!/MainJob.kjb
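To make this concrete, here is a minimal sketch, assuming the root job MainJob.kjb and its transformations live together in /work/etl and reference each other through the internal directory variables mentioned above:

# Build the archive on the local machine:
cd /work/etl
zip -r /tmp/bigjob.zip *.kjb *.ktr

# Copy it to the slave server (host and user are placeholders):
scp /tmp/bigjob.zip kettle@slave-host:/tmp/

# On the slave, the root job is then addressed as:
#   zip:file:///tmp/bigjob.zip!/MainJob.kjb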

Repository access is not available on slave server

Because the repository login credentials were not yet sent over "the wire", we could not offer a solution for this problem in version 2.5.x.
Version 3.0.0 fixes this issue by passing the repository name, user name and password to the slave server.

Please note that the slave server has to be aware of the location of the repository.  Normally, the file $HOME/.kettle/repositories.xml is used for this.  However, you can also copy this file into the directory from which you started Carte (its working directory).
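A minimal sketch of that setup, assuming Kettle is installed in /opt/kettle on the slave:

# Make the repository definitions visible to Carte by copying repositories.xml
# into the directory Carte is started from:
cp $HOME/.kettle/repositories.xml /opt/kettle/
cd /opt/kettle
sh carte.sh 10.40.30.208 8081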

Even if you set this up correctly, verify that the slave server can reach the repository database over the network and that it has permission to access that database.
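For example, assuming a MySQL repository on db-host port 3306, a quick check from the slave machine could look like this:

# Can the slave reach the repository database at all?
nc -z db-host 3306

# The repository schema must also accept connections from the slave's address,
# e.g. for MySQL (all names are placeholders):
#   GRANT ALL ON kettle_repo.* TO 'kettle'@'10.40.30.208' IDENTIFIED BY 'secret';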


Comments

  1. Dec 18, 2007

    Austin says:

    Hi Matt,

    I think something should be added about the <kettle install dir>/pwd/kettle.pwd file and the plain-text storage of user/password, however you think best. This was a major gotcha for me. I didn't find anything in the manual about placing a username and password in that file when I was making various searches for an answer to the "AUTH FAILURE: user xxxxx" problem.

    In case someone stumbles on this comment:

    If you are running carte.sh on your slave server, printing errors to stdout (the terminal), and you see something like:

    INFO 18-12 16:45:05,766 (LogWriter.java:println:403) -org.pentaho.di.www.WebServer@20f8395f -
    Created listener for webserver @ address : 10.40.30.208:8081
    2007-12-18 16:45:05.767::INFO: jetty-6.1.2rc1
    2007-12-18 16:45:05.816::INFO: Started SocketConnector @ 10.40.30.208:8081
    2007-12-18 16:45:22.864::WARN: AUTH FAILURE: user <your_username>
    2007-12-18 16:45:24.277::WARN: AUTH FAILURE: user <your_username>

    Then check that you have the username and password for the slave server in the <kettle>/pwd/kettle.pwd file on the host that is executing the job (the one running Carte).

    If the file doesn't exist, create <kettle>/pwd/kettle.pwd with entries in this format:

    joe: pass_the_gravy
    bob: pass_the_car
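
    (For reference, a default Carte installation normally ships this file with a single entry, cluster: cluster, and the username/password entered in Spoon's slave server definition has to match one of these entries.)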

    Thanks for the great work and effort. PDI is working out great for me.

    Austin

  2. Nov 03, 2009

    Martin van Dijken says:

    No update yet on making it easier to execute files remotely? We are using a shared drive that gets mapped in Windows to K:/. We use an environment variable to set what the base path of all transformations is.

    So on windows:

    BASE_PATH=K:/ 
    

    On the remote server:

    BASE_PATH=/data/share/kettle/martin
    

    The problem is that Spoon wants to send the environment variables over to Carte. So every time I run something, I have to change BASE_PATH to the remote directory. Doing that manually every time you want to start a job is completely pointless. Is there a better way?

