Hitachi Vantara Pentaho Community Wiki

PLEASE NOTE: This tutorial is for a pre-5.0 version PDI. If you are using PDI 5.0 or later, please use the following tutorial instead: Getting Started with PDI.

Hello World Example

Although this will be a simple example, it will introduce you to some of the fundamentals of PDI:

  • Working with the Spoon tool
  • Transformations
  • Steps and Hops
  • Predefined variables
  • Previewing and Executing from Spoon
  • Executing Transformations from a terminal window with the Pan tool.

Overview

Let's suppose that you have a CSV file containing a list of people, and want to create an XML file containing greetings for each of them.

CSV File Contents:

last_name,name
Suarez,Maria
Guimaraes,Joao
Rush,Jennifer
Ortiz,Camila
Rodriguez,Carmen
da Silva,Zoe

Desired Output:

<Rows>
  <row>
    <msg>Hello, Maria!</msg>
  </row>
  <row>
    <msg>Hello, Joao!</msg>
  </row>
  <row>
    <msg>Hello, Jennifer!</msg>
  </row>
  <row>
    <msg>Hello, Camila!</msg>
  </row>
  <row>
    <msg>Hello, Carmen!</msg>
  </row>
  <row>
    <msg>Hello, Zoe!</msg>
  </row>
</Rows>

A Transformation is made of Steps, linked by Hops. These Steps and Hops form paths through which data flows.
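
Before building this in Spoon, it may help to see the same three-stage flow as ordinary code. The following is only a plain-JavaScript sketch of the goal, not PDI code: the function names readCsv, addGreeting and toXml are hypothetical, the CSV text is inlined, and the parsing assumes there are no quoted fields or embedded commas.

```javascript
// Plain-JavaScript sketch of the tutorial's goal; it does not use any PDI API.
// The CSV text is inlined so the example is self-contained.
const csvText = [
  'last_name,name',
  'Suarez,Maria',
  'Guimaraes,Joao',
  'Rush,Jennifer',
  'Ortiz,Camila',
  'Rodriguez,Carmen',
  'da Silva,Zoe',
].join('\n');

// Stage 1: read the CSV (naive split; assumes no quoted or embedded commas).
function readCsv(text) {
  const [headerLine, ...dataLines] = text.trim().split('\n');
  const headers = headerLine.split(',');
  return dataLines.map((line) => {
    const values = line.split(',');
    const row = {};
    headers.forEach((header, i) => { row[header] = values[i]; });
    return row;
  });
}

// Stage 2: build the greeting for each row.
function addGreeting(row) {
  return { ...row, msg: 'Hello, ' + row.name + '!' };
}

// Stage 3: write only the msg field as XML (no XML escaping in this sketch).
function toXml(rows) {
  const body = rows
    .map((row) => '  <row>\n    <msg>' + row.msg + '</msg>\n  </row>')
    .join('\n');
  return '<Rows>\n' + body + '\n</Rows>';
}

const xml = toXml(readCsv(csvText).map(addGreeting));
console.log(xml);
```

Each function plays the role that one Step will play in the Transformation, and passing the result of one function to the next is loosely what a Hop does.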

Preparing the environment

Before starting a Transformation, create a Tutorial folder, where you'll save all the files for this tutorial. Then create a CSV file like the one shown above, and save it in the Tutorial folder as list.csv.

Transformation walkthrough

The proposed task will be accomplished in three subtasks:

  1. Creating a new Transformation
  2. Designing the basic flow of the transformation, by adding steps and hops
  3. Configuring the steps for the dataset and the desired actions

Creating a new Transformation

  1. Click New, then select Transformation. Alternatively you can go to the File menu, then select New, then Transformation. You can also just press Ctrl-N.
  2. Click Save, and save it into the Tutorial folder with the name hello.  The transformation will be stored as a hello.ktr file.

Designing the basic flow of the transformation, by adding steps and hops

A Step is the minimal unit inside a Transformation. A wide variety of Steps are available, grouped into categories like Input and Output, among others. Each Step is designed to accomplish a specific function, such as generating a random number or inserting rows into a database table.

A Hop is a graphical representation of data flowing between two Steps, with an origin and a destination. The data that flows through that Hop constitutes the Output Data of the origin Step, and the Input Data of the destination Step. A Hop has only one origin and one destination, but more than one Hop could leave a Step. When that happens, the Output Data could be distributed among the outgoing hops, or copied entirely to each outgoing hop. Likewise, more than one Hop can reach a Step. In those instances, the Step has to have the ability to merge the Input from the different Steps in order to create the Output.
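
The difference between distributing and copying rows can be illustrated with a small sketch. The helpers copyRows and distributeRows below are hypothetical and are not PDI APIs; in particular, the round-robin order used for distribution is an assumption made for this illustration.

```javascript
// Hypothetical helpers, not PDI APIs: two ways a Step's output rows
// could be handed to several outgoing Hops.

// Copy: every outgoing hop receives every row.
function copyRows(rows, hopCount) {
  return Array.from({ length: hopCount }, () => rows.slice());
}

// Distribute: each row goes to exactly one hop.
// Round-robin order is an assumption made for this sketch.
function distributeRows(rows, hopCount) {
  const hops = Array.from({ length: hopCount }, () => []);
  rows.forEach((row, i) => hops[i % hopCount].push(row));
  return hops;
}

const names = ['Maria', 'Joao', 'Jennifer', 'Camila'];
const copied = copyRows(names, 2);
const distributed = distributeRows(names, 2);

console.log(copied);      // both hops hold all four names
console.log(distributed); // hop 0: Maria, Jennifer; hop 1: Joao, Camila
```

With copying, each destination Step sees the full dataset; with distribution, the destinations share the rows between them.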

Our Transformation has to do the following:

  1. Read the CSV file
  2. Build the greetings message
  3. Save the greetings in the XML file

For each of these items you'll use a different Step: a CSV file input Step, a Modified Java Script Value Step, and an XML Output Step, linked in that order.

In this example, each task will be done in a single step, due to the simplicity of the requirements. For more complex transformations, it may take many more steps to achieve the desired result.

Here's how to start the Transformation:

  1. To the left of the workspace is the Steps Palette. Select the Input category.
  2. Drag the CSV file input icon onto the workspace on the right.
  3. Select the Scripting category.
  4. Drag the Modified JavaScript Value icon to the workspace.
  5. Select the Output category.
  6. Drag the XML Output icon to the workspace.

Now you will link the CSV file input with the Modified Java Script Value by creating a Hop:

  1. Select the first Step.
  2. Hold the Shift key and drag the icon onto the second Step.
  3. Link the Modified Java Script Value with the XML Output via this same process.

Specifying Step behavior

Every Step has a configuration window. These windows vary according to the functionality of the Steps and the category to which they belong. The Step Name can be set within the configuration window, making it easier to understand what each step will do. A Step Description is also available, which lets you clarify the purpose of the Step for documentation purposes.

Configuring the CSV file input Step

  1. Double-click on the CSV file input Step.
  2. The configuration window for the step will appear. Here you'll indicate the file location, file format (e.g. delimiters, enclosure characters, etc.) and column metadata (e.g. column name, data type, etc.).
  3. Replace the step name with one that is more representative of this Step's function. In this case, enter name list as the Step name.
  4. For the Filename field, click Browse and select the input file.

    Note: Just to the right of the text box is a symbol with a red dollar sign. This means that you can use variables as well as plain text in that field. A variable can be written manually as ${name_of_the_variable} or selected from the variable window, which you can access by pressing Ctrl-Spacebar. This window shows both predefined and user-defined variables, but since you haven't created any variables yet, right now you'll only see the predefined ones. Among those, select:

    ${Internal.Transformation.Filename.Directory}
    

    Next to the name of the variable, type a slash and the name of the file you created:

    ${Internal.Transformation.Filename.Directory}/list.csv
    

    At runtime the variable will be replaced by its value, which will be the path where the Transformation was saved. The Transformation will search for the file list.csv in that location.

  5. Click Get Fields to add the list of column names of the input file to the grid. By default, the Step assumes that the file has headers (the Header row present checkbox is checked).

    Note: The Get Fields button is present in most Steps' configuration windows. Its purpose is to load a grid with data from external sources or previous Steps. Even when the fields can be written manually, this button gives you a shortcut when there are many available fields and you want to use all or almost all of them.

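  6. The grid now has the names of the columns of your file: last_name and name.
  7. Switch lazy conversion off by unchecking the Lazy conversion checkbox. Several readers report that leaving it on causes errors in the Modified Java Script Value Step when you click Verify.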
  8. Click Preview to ensure that the file will be read as expected. A window showing data from the file will appear.
  9. Click OK to finish defining the Step CSV file input.
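
The variable replacement described in step 4 can be sketched as a simple text substitution. This is only an illustration of the idea: substituteVariables is a hypothetical helper, PDI does the real substitution internally at runtime, and the directory value below is an example rather than something PDI is guaranteed to produce.

```javascript
// Hypothetical substitution helper; PDI performs this internally at runtime.
// Unknown variables are left untouched here, which is a choice of this sketch.
function substituteVariables(text, variables) {
  return text.replace(/\$\{([^}]+)\}/g, (match, name) =>
    Object.prototype.hasOwnProperty.call(variables, name) ? variables[name] : match
  );
}

// Example value only: the directory where hello.ktr was saved.
const variables = {
  'Internal.Transformation.Filename.Directory': '/home/PentahoUser/Tutorial',
};

const resolved = substituteVariables(
  '${Internal.Transformation.Filename.Directory}/list.csv',
  variables
);
console.log(resolved); // /home/PentahoUser/Tutorial/list.csv
```

Because the path is built from the variable, the Transformation keeps working if the whole Tutorial folder is moved somewhere else.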

Configuring the Modified JavaScript Value Step

  1. Double-click on the Modified JavaScript Value Step.
  2. The Step configuration window will appear. This is different from the previous Step config window in that it allows you to write JavaScript code. You will use it to build the message "Hello, " concatenated with each of the names.
  3. Name this Step Greetings.
  4. The main area of the configuration window is for coding. To the left, there is a tree with a set of available functions that you can use in the code. In particular, the last two branches have the input and output fields, ready to use in the code. In this example there are two fields: last_name and name. Write the following code:
    var msg = 'Hello, ' + name + "!";
    
  5. At the bottom you can type any variable created in the code. In this case, you have created a variable named msg. Since you need to send this message to the output file, you have to write the variable name, msg, in the grid.

Warning: Don't mix these variables with PDI variables - they are not the same.

  6. Click OK to finish configuring the Modified Java Script Value step.
  7. Select the Step you just configured. To check that the new field will leave this Step, you will now look at its Input and Output Fields. Input Fields are the data columns that reach a Step. Output Fields are the data columns that leave a Step. Some Steps simply transform the input data; in that case, the input and output fields are usually the same. Other Steps add fields to the Output (Calculator, for example). Still others filter or combine data so that the Output has fewer fields than the Input (Group by, for example).
  8. Right-click the Step to bring up a context menu.
  9. Select Show Input Fields. You'll see that the Input Fields are last_name and name, which come from the CSV file input Step.
  10. Select Show Output Fields. You'll see that not only do you have the existing fields, but also the new msg field.
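
The effect of the Greetings Step on a single row can be sketched in plain JavaScript. This is not how PDI runs the script internally; greetingsStep is a hypothetical wrapper that only shows how the input fields relate to the output fields.

```javascript
// Sketch of what the Greetings step does to one row; plain JavaScript, not PDI code.
// Each input field is exposed to the script under its own name.
function greetingsStep(inputRow) {
  const { name } = inputRow;          // input field used by the script
  var msg = 'Hello, ' + name + '!';   // same expression as in the tutorial
  return { ...inputRow, msg };        // existing fields plus the new msg field
}

const inputRow = { last_name: 'Suarez', name: 'Maria' };
const outputRow = greetingsStep(inputRow);

console.log(Object.keys(inputRow));  // input fields:  last_name, name
console.log(Object.keys(outputRow)); // output fields: last_name, name, msg
console.log(outputRow.msg);          // Hello, Maria!
```

The input row is left unchanged; the Step's output is the same row with one extra column, which is what Show Output Fields reports.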

Configuring the XML Output Step

  1. Double-click the XML Output Step. The configuration window for this kind of Step will appear. Here you're going to set the name and location of the output file, and establish which of the fields you want to include. You may include all or some of the fields that reach the Step.
  2. Name the Step File with Greetings.
  3. In the File box write:
    ${Internal.Transformation.Filename.Directory}/Hello.xml
    
  4. Click Get Fields to fill the grid with the three input fields. In the output file you only want to include the message, so delete name and last_name.
  5. Save the Transformation again.

How does it work?

When you execute a Transformation, all steps are executed simultaneously. The Transformation executes asynchronously; the rows of data flow through the steps at their own pace. Each processed row flows to the next step as soon as that row is processed by the current step.

At this point, Hello World is almost completely configured. A Transformation reads the input file, then creates messages for each row via the JavaScript code, and then the message is sent to the output file. This is a small example with very few rows of names, so it is difficult to notice the asynchronous execution in action. Keep in mind, however, that it's possible that at the same time a name is being written in the output file, another is leaving the first Step of the Transformation.
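
The row-by-row hand-off can be imitated with JavaScript generators. This is an analogy only: the sketch is single-threaded and shows each row moving downstream as soon as it is ready, not the real concurrent execution of Steps. The names readNames, greet and writeAll are hypothetical.

```javascript
// Analogy only: generators pass each row downstream as soon as it is ready.
// PDI itself runs steps concurrently; this single-threaded sketch only
// shows the row-by-row hand-off, not real parallelism.
const events = [];

function* readNames(names) {
  for (const name of names) {
    events.push('read ' + name);
    yield { name };
  }
}

function* greet(rows) {
  for (const row of rows) {
    events.push('greet ' + row.name);
    yield { ...row, msg: 'Hello, ' + row.name + '!' };
  }
}

function writeAll(rows) {
  for (const row of rows) {
    events.push('write ' + row.msg);
  }
}

writeAll(greet(readNames(['Maria', 'Joao'])));
console.log(events.join('\n'));
```

The recorded events show Maria's greeting being written before Joao's row has even been read, which is the kind of interleaving the paragraph above describes.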

Verify, preview and execute

  1. Before executing the Transformation, check that everything is properly configured by clicking Verify. Spoon will verify that the Transformation is syntactically correct, and look for unreachable Steps and nonexistent connections. If everything is in order (it should be if you followed the instructions), you are ready to preview the output.
  2. Select the JavaScript Step and then click the Preview button. A dialog window will appear.
  3. As you can see, Spoon suggests that you preview the selected Step. Click Quick Launch. After that, you will see a window with a sample of the output of the JavaScript Step. If the output is what you expected, you're ready to execute the Transformation.
  4. Click Run.
  5. Spoon will show a window where you can set, among other information, the parameters for the execution and the logging level. Click Launch.
  6. An Execution Results pane will appear at the bottom, allowing you to view step performance metrics and log messages.

In the Step Metrics tab, you can see the executed operations for each Step of the Transformation. In particular, pay attention to these:

  • Read: the number of rows coming from previous Steps.
  • Written: the number of rows leaving from this Step toward the next.
  • Input: the number of rows read from a file or table.
  • Output: the number of rows written to a file or table.
  • Errors: errors in the execution. If there are errors, the whole row will become red.

In the Logging tab, you will see the execution step by step. The detail will depend on the log level established. If you pay attention to this detail, you will see the asynchronicity of the execution. The last line of the text will be:

Spoon - The transformation has finished!!

If there weren't error messages in the text, open the newly generated Hello.xml file and check its content.

Pan

Pan allows you to execute Transformations from a terminal window. The script is Pan.bat on Windows, or pan.sh on other platforms, and it's located in the installation folder. If you run the script without any options, you'll see a description of Pan with a list of available options.

To execute your Transformation, try the simplest command:

Pan /file <Tutorial_folder_path>/hello.ktr /norep
  • /norep is an option that tells Pan not to connect to a repository.
  • /file precedes the name of the file that contains the Transformation.
  • <Tutorial_folder_path> is the full path to the Tutorial folder, for example:
C:/Pentaho/Tutorial

or

/home/PentahoUser/Tutorial

The other options are run with default values.

After you enter this command, the Transformation will be executed in the same way as it was inside Spoon. In this case, the log will be written to the terminal unless you specify a file to write to. The format of the log text will vary a little, but the information will be basically the same as what you saw in the graphical environment.


  1. Jul 29, 2008

    Anonymous says:

    When following this tutorial, I get the following errors when doing Verify after hooking everything up:

    [4 - Error] Modified Java Script Value
      Couldn't add Input fields to Script! Error:
    java.lang.RuntimeException: Unable to verify if [last_name String(9)<binary-string>] is null or not because of an error:java.lang.ClassCastException: java.lang.String cannot be cast to [B

    [4 - Error] Modified Java Script Value
      General error executing script:
    org.mozilla.javascript.EcmaError: ReferenceError: "name" is not defined. (script#3)

    If I run the transformation, however, it works as expected.  I'm using pdi-open-3.0.4-GA

  2. Jul 29, 2008

    Anonymous says:

     I was also getting exactly the same errors posted after following the tutorial but everything worked right when executing the transformation. I have Kettle-3.0.3.GA-0569 installed.

  3. Aug 03, 2008

    Anonymous says:

    Hiya, same here: I get the error stated above, but the transformation runs fine and the output is as expected ...

     thanks for the wonderful work of creating this tutorial .

  4. Aug 04, 2008

    Maria Carina Roldan says:

    I think that it's a bug related with the "csv input step". Either:

    1. Replace that step with a "text file input"

    or 

    2. Switch lazy conversion off in csv input (thanks to sboden for this tip)

    and you should be OK.

    Thanks for the "wonderful work" part

    mc

  5. Sep 11, 2008

    Anonymous says:

     Thanks for your good work!

     I face the same error, but when I turn off compatibility mode, it works.

  6. Jun 16, 2009

    Frank Milbona says:

    Thank you very much for this tutorial. It is still very helpful although the user interface has already changed quite a bit.

    You said that the steps of a transformation run concurrently. I wonder if this is also true for the steps of a job. Will they also run concurrently or will the next step in a job only execute after the previous one has completed?

    ... Sorry, the answer is on the next page. Job entries are executed in sequence.

  7. Feb 13, 2012

    Kenneth Freidank says:

    (v4.2.1) COMMENTS - TRANSFORMATION WALKTHROUGH

    Creating the Transformation

    • Close the welcome screen before step 1.
    • Right-click "Transformation 1" to get to 'Settings', or left-click select "Transformation 1" and click "Edit/Settings" from the menu.
    • After closing the settings window, the tree layout on the left hand of the screen needs to be refreshed before the new name appears in place of "Transformation 1".  You can refresh the tree by clicking on the "Design" tab, then clicking back on the "View" tab.  After saving the transformation, the tab on the right hand side of the screen refreshes from the old name, "Transformation 1", to your new name.

    Constructing the Skeleton....

    • Here's how to start the transform....
      • Click the "Design" tab on the left to get to the palette.
      • After dragging the "csv" to the right, collapse the "Input" category that you selected on the left, then you can see the "Scripting" category.  Collapse each category after its use, so you can see the other categories.  Repeated clicks on the triangle icon to the left of the category collapse and expand the category.
      • After dragging the "csv file input" step on top of the "Modified Java Script Value" step, a blue arrow appeared representing the hop, and a pop-up menu appeared.  If I did not left-click the "Main output of step" in the pop-up menu, the hop did not stay, but was deleted.  So, make sure you click the "Main output of step".  The blue arrow changes to black when the hop is accepted.
    • Configuring the CSV file...
      • Uncheck the "lazy conversion" check box, as others have posted.  Don't know what this does, but it removes an error message you get later on.
    • Configuring the Modified Javascript...
      • Click the "Compatibility mode" check box, as the other posters have pointed out, so that you will see the ".getString()" part show up in the Input fields list.
    • Configuring the XML output...
      • There is a separate text box that holds the file extension.  This extension is appended to the filename.  There is a handy "show filename" button that you can use to check if you've got the full path and correct filename that will actually be used.
      • Step 4 says to click "Get Fields".  Click the "Fields" tab at the top of the window before "Get fields".
      • To delete the name and last_name, I first clicked the number of the field on the left of the grid.  This selected the entire row.
    • Verify, Preview, ....
      • "Verify" is under the "Action" menu item.  There is also a "verify" icon at the top of the workspace, just under the name of the transform which shows as the tab of the workspace.  The verify icon looks like a sheet of paper with a green check mark.
      • If you forgot to uncheck the "lazy conversion" check box in the "CSV File Settings", you will get an error when verifying.
      • The "Transform Debug" dialog box does not show the preview options until you click one of the steps on the left of the dialog box.
      • The results of the run are displayed in a new window that opens below the workspace called "Execution Results".  The log is a tab in the new window.  The letters I, O, R, W, U, and E are abbreviations for the recorded stats INPUT, OUTPUT, READ, WRITTEN, ERRORS.   I don't know what U means.
  8. Jun 29, 2012

    Brian Wu says:

    @Kenneth - thanks for the 4.2.1 update - veryyy helpful and wish I saw it before I had gone through it myself ;).

    When verifying, I get a "File specifications are not checked." I tried to figure out what the problem was for a while, but the transformation ended up running without errors, so I've stopped looking into it.

  9. Jul 11, 2013

    David Lopez says:

    @Kenneth

    Thanks so much for the comment.

    Right now I am working with Kettle Spoon 4.4.1 - GA and, concerning the Execution step, the Execution Results window displayed below holds four tabs:

    Execution History - Logging - Step Metrics - Performance Graph

    The one that contains INPUT, OUTPUT, READ, WRITTEN, ERRORS is Step Metrics, which displays all these data for each step; everything is displayed in a grid, where each row is a Step and each column one of these attributes. By the way, U is UPDATED :-)

  10. Dec 20, 2013

    Randy Sinurat says:

    Everytime i set the output xml step, i always get this error:

    "No enum const class org.pentaho.di.trans.steps.xmloutput.XMLField$ContentType."

    After that, i cannot open the step configuration window.

    I am using kettle version 5.0.1.

    What actually happened??? I am newbie, by the way.

    Thank you.

    NOTE :

    I already know the problem. It's the kettle version.

    Kettle 5.0.1 won't do the output work on me. I think it is not stable yet. CMIIW.

    Thank you.

  11. Dec 20, 2013

    Marvin Horst says:

    Randy

    I just finished the hello world tutorial and I also got this error. The problem was that I didn't have a Content Type defined in the fields tab of the XML output command. After defining a Content type the error went away

  12. Dec 30, 2013

    Randy Sinurat says:

    Dear Marvin,

    I tried your advice, and it's worked.

    I defined the content type. I chose element. 

    What i don't know is the difference between element and attribute content type.

    Thank you.

This documentation is maintained by the Pentaho community, and members are encouraged to create new pages in the appropriate spaces, or edit existing pages that need to be corrected or updated.

Please do not leave comments on Wiki pages asking for help. They will be deleted. Use the forums instead.
