Hitachi Vantara Pentaho Community Wiki


This document describes the architecture of the Pentaho BI Platform and details the components and tools required to build solutions.  It is intended for people interested in building solutions and creating content.  It is also valuable for anyone who needs to interface with or develop portions of the Pentaho BI Platform.

To get the most out of this documentation, it is recommended that you have the Pentaho BI Suite (Pre-Configured Installation) and Pentaho Design Studio installed on your local machine.  Many of the examples in this document refer to the sample solutions and data that come with the Pentaho BI Suite.  Either version of the Pentaho BI Suite, Open and Pro, will work.  Many examples are illustrated by using the Design Studio to modify the existing samples.

If you already have a working Pentaho Server and functioning Design Studio, you can skip the first section.


The Design Studio

The Pentaho Design Studio provides a graphical environment for building, managing, and testing your solution repository. It provides a collection of templates, editors and wizards to help create and maintain your solution repository.

Running the Design Studio

At this point, you have either installed the standalone Design Studio or installed the Design Studio plug-in into Eclipse, and you have a working install of the Pentaho samples. The samples have been tested and work within your browser. Your Pentaho BI Server is running and waiting for requests.

If you haven't done so already, start the Design Studio as described under installation. If the welcome screen appears, close it by clicking on the X next to Welcome.

The first thing we need to do is hook up the Design Studio to the samples solution. Select File->New->Project. Then select Simple from the New Project wizard and press the Next> button. Enter "Pentaho Solutions" as the project name. Any name is fine, but this document will refer to Pentaho Solutions. Uncheck the Use default checkbox and browse to the pentaho-solutions directory. If you are using the PCI, this will be /pentaho-demo/pentaho-solutions. Select Finish and you are ready to go.

Browsing the Solution Repository

You should now see your Pentaho Solutions project displayed in the tree on the left side of the Design Studio. If you expand the solution folder you'll see plenty of files. These are the files that make up your solution and are managed with the Design Studio. Let's take a look at one to get a feel for what the Design Studio can do for us. In the left hand tree, open the Pentaho Solutions/samples/getting-started folder. Double-click on the HelloWorld.xaction file and the Action Sequence editor will open in the edit pane.

Action Sequences

The Action Sequence is an XML document that defines the smallest complete task that the solution engine can perform. It is executed by a very lightweight process flow engine and defines the order of execution of one or more of the components of the Pentaho BI Platform. We avoid calling this a process flow because it is missing many of the capabilities of a true process flow engine. It is good for sequencing small, linear, success-oriented tasks like reporting and bursting. It has the ability to loop through a result set, call another Action Sequence and conditionally execute components. The Action Sequence document should have a ".xaction" suffix.

Introducing the Action Sequence Editor

The Action Sequence editor has four tabs along the bottom: General, Define Process, XML Source and Test. The function of each tab will be discussed in more detail later. Their basic functions are:

    • General - Basic properties like title, help etc.
    • Define Process - Defines the inputs, outputs and resources required by the Action Sequence and allows you to program the interactions between the Action Sequence and the Pentaho Components
    • XML Source - The raw XML that the editor is generating
    • Test - Interface for executing the Action Sequence on the Pentaho BI Server

Click through each tab to get familiar with the editor. Check out the XML Source tab to get an idea of what the editor is saving you from. Now let's look a bit more closely at the HelloWorld.xaction.

General Information

As we mentioned earlier, the "General" tab contains some general information about the action sequence, such as the title, author, icon, description, and help text to be displayed in the browser window. Notice that the Design Studio shows the title for the action sequence as "%title". The "%" indicates that this is the name of a string whose value is defined in a properties file with the same name as the action sequence. In this case the properties file is named HelloWorld.properties. This is how action sequences accommodate internationalization. Additionally, you can indicate the logging level you would like to use for this action sequence. Logged messages will appear in the pentaho-demo/jboss/server/default/log/server.log file. If you're having problems getting your action sequences working, the log file is a good place to look for clues as to what the problem might be.
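The "%" lookup can be pictured with a short Python sketch. This only illustrates the idea and is not the platform's own code: the function names parse_properties and resolve_label and the sample property values are hypothetical, and the parser handles plain key=value lines only.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve_label(value, props):
    """Return the string behind a %key reference, or the value unchanged."""
    if value.startswith("%"):
        # An unknown key falls back to the raw reference so the gap is visible.
        return props.get(value[1:], value)
    return value


# Hypothetical contents standing in for HelloWorld.properties.
SAMPLE_PROPERTIES = """
# illustrative entries only
title=Hello World
description=A first action sequence
"""

props = parse_properties(SAMPLE_PROPERTIES)
print(resolve_label("%title", props))
print(resolve_label("Plain title", props))
```

Swapping in a different properties text for another language changes what the user sees without touching the action sequence itself.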

Inputs and Resources

Now press the "Define Process" tab. You should see a section labeled "Process Inputs" which lists the inputs and resources used by the action sequence. The inputs are the pieces of information the action sequence will need from the outside world when it runs. They can come from five sources: runtime, request, session, global and default. Runtime parameters are parameters that are stored in the Runtime Context. Remember, the Runtime Context stores the inputs and outputs from previous instances and makes them available to future executions of the same runtime instance id. Request parameters are the name-value pairs specified on a URL. Session parameters are variables that are stored in the user's session and may contain unique values for each user. Global parameters are similar to session parameters except they have the same values for all users. Default values can be specified for each input in the Action Sequence document and are used as a last resort.
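A rough way to picture input resolution is a lookup that tries each source in turn and falls back to the default. The Python sketch below is an assumption-laden illustration, not the platform's implementation: the function resolve_input, the idea that the caller supplies the search order, and all sample values are hypothetical.

```python
def resolve_input(name, source_order, scopes, default=None):
    """Return (value, source) from the first listed source that defines name."""
    for source in source_order:
        scope = scopes.get(source, {})
        if name in scope:
            return scope[name], source
    # No listed source supplied a value, so the default is the last resort.
    return default, "default"


# Made-up contents for each source of parameters.
scopes = {
    "runtime": {},
    "request": {"region": "Eastern"},  # e.g. a name-value pair on the URL
    "session": {"region": "Central", "user": "joe"},
    "global": {"company": "Example Corp"},
}

print(resolve_input("region", ["request", "session"], scopes, "Western"))
print(resolve_input("department", ["request", "session", "global"], scopes, "Sales"))
```

Listing session ahead of request in this sketch would make the session value win, which is the kind of ordering decision that matters when a session parameter is used to filter data securely.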

Session and Global parameters can be used to provide secure filtering of data within the Action Sequence. A session parameter gets initialized by executing an action sequence when the user logs onto the system. The Action Sequence called upon login can be set up to perform a query using the user's login name in the where clause. The result is stored in the user's session and is available to subsequent Action Sequences. Global parameters are initialized when the system starts up and are available for all users. See the "Securing Data Access with Session and Global Filters" document for information on how to set up the filters and use them.

There are two implicit inputs, instance-id and solution-id, that are always available and do not need to be specified as inputs or outputs. They are the... well I'm sure you can guess what they are.

Resources are the files needed by the action sequence to complete its job. For example: if the action sequence is going to run a JFree report, one of the resources would be the location of the JFree report definition file.

Using the Design Studio let's take a look at some examples of inputs and resources. Browse to the samples/reporting directory in your "Pentaho Solutions" project and double-click on the BIRT-quadrant-budget-for-region-hsql.xaction. Select each of the process inputs to view the details about each of the inputs and resources used by this action sequence.

Outputs

The action sequence outputs are what the action sequence will leave behind when it's complete. Outputs can have three destinations: runtime, session, or content. The first two destinations correspond to the input sources discussed above. The third destination indicates that the output will be put in the HTTP response header or content.

Flow Control

The Action Sequence is not meant to be a replacement for workflow. That being said, there are two ways to control the sequence of execution: loops and conditions. An Action Sequence can execute a group of actions multiple times. The most common usage is to perform the set of actions once for each row in a query result set. The data types that can be specified for a loop are string-list, result-set and property-map-list.

A group of actions can also be executed conditionally. The condition that will be evaluated for true is based on a JavaScript result.
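The two kinds of flow control can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the platform evaluates a JavaScript condition, whereas here a plain Python function stands in for it, and the names run_group, loop_over and run_if as well as the sample rows are hypothetical.

```python
def run_group(actions, params):
    """Run a group of actions in order and collect what each returns."""
    return [action(params) for action in actions]


def loop_over(rows, actions):
    """Run the whole group once for each row of a result set."""
    results = []
    for row in rows:
        results.extend(run_group(actions, row))
    return results


def run_if(condition, actions, params):
    """Run the group only when the condition evaluates to true."""
    if condition(params):
        return run_group(actions, params)
    return []


# Made-up result set and a single stand-in action.
rows = [
    {"region": "Eastern", "budget": 1200},
    {"region": "Western", "budget": 0},
]
actions = [lambda row: "report for " + row["region"]]

print(loop_over(rows, actions))
print(run_if(lambda p: p["budget"] > 0, actions, rows[1]))
```

Note that neither helper can jump backwards or branch to another group, which matches the point that this is sequencing rather than full workflow.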

Actions (Components)

Within the design studio open the samples/bursting/BurstActionSequence.xaction. What you see in the Process Actions section is a list of all the actions to be performed by this action sequence. Note that the order is important here. The topmost action will be run first, followed by the one below it, and so on. The second action, the one that starts with "Action Loop" probably deserves special mention. It's a loop action that will perform the actions it contains multiple times, depending on what it's set to loop on. In this case it looks like there are five actions contained in the loop. Click on the first action in the list.

On the right side you can view the action details. You'll notice that there is a place for entering a brief description for the action. It's not necessary to enter anything here, but it's a good idea, as it makes the action sequence much easier to read. Each component has its own editor. Since this action uses the SQL query component, there is an area to specify the database connection, the query, and the expected contents of the query result. Now let's click on the "+" sign next to this action in the Process Actions tree. Notice that there are four items listed under the action. These are the outputs from this action. The rule-result output is where the results of the query are stored. The remaining three outputs correspond to particular columns within the rule-result output. Other actions that follow can use these outputs as their inputs. Additionally, each action has available to it the action sequence inputs we discussed earlier. The idea is that each little action has something it can do really well. It takes in some input, does some work, and leaves some output for some other action(s) to use. Your job is to tie these individual actions together to do something meaningful.

Let's now take a quick look at each of the actions in this action sequence and see how they're working together to get something useful done. As we go through this don't get tied up in all the little details. The idea is to get a feel for how the actions work together. Later we'll learn about the details of each individual type of action.

Let's start with the first action in the actions tree. It performs a SQL query to extract some region, manager, and email information from a database. As mentioned earlier, it leaves some outputs behind. The query results are saved in an output called rule-result, and the other three outputs tell the world the column names for information in the results. Anytime you run a SQL query action make sure you include the column names in the output. That way other actions know what data is available in the rule result.

Next is the action loop. If you select it you'll notice that there isn't a whole lot to it. One notable point is that whenever you see "<" and ">" around a string it indicates that a parameter is being referenced. In this case the parameter is "rule-result". It's not by coincidence that this happens to be the name of the output from the preceding SQL Query action. This is an example of an action using the outputs from a previous action. So the five actions within the loop will each be performed once for each row that was returned in the outputs of the previous query action.

The next three actions are all similar. They're each string formatting actions. They take some input strings, place them into a formatted message, and leave the formatted string as an output. Basically they get things in order for the last two actions. If you click on the "+" sign next to each of these actions in the Process Actions tree, you'll see the outputs that each leaves behind.

Click on the action titled "Generate the report". This action will be generating a JFree report. In the configuration section you'll find that the JFree report specification and report format are being referenced using parameters named "report-definition" and "output-type" respectively. Notice both of these parameters are defined under the Process Inputs. Additionally, the configuration section contains the database connection and query information that will be passed to the JFree report. Note that since these values are not enclosed within "<>" they are not parameter names, but are constant values. Finally you'll notice that the report is being saved in an output called "report-output". So we've generated this report and it's sitting in an action output parameter called "report-output". Now what?

Select the last action in the sequence. The name says it all. This action will email the report to the manager of the region for which the JFree report was generated. Take a look at the configuration details and you'll see how this action ties all the pieces together to send the report off as an attachment.

Again don't be too concerned if you don't understand every detail. Each type of action has its own set of inputs and outputs. Once you get familiar with them you'll soon be putting them together to do all kinds of useful stuff.
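To make the hand-off between actions concrete, here is a minimal Python sketch of the bursting walkthrough. It is not how the platform is implemented: each action is modeled as a hypothetical function that reads a shared pool of parameters and returns its outputs, and the rows, addresses and the "sent" output name are made up for illustration.

```python
def run_sequence(actions, inputs):
    """Run actions top to bottom; each sees the inputs plus earlier outputs."""
    params = dict(inputs)
    for action in actions:
        params.update(action(params))
    return params


def query_managers(params):
    # Stand-in for the SQL query action; the rows are invented sample data.
    rows = [
        {"region": "Eastern", "manager": "Pat", "email": "east@example.com"},
        {"region": "Western", "manager": "Sam", "email": "west@example.com"},
    ]
    return {"rule-result": rows}


def burst_reports(params):
    # Stand-in for the loop: format strings, build a report, "send" it.
    sent = []
    for row in params["rule-result"]:
        subject = "Budget report for " + row["region"]
        report = "[" + params["output-type"] + " report for " + row["region"] + "]"
        sent.append((row["email"], subject, report))
    return {"sent": sent}


final = run_sequence([query_managers, burst_reports], {"output-type": "html"})
for address, subject, report in final["sent"]:
    print(address, subject, report)
```

The second function never calls the first directly. It only relies on an output named rule-result being present, which is the same loose coupling the Design Studio shows in the Process Actions tree.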

Executing an Action Sequence

There are several ways to execute a solution: via Design Studio, URL, Java Code or a Web Service call.

Design Studio

Click on the Test tab on the HelloWorld.xaction editor. At the top of the test page, there is a field titled "Pentaho Server URL." If your Pentaho server is running, enter the URL to your Pentaho BI Server, which is likely to be http://localhost:8080/pentaho if you are running the PCI. Click the "Test Server" button and verify that you see the top level samples page displayed on the test page. Click on "Run" to execute the HelloWorld Action Sequence. You should see the familiar "Hello World. Greetings from the Pentaho BI Platform." message. In the unlikely event that you are not able to see the Hello World message, make sure the server is running and that you typed the Server URL correctly. Verify that you can run the samples from your browser. If all else fails, try checking the Design Studio forum at www.pentaho.org.

URL

The samples that come with the preconfigured install are launched via URL using the ViewAction (org.pentaho.ui.servlet.ViewAction) servlet. The following URL will launch the HelloWorld Action Sequence:

http://localhost:8080/pentaho/ViewAction?solution=samples&path=getting-started&action=HelloWorld.xaction

The result returned depends on the Action Sequence Document. You may get a report to view, a text message or just "Action Successful." The following parameters can be entered on the URL:

    • solution, path, action - The location of the Action Sequence document to load.
    • instance_id - The instance Id of a previous Runtime Context
    • debug - Set to "true" in order to have debug information written to the execution log.
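If you want to build these URLs from a script, the parameters listed above map directly onto a query string. The Python sketch below uses only the standard library; the helper name view_action_url is hypothetical, and the base URL assumes the preconfigured install running locally.

```python
from urllib.parse import urlencode


def view_action_url(base, solution, path, action, **extra):
    """Build a ViewAction URL from the location of an Action Sequence."""
    query = {"solution": solution, "path": path, "action": action}
    # Optional parameters such as instance_id or debug are appended as given.
    query.update(extra)
    return base + "/ViewAction?" + urlencode(query)


url = view_action_url(
    "http://localhost:8080/pentaho",  # assumed local server address
    "samples",
    "getting-started",
    "HelloWorld.xaction",
)
print(url)
print(view_action_url("http://localhost:8080/pentaho", "samples",
                      "getting-started", "HelloWorld.xaction", debug="true"))
```

Using urlencode rather than string concatenation means a path or parameter value containing spaces or other special characters is escaped for you.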

Web Service Call

In the "Settings and Services" group of the samples that come with the preconfigured install is a Web Service Example. It is still a URL call, this time to the servlet ServiceAction (org.pentaho.ui.servlet.HttpWebService). The following URL will launch the HelloWorld Action Sequence:

http://localhost:8080/pentaho/ServiceAction?solution=samples&path=getting-started&action=HelloWorld.xaction

In this case, the result returned is an XML SOAP Response. The following parameters can be entered on the URL:

    • solution, path, action - The location of the Action Sequence document to load.
    • instance_id - The instance Id of a previous Runtime Context
    • debug - Set to "true" in order to have debug information written to the execution log.

Java Call

An Action Sequence can be executed directly from a Java application. For an example of how to do this, open the Java file "org.pentaho.test.RuntimeTest.java" and look at the JUnit test for HelloWorld. This class code can be found by accessing the Pentaho public repository at svn://source.pentaho.org/svnroot.

Action Sequence Recap

The inputs, outputs and resources in the Action Sequence header define a contract between the Action Sequence and the outside world. The Sequence requires the specified inputs and resources to be passed in and will return the specified outputs.

The action-definition defines a contract between each component and the Action Sequence. The action-inputs and action-resources define the parameters that a component requires to execute. The action-outputs define what parameters will be available after the component completes executing. Outputs from one component can be used as inputs to another component. The mapping attribute of the action-inputs allows outputs from one component that have different names to be used as inputs to another component.

Specifying the input/output relationships and their data types allows the system to validate an Action Sequence or set of Action Sequences without actually executing the components. A complete solution can be validated and "locked down" to prevent modification of the Action Sequence documents and eliminate errors due to "broken links" between these documents.
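The idea of validating without executing can be sketched as a walk over the declared contracts. The Python below is a simplified illustration, not the platform's validator: the dictionary layout, the function validate_sequence and the action names are hypothetical. Each action's inputs are written as a mapping from the name the component expects to the parameter name it is mapped to.

```python
def validate_sequence(sequence_inputs, actions):
    """Return (action, missing parameter) pairs without running any action."""
    available = set(sequence_inputs)
    problems = []
    for action in actions:
        for component_name, mapped_name in action["inputs"].items():
            # A "broken link": nothing earlier provides the mapped parameter.
            if mapped_name not in available:
                problems.append((action["name"], mapped_name))
        available.update(action["outputs"])
    return problems


# Hypothetical contracts loosely modeled on the bursting walkthrough.
actions = [
    {"name": "query", "inputs": {}, "outputs": ["rule-result"]},
    {"name": "report",
     "inputs": {"data": "rule-result", "output-type": "output-type"},
     "outputs": ["report-output"]},
    {"name": "email",
     "inputs": {"attachment": "report-output", "to": "manager-email"},
     "outputs": []},
]

print(validate_sequence(["output-type"], actions))
```

In this sketch the email step is flagged because nothing declares a manager-email parameter, and the problem is found purely from the declarations.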

