Formulas have multiple uses in Pentaho Metadata.

=20The first use of formulas within Pentaho Metadata is in the constraint d= efinition of a Metadata Query, also known as MQL. A constraint functi= on references business table columns and uses various comparison operators = to determine which subset of data the business user is interested in.

= =20The second use is in the definition of Physical Table Columns. In = addition to Physical table columns mapping directly to a database table col= umn, physical table columns defined in Pentaho Metadata may also be defined= as a formula. This allows for combining of multiple columns into a single = column, and also for doing more advanced aggregate calculations within aggr= egate table definitions.

=20The third use is in the definition of complex joins within business mode= l relationships. This allows for multiple key joins as well as other = logic when joining tables.

=20The fourth use is row level security.

=20Under the covers, Pentaho Metadata uses JFreeReport's libFormula package= for interpreting formulas. The goal is to support OpenFormula syntax= within the Metadata environment. Formulas are first interpreted by l= ibFormula, and then within the Metadata system are converted to native SQL = depending on the type of database used.

=20Here is an example of an MQL Constraint formula:

=20=20

OR([BT_CUSTOMERS.BC_CUSTOMERS_CUSTOMERNAME] =3D "EuroCars"; (([BT_CUSTOMERS= .BC_CUSTOMERS_CREDITLIMIT] * 2) / 3 > 1000))=20

We'll walk through this example to help explain the core components of M= QL formulas. First note the OR function. This is a boolean func= tion which has two parameters, separated by semi-colons. These parame= ters are boolean expressions.

=20The first boolean expression first references a business column from our= Metadata model. All references appear with brackets around them [].= This reference first refers to the business table, and then to the b= usiness column. This boolean expression first does some arithme= tic and checks to see if the final value us larger than 1000.

=20In the second expression, we compare the business column BT_CUSTOMERS.BC= _CUSTOMERS_CUSTOMERNAME to EuroCars. Note that we use double quotes w= hen referring to text. Double quotes are required.

=20Here is an example of a Physical Table Column Formula:

=20=20

SUM([QUANTITYORDERED]*[PRICEEACH])=20

Aggregate functions are supported within physical table column formula d= efinitions. The references here specifically refer to the database co= lumn, not derived physical column definitions. All operators and func= tions may be used in the definition of the physical table column. One= special note, in order for this formula to be recognized, the "isExact" pr= operty of the physical table column must be set to true. Also note, the ref= erenced physical column must be explicitly defined in the metadata model. <= br class=3D"atl-forced-newline">

=20Since the latest versions (after 2008/03/14) it is possible to define fo= rmulas that use business columns from anywhere in the business model.

= =20For example suppose we have 2 business tables:

=20- =20
- Orders (fact table), ID=3DBT_ORDER_FACT =20
- Product (dimension), ID=3DBT_PRODUCT =20

Suppose we want to calculate the turnover based on:

=20- =20
- the number of products sold, from the Orders table, ID=3DBC_FACT_ORDER_= NRPRODUCTS =20
- the price of the product, from the Product table, ID=3DBC_DIM_PRODUCT_P= RICE =20

To arrive there, we define a new business column, say in the Orders busi= ness table (although you could take Product too):

=20- =20
- Table: Orders (BT_ORDER_FACT) =20
- ID =3D BC_FACT_ORDER_TURNOVER =20
- Name =3D Turnover =20
- Formula =3D SUM( [BT_ORDER_FACT.BC_FACT_ORDER_NRPRODUCTS] * [BT_PRODUCT= .BC_DIM_PRODUCT_PRICE] ) =20
- Exact =3D Yes =20

The SQL generator is now going to replace the 2 business columns by thei=
r respective SQL variants. As such, we have to make sure that the bus=
iness columns on which we base ourselves are resolving correctly. In this s=
pecific case, this means we want the 2 columns to be non-aggregated. =
If we now select the single business column BT_FACT_ORDER_TURNOVER, this is=
the SQL that is generated:

=20

SELECT SUM( BT_ORDER_FACT.NRPRODUCTS * BT_PRODUCT.PRICE ) AS COL0 FROM FACT_ORDER BT_ORDER_FACT ,DIM_PRODUCT BT_PRODUCT WHERE ( BT_ORDER_FACT.PRODUCT_TK =3D BT_PRODUCT.PRODUCT_TK )=20

Now, suppose we want to generate the multiplication of the 2 sums (diffe= rent use-case), we define the formula as "[BT_ORDER_FACT.BC_FACT_ORDE= R_NRPRODUCTS] * [BT_PRODUCT.BC_DIM_PRODUCT_PRICE]" (without the SUM) and sp= ecify an aggregation for the 2 used business columns. The generated S= QL will then be:

=20=20

SELECT SUM( BT_ORDER_FACT.NRPRODUCTS ) * SUM( BT_PRODUCT.PRICE ) AS C= OL0 FROM FACT_ORDER BT_ORDER_FACT ,DIM_PRODUCT BT_PRODUCT WHERE ( BT_ORDER_FACT.PRODUCT_TK =3D BT_PRODUCT.PRODUCT_TK )=20

It is obviously possible to create 2 versions of the used business colum= ns, one aggregated (exposed to the users) and one non-aggregated (hidden fr= om the users) for example.

=20The SQL generator works recursively. That means that it is possibl= e to create a formula that calculates 7% (taxes for example) of the turnove= r:

=20- =20
- ID =3D BC_FACT_ORDER_TURNOVER_TAXES =20
- Name =3D Turnover Taxes =20
- Formula =3D [BT_ORDER_FACT.BC_FACT_ORDER_TURNOVER] * 7 / 100 =20
- Exact =3D Yes =20

If we add that column to the selection, we get one extra column li= ke this:

=20=20

( SUM( BT_ORDER_FACT.NRPRODUCTS * BT_PRODUCT.PRICE ) * 7 / 100) AS COL1=20

Function syntax

=20=20

FUNCTION_NAME ( PARAM ; PARAM )=20

Text (requires double quotes)

=20=20

"TEXT"=20

Parenthesis are used for formula precedence:

=20=20

( 1 + 2) * 3=20

Business Column References:

=20=20

[<BUSINESS_TABLE_ID>.<BUSINESS_COLUMN_ID>]=20

Physical Column References (only used in physical column formula definit= ons):

=20=20

[<PHYSICAL_COLUMN_NAME>]=20

=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20

=20
Function Name | Parameters | Description | Example |
---|---|---|---|

OR | 2 or more boolean expression parameters = | Returns true if one or more parameters are t= rue | OR( |

AND | 2 or more boolean expression parameters = | Returns true if all parameters are true = | AND( |

LIKE | 2 parameters | Compares a column to a regular expression, u= sing "%" as wild cards | LIKE([BT_CUSTOMERS.BC_CUSTOMERS_CUSTOMERNAME= ]; "%SMITH%") |

IN | 2 or more parameters | Checks to see if the first parameter is in t= he following list of parameters | IN([BT_CUSTOMERS.BC_CUSTOMERS_CUSTOMERNAME];= "Adam Smith"; "Brian Jones") |

NOW | none | The current date | NOW() |

DATE | 3 numeric parameters, year, month, and day <= /p> | A specified date | DATE(2008;4;15) |

DATEVALUE | 1 text parameter "year-month-day" | A specified date | DATEVALUE("2008-04-15") |

CASE | 2 or more parameters | Evaluates the first, third, etc parameter, a=
nd returns the second, fourth, etc parameter value | CASE( |

COALESCE | 1 or more parameters | returns the first non null parameter =20 | COALESCE( |

ISNA | 1 parameter | returns true if 1st parameter is null. Other= wise, false. | ISNA([BT_CUSTOMERS.BC_CUSTOMERS_LAST_LOGIN])= |

- =20
- see below for aggregate functions =20

=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20

=20
Operator | Description |
---|---|

=3D | returns true if two expressions are equal |

> | returns true if first expression is larger t= han the second |

< | returns true if first expression is smaller = than the second |

>=3D | returns true if first expression is larger t= han or equal to the second |

<=3D | returns true if first expression is smaller = than or equal to the second |

<> | returns true if two expressions are not equa= l |

+ | adds two values |

- | subtracts two values |

* | multiplies two values |

/ | divides two values |

Aggregate functions may only be used in physical column definitions.

= =20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20
=20

Function Name | Description |
---|---|

SUM | sums a specific columns values determined by= grouping |

COUNT | counts a specific columns values determined = by grouping |

AVG | averages a specific columns values determine= d by grouping |

MIN | selects the minimum column value determined = by grouping |

MAX | selects the maximum column value determined = by grouping |