Hitachi Vantara Pentaho Community Wiki
Skip to end of metadata
Go to start of metadata

Formula Syntax Description

Formulas start with a "=" or a formula prefix (any letter and number followed by a colon; the prefix is not interpreted yet, but will be used later to separate different formula-namespaces).

IMPORTANT: this is not the case for the formula defined in Pentaho Metadata nor Pentaho Data Integration.  The "=" prefix should be omitted in that case.

Therefore the following formulas are equivalent:


References to external data-fields are expressed using square brackets.


References to external data-fields with parentheses in their titles are expressed using square brackets with quotation marks included inside them.

=["yourfield (containing parentheses)"]

Possible operators are:

  • Basic computation: +, -, *, /
  • Percentage: %
  • Power: ^
  • String-concatenation: &
  • Comparisons: =, <>, <, <=, >, >=
  • Formulas can have braces.


=(1+1)*2 / 3

results in 1.3333333


results in 0.1

String-constants are expressed using double-quotes.

="Your text here"

Strings can be concatenated using the & operator.

="Your text here " & "some more text"

The Range-Operators known from Spreadsheet applications are not supported, as they have no meaning in the reporting. A formula can only work on the current values of the current datarow. There is no way to access previous or next rows of data. (These are the same limitations as with all expressions, so this is nothing new.)

Date/Time Functions

DATE - Provides an internal number for the given date.
DATEDIF - Returns the number of years, months, or days between two dates.
TIME - Returns the current time.
DATEVALUE - Returns date serial number from given text.
NOW - Return the serial number of the current date and time. This returns the current day and time serial number, using the * current locale.
TODAY - Returns the current date, (yyyy-mm-dd).
DAY - Returns the day of the month for a date (1 through 31)
MONTH - Returns the month of a date
YEAR - Returns the year of a date
WEEKDAY - Returns the day of week from a date.
HOUR - Returns the hour (0 through 23) from a time.

Logical functions

AND - Returns TRUE if all arguments are TRUE.
FALSE - Defines the logical value as FALSE.
IF - Specifies a logical test to be performed.
NOT - Reverses the value of an argument.
OR - Returns TRUE if an arguments are TRUE.
TRUE - Defines the logical value as TRUE.
XOR - Returns a value of TRUE only if just one of its operands is TRUE. In contrast, an inclusive OR operator returns a value of TRUE if either or both of its operands are TRUE.

Rounding functions

INT - Returns a number down to the nearest integer

Mathematical functions

SUM - Sum a list of numeric values.
ABS - Returns the absolute (nonnegative) of the value.
AVERAGE - Average a list of numeric values.
ODD - Returns the rounding a number up to the nearest odd integer.
EVEN - Returns the rounding a number up to the nearest even integer.
MOD - Returns the remainder when one number is divided by another number.
MAX - Returns the maximum from a set of numbers.
MIN - Returns the minimum from a set of numbers.

Text functions

LEN - Returns the length in characters of the given value.
LOWER - Returns the given text in lower case.
UPPER - Returns the given text in upper case.
TRIM - Returns the given text free of leading and trailing spaces. Internal multiple spaces are replaced by one.
TEXT - Returns the given value as text.
T - Returns the given text value or a zero lenght string for non text type.
FIND - Returns the starting position of a given text.
EXACT - Reports if two text values are exactly equal using a case-sensitive comparison.
REPT - Returns text repeated Count times..
MID - Returns extracted text, given an original text, starting position, and length.
LEFT - Returns a selected number of text characters from the left.
RIGHT - Returns a selected number of text characters from the right.
URLENCODE - Applies URL-Encoding to a text given in the first parameter \
  using the encoding specified in the second parameter. If no encoding is \
  given, ISO-8859-1 is assumed.
REPLACE - Returns text where an old text is substituted with a new text within the start index and length range.
SUBSTITUTE - Returns text where an old text is substituted with a new text. If which is specified, only this occurrance will be replaced otherwise every occurrance will be replaced.

Information functions

HASCHANGED - Checks, whether the columns specified by the given names has changed.
ISBLANK - Checks, whether the value is undefined (null).
NA - Returns the constant error NA.
ISNA - Tells if the parameter is of error type NA.
ISLOGICAL - Checks whatever the value is of type Logical.
ISTEXT - Checks whatever the value is of type Text.
ISERR - Tells if the parameter is of error type but returns false if the error is of type NA.
ISERROR - Tells if the parameter is of error type but returns false.
CHOOSE - Uses an index to return a value from a list of values.
ISREF - Checks whatever the value is a reference.
ISNUMBER - Checks whatever the value is of type Number.
ISODD - Checks whatever the value is an odd number.
ISEVEN - Checks whatever the value is an even number.
ISNONTEXT - Checks whatever the value is not of type Text.

NULL - Returns a NULL-Value

Formula Operators.

+ Add
- Subtract
* Multiply
/ Divide
= Equal
<> Not Equal
< Less Than
> Greater Than
<= Equal Less Than
>= Equal Greater Than
& Concatenate
^ Power

Prefix/Suffix Operators

% Percent (Suffix)
+ Positive (Prefix)
- Negative (Prefix)

Note: The formula syntax used in LibFormula is based on the OpenFormula standard. The full document is downloadable from Oasis: OASIS OpenFormula Specifcation

  • No labels


  1. user-80001

    Can the internal number of the date be described?  For example, 'the date is represented by a floating point number where where the integer is the Julian date and the fractional part is the fraction of a day.  Times are UTC when time zone data is provided.'  This may be completely wrong, but it is the level of specificity that I would like to know.  What is the inverse of this Date function?

  2. user-cc1ad

    Hi I used this step within PDI so it may be only specific to its behavior within that part of the Pentaho Suite.

    Tony, helped out with the following that perhaps might help others who like I expect the behavior to act like Java or Javascript:

    The functions in the formula step don't act like functions in java/javascript/etc. - and you can't assume that functions return -1 codes on failure. So sometimes you have to use the "Informational" functions to evaluate the results.

    In your case you can use this modified version to get what you want:

    IF(ISERR(FIND("z";"abcabc")); "POOP";"NOT POOP")

    Here the ISERR function checks for an error from the FIND function evaluation and converts it into a boolean result for evaluation in the IF statement.

    I didn't have the "ISERR" check as I thought it would behave like a FIND function would in JS etc.



  3. user-8667f

    The internal date format is the same as in Excel.

    From the OpenOffice-Formula specification:

    Unknown macro: {bq}

    4.2.2 Date and DateTime

    A Date is a subtype of number; the number is the number of days from a particular date called the epoch. Thus, a date when presented as a general-purpose number is also called a serial number. This specification does not specify the exact value of the epoch, but implementationsshall support all dates from 1904-01-01 through 9999-12-31 (inclusive), with correct calculations. Portable spreadsheet files shall not assume any particular epoch values. Since dates are simply numbers, they can be added, subtracted, and so on like other Numbers. Subtracting one date from another produces the number of days between the dates.

    Implementations of formulas in an OpenDocument file shall use the epoch specified in the table-null-date attribute of the <table:calculation-setting> element, and shall support at least the following epoch values: 1899-12-30, 1900-01-01, and 1904-01-01.

    Many applications cannot handle Date values before January 1, 1900. Some applications can handle dates for the years 1900 and on, but include a known defect: they incorrectly presume that 1900 was a leap year (1900 was not a leap year). Applications may reproduce the 1900-as-leap- year bug for compatibility purposes, but should not. Portable documents shall not include date calculations that require the incorrect assumption that 1900 was a leap year. Portable documents shall not assume that negative date values are impossible (many implementations use negative dates to represent dates before the epoch). Portable documents should use the epoch date 1899-12-30 to compensate for serial numbers originating from applications that include a 1900- 02-29 leap day in their calculations.

    A DateTime is also a subtype of number, and for purposes of formulas it is simply the date plus the time of day.

    @Kent: Error handling is the same as in Excel. Formulas use paradigms of functional programming instead of imperative programming. Errors are indicated via exceptions, and can be caught via an "ISERROR" check.