Hitachi Vantara Pentaho Community Wiki
Child pages
  • Database lookup
Skip to end of metadata
Go to start of metadata

Description

The Database lookup step allows you to look up values in a database table. Lookup values are added as new fields onto the stream.

Options

The following table describes the available options for configuring the database lookup:

Option

Description

Step name

Name of the step; this name has to be unique in a single transformation.

Connection

Database connection to the lookup table.

Lookup schema

Database schema containing the lookup table.

Lookup Table

Name of the database table used for the lookup.

Enable cache?

Enables caching of database lookups. This means that once a key (or group of key) has been looked up, the looked up values are stored, and returned again the next time this key (or group of key) is being looked up (without incurring the cost of a database call).

Important: If other processes are changing values in the table where you perform a lookup, do not cache values. In all other instances, caching values increases the performance substantially because database lookups are relatively slow. If you can't use the cache, consider launching several copies of the simultaneously. A simultaneous launch keeps the database busy through  different connections. See Launching several copies of a step.

Cache size in rows

The size of the cache (number of rows), 0 means cache everything.

Load all data from table

Pre-loads the cache with all the data present in the lookup table.  This may improve performance by avoiding database calls.
However, if you have a large table, you risk running out of memory.


IMPORTANT:  In memory look-ups can lead to different results because of the differences in the way your database compares data. For example, if your database table allows case-insensitive comparisons, you may get different results with this option. Further information can be found on PDI-11440 (illustrating an issue with the Merge join step, but the issue is valid for this step as well). The same goes for data with trailing spaces. Those are ignored in certain databases when using certain character data types. See more details on the below note about this option.

Keys to look up table

The keys and conditions to perform the database lookup.

Tip: When using the LIKE operator on the key lookup it's not automatically adding wildcards to the stream value.  So by default the LIKE operator is behaving as an "=" (see your database for SQL specific topics on the LIKE operator). It is possible to add the wildcards to add a 'Replace in String' step before the database lookup.  In the replace step select the lookup field then use Regex to look for (^.*) and replace with %$1%.  This will add the wildcards to the field for the subsequent lookup task.

Values to return table

The fields from the lookup table to add to the output stream.
"New name" allows you to use a different name if the database column name is inappropriate.
"Default" is the value returned (instead of null) if the lookup fails. Note that enabling error handling on the Database Lookup step will redirect all failed lookup rows to the error handling step, instead of adding them to the main output steam with null/default values.
"Type" is the type of the output field.

Do not pass the row if the lookup fails

Enable to avoid passing a row when lookup fails. Within the SQL syntax, enabling this would be an INNER JOIN, otherwise it would be an OUTER JOIN.

Fail on multiple results?

Enable to force the step to fail if the lookup returns multiple results.

Order by

If the lookup query returns multiple results, the ORDER BY clause helps you to select the record to take. For example, ORDER BY would allow you to pick the customer with the highest sales volume in a specified state.

Get Fields

Click to return a list of available fields from the input stream(s) of the step.

Get lookup fields

Click to return a list of available fields from the lookup table that can be added to the step's output stream.

Note on the Load all data from table option: As an example: when storing a value of type "CHAR(3)", many databases will store "ab" using 3 characters: "ab " (notice the blank space). When you do a "SELECT * FROM my_lookup_table WHERE key_column = 'ab'", the database is smart and adds a blank space before applying the where clause (it looks for 'ab ' instead of 'ab').

The problem occurs when you use "load all data from table" to pre-load the cache: at startup, the cache is built using all the values present in the database, so we store 'ab ' in the cache. Later, looking up 'ab' fails, because the trailing space matters in java equality.

Note that this problem does not occur when using the cache, if "load all data from table" is disabled. Indeed, the first lookup for 'ab' would find no entry in the cache, call the database (which is smart enough to handle the trailing whitespace problem), get the correct result, and store it in the java cache under the 'ab' key. Thus, the next lookup for 'ab' will find the appropriate result in the cache :)

  • No labels