In the past we allowed values to be modified on the spot. Methods for doing so include:
We still allow these methods in compatibility mode. However, the result stored in the modified value is converted to the expected data type.
This data conversion is silent as it uses the v2 data conversion engine.
We force these conversions because conflicting data types have caused a lot of problems in transformations in the past. Alternating data types in the same field are not a good situation to have.
That is because they cause problems during serialization of rows (Sort Rows, Clustered execution, Serialize to file, ...) and during database manipulations (Table Output, Database Lookup, Database Join, Update, etc.).
For example, let us take a look at the data conversion exercise:
The data type of field "dateField" is Date. The data is internally being converted to String. As such, the data type conflicts with the expected "Date" type.
Because of that, before the value is sent to the next steps, the data type is changed back to Date from String. This uses the default mask ("yyyy/MM/dd HH:mm:ss.SSS") and as such, the conversion will fail.
The net result is that dateField will contain the value null.
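The failure described above can be imitated in plain JavaScript. This is only a sketch that runs outside PDI: parseWithDefaultMask is a hypothetical stand-in for the silent String to Date conversion, and the "dd-MM-yyyy" formatting of the in-place result is an assumption made for illustration.

```javascript
// Hypothetical stand-in for the silent String -> Date conversion (not a PDI function).
// It only accepts text laid out like the default mask "yyyy/MM/dd HH:mm:ss.SSS".
function parseWithDefaultMask(text) {
  var m = /^(\d{4})\/(\d{2})\/(\d{2}) (\d{2}):(\d{2}):(\d{2})\.(\d{3})$/.exec(text);
  if (m === null) {
    return null; // the text does not fit the default mask, so the field becomes null
  }
  return new Date(+m[1], +m[2] - 1, +m[3], +m[4], +m[5], +m[6], +m[7]);
}

// Assumption: the script turned dateField into a String using the mask "dd-MM-yyyy".
var inPlaceResult = "25-12-2007";

// Converting back with the default mask fails.
var convertedBack = parseWithDefaultMask(inPlaceResult);

// A String that happens to follow the default mask would have survived.
var survivingValue = parseWithDefaultMask("2007/12/25 00:00:00.000");
```

Here convertedBack ends up null while survivingValue is a Date, which is why storing the converted String in a new value is the safer choice.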
A related case from version 2: the Integer field integerField could silently switch to a Number data type, which in itself sometimes caused problems in various areas.
In version 3, the data type will be converted back to Integer. Even so, there can be plenty of situations where you would end up with a data conversion problem.
The solution is always to make sure that you create new values when doing data type conversions. Another piece of advice (as shown below) is to stop using the compatibility mode.
As such, the statements above become:
In turn, this means you might see a small performance drop because of all the extra work this step needs to do.
If you want to make as much use as possible of the new architecture, you can turn this mode off and change the code as such:
- intField.getInteger() --> intField
- numberField.getNumber() --> numberField
- dateField.getDate() --> dateField
- bigNumberField.getBigNumber() --> bigNumberField
Instead of using the various Java methods, you can use the built-in library. You will also note that the resulting program code is more intuitive.
For example:
- Checking for null is now: field.isNull() --> field==null
- Converting string to date: field.Clone().str2dat() --> str2date(field)
If you convert your code like this, you might enjoy significant performance benefits.
Please note that it is no longer possible to modify data in-place using the value methods. This was a design decision to make sure that no data with the wrong type ends up in the output rows of the step.
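To give an idea of what script code looks like with compatibility mode turned off, here is a minimal sketch in plain JavaScript. The variables intField and stringField are stand-ins for the values the step would define for each row; their contents are assumptions chosen for illustration.

```javascript
// Stand-ins for the per-row variables the step would provide (assumption).
var intField = 41;
var stringField = null;

// With compatibility mode off the field is used directly,
// without unwrapping it through getInteger().
var nextInt = intField + 1;

// field.isNull() becomes a plain comparison with null.
var stringLabel = (stringField == null) ? "(empty)" : stringField;

// Results go into new variables; the input fields themselves are left untouched.
```

Note that the sketch never writes back into intField or stringField, in line with the rule that data is no longer modified in place.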
Java code & be.ibridge.kettle packages
There are two paths to a solution for this problem.
The best way to stay portable towards the future is obviously to stop using the internal PDI Java packages in your source code.
- Packages.be.ibridge.kettle.core.util.StringUtil.getVariable --> getVariable()
- Packages.be.ibridge.kettle.core.util.StringUtil.setVariable --> setVariable()
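A script using the built-in functions instead of the StringUtil class could look roughly like the sketch below. To keep it runnable outside PDI, it defines simple stand-ins named setVariable and getVariable; these stubs, the argument lists (name, value, scope and name, default), the scope code "r" and the variable names are all assumptions for illustration, so check the function reference of your PDI version before relying on them.

```javascript
// Stand-ins so the sketch runs outside PDI (assumption, not PDI code).
// Inside the step you would remove these and call the built-in functions.
var variableStore = {};
function setVariable(name, value, scope) {
  variableStore[name] = String(value); // the stub ignores the scope argument
}
function getVariable(name, defaultValue) {
  return Object.prototype.hasOwnProperty.call(variableStore, name)
    ? variableStore[name]
    : defaultValue;
}

// Script body: no Packages.be.ibridge.kettle reference is needed.
setVariable("TARGET_DIR", "/tmp/out", "r");            // hypothetical variable name
var targetDir = getVariable("TARGET_DIR", "/tmp");     // a value that was set
var missingDir = getVariable("NOT_SET", "fallback");   // falls back to the default
```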
Move to org.pentaho.di packages
Obviously it is still possible (though not recommended) to use the internal PDI API. Most of the time the methods have stayed the same but simply moved to the org.pentaho.di naming of packages.
Base64 encoding and decoding of strings
Even though the Apache Commons Base64 encoding and decoding library was available before, the Base64 sample in version 2 used a be.ibridge class to do the work.
In version 3 we suggest you use the excellent Apache Commons library as shown in the updated sample:
Adding rows dynamically
For each of the values in the column "groupsField" we want to generate a new row. Here's how it was in the old version:
As you can see, we added a few methods:
The metadata of the input row, explaining the layout of the row data and the "row" object.
The metadata of the output row, determined by the input metadata and the extra fields specified in this step. (Fields section)
Create a copy of the input row data, but re-size the resulting row to the desired length.
Write a new row to the next steps in the transformation. The data type passed is Object. The data types passed MUST match those specified in the Fields section and as such the specification in getOutputRowMeta(). It is not legal to specify a String for a field and to pass a different data type.
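The row-generation logic described above can be sketched in plain JavaScript as follows. The helpers copyAndResize and emitRow are hypothetical stand-ins for the copy and write operations (they are not the PDI method names), and the sample row, the comma separator and the single extra output field are assumptions.

```javascript
// Hypothetical input row: [userField, groupsField] (sample data, assumption).
var inputRow = ["john", "admin,dev,ops"];
// Output layout: the input fields plus one extra String field from the Fields section.
var outputSize = inputRow.length + 1;
var emittedRows = [];

// Stand-in: copy the input row data and re-size the copy to the desired length.
function copyAndResize(row, size) {
  var copy = row.slice(0, size);
  while (copy.length < size) {
    copy.push(null);
  }
  return copy;
}

// Stand-in: hand a finished row to the next steps.
function emitRow(row) {
  emittedRows.push(row);
}

// Generate one new row per value in groupsField.
var groups = inputRow[1].split(",");
for (var i = 0; i < groups.length; i++) {
  var newRow = copyAndResize(inputRow, outputSize);
  newRow[outputSize - 1] = groups[i]; // a String, matching the declared field type
  emitRow(newRow);
}
```

Each emitted row keeps the original input values and carries one group in the extra field, while the input row itself is never modified.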
In both versions, the script produces the following result:
A simple filter on the "ignore" field gives the desired result.
We believe that the new API will better stand the test of time because it is written without the use of direct Java code or the Java API.