Hitachi Vantara Pentaho Community Wiki
Child pages
  • Insert - Update
Skip to end of metadata
Go to start of metadata

Description

The Insert/Update step first looks up a row in a table using one or more lookup keys. If the row can't be found, it inserts the row. If it can be found and the fields to update are the same, nothing is done. If they are not all the same, the row in the table is updated.

Note: If you have multiple rows with the same keys that match, only the first row found is compared. This may lead to different results, depending on if the found row matches with given values or not. The update scenario looks like this: If a difference is found in the case of multiple rows with the same key, an UPDATE statement is fired against the database that updates all rows with the matching keys. This note also applies to the Update step.

Options

The table below provides a description of available options for the Insert/Update step:

Option

Description

Step name

Name of the step; this name has to be unique in a single transformation.

Connection

The database connection to which data is written

Target schema

The name of the Schema for the table to which data is written. This is important for data sources that allow for table names with periods in them.

Target table

Name of the table in which you want to do the insert or update.

Commit size

The number of rows to change (insert / update) before running a commit.

Don't perform any updates

If enabled, the values in the database are never updated;only inserts are performed.

Key Lookup table

Allows you to specify a list of field values and comparators. You can use the following comparators: =, = ~NULL, <>, <, <=, >, >=, LIKE, BETWEEN, IS NULL, IS NOT NULL

Note: Click Get fields to retrieve a list of fields from the input stream(s).

Update Fields

Allows you to specify all fields in the table you want to insert/update including the keys. Avoid updates on certain fields by specifying N in the update column.

Note: Click Get Update fields to retrieve a list of update fields from the input stream(s).

SQL button

Click SQL to generate the SQL to create the table and indexes for correct operation.


Note: Due to the extra lookup this step performs slower then a normal Table Output step. Another option is to use the Table Output step with error handling what is described in the chapter Step Error Handling. "If you put a primary key on the ID (in this case the customer ID) the insert into the table causes an error. Because of the error handling you can pass the rows in error to the update step. Preliminary tests have shown this strategy of performing upserts to be three times faster in some situations (with a low updates to inserts ratio)."

Metadata Injection Support (7.x and later)

All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.

7 Comments

  1. user-1770a

    In PostreSQL DB, when you put a number in commit size, if error occures, and you catch exception in a file, you will get generic error:

     "ERROR: current transaction is aborted, commands ignored until end of transaction block"

     If you leave commit size empty, you will get SQL error.

     It seems it has something to do with implicit/explicit begin-commit in JDBC.

  2. user-7d1f5

    I believe some clarification is needed on the behaviour of the "update field" column.  If a field is flagged as 'N' will it only be not updated/inserted OR will it also not be taken into consideration to decide whether the record has changed compare to a possible matching row found in the table.

  3. user-bc2f5

    When doing a large load/update (50,000+ rows) into a database, would it speed up or slow down the inserts/updates if the "Commit size" field was smaller?  Example, I have a large, 50,000 record process running that does a lookup, insert, lookup, insert, and then update.  The commit size is set to 100 so it is doing a commit after each 100 rows output to the DB.  If I lowered the commit size to, say, 50, would it speed up the process?  I'm thinking if it uses less memory to store the uncommitted rows, it might???

  4. Bryan, the commit is handled by the database. Just try it, if highering the number speeds up, it is depending on a lot of factors. Please use the forum for these types of questions.

  5. user-706b3

    > I believe some clarification is needed on the behaviour of the "update field" column.  If a field is flagged as 'N' will

    > it only be not updated/inserted OR will it also not be taken into consideration to decide whether the record has

    > changed compare to a possible matching row found in the table.

    > Comment: Posted by Jan Aertsen at Apr 25, 2010 07:14

    This is an important consideration which has not been answered.  If I mark an "update field" as "N", will that field not be updated when the row is updated due to change in another field which is marked "Y"?

  6. user-053fd

    Hi folks,

    I looked in the source: https://github.com/pentaho/pentaho-kettle/blob/master/engine/src/org/pentaho/di/trans/steps/insertupdate/InsertUpdate.java

    And here's what I see:

    If a field is not in the update fields list, it will not be inserted or updated.

    If a field is in the updates list, but is marked as 'N', then it is not checked to see if the record needs an update, nor is it updated if another field is marked 'Y' and has changed.

  7. user-8ffe8

    For anyone working with SQL Server, you may want to know that the comparison against GUID types mapped to Strings is done in a case-sensitive fashion.  So, if you are updating a GUID value in the target table, make sure that the stream value is all caps.  SQL Server will return GUIDs that way if you look them up, but if you add them into your stream as a constant you will need to make sure they are all caps.