Description
The Insert/Update step first looks up a row in a table using one or more lookup keys. If the row can't be found, it inserts the row. If it can be found and the fields to update are the same, nothing is done. If they are not all the same, the row in the table is updated.
Note: If you have multiple rows with the same keys that match, only the first row found is compared. This may lead to different results, depending on if the found row matches with given values or not. The update scenario looks like this: If a difference is found in the case of multiple rows with the same key, an UPDATE statement is fired against the database that updates all rows with the matching keys. This note also applies to the Update step.
Options
The table below provides a description of available options for the Insert/Update step:
Option |
Description |
---|---|
Step name |
Name of the step; this name has to be unique in a single transformation. |
Connection |
The database connection to which data is written |
Target schema |
The name of the Schema for the table to which data is written. This is important for data sources that allow for table names with periods in them. |
Target table |
Name of the table in which you want to do the insert or update. |
Commit size |
The number of rows to change (insert / update) before running a commit. |
Don't perform any updates |
If enabled, the values in the database are never updated;only inserts are performed. |
Key Lookup table |
Allows you to specify a list of field values and comparators. You can use the following comparators: =, = ~NULL, <>, <, <=, >, >=, LIKE, BETWEEN, IS NULL, IS NOT NULL
|
Update Fields |
Allows you to specify all fields in the table you want to insert/update including the keys. Avoid updates on certain fields by specifying N in the update column.
|
SQL button |
Click SQL to generate the SQL to create the table and indexes for correct operation. |
Note: Due to the extra lookup this step performs slower then a normal Table Output step. Another option is to use the Table Output step with error handling what is described in the chapter Step Error Handling. "If you put a primary key on the ID (in this case the customer ID) the insert into the table causes an error. Because of the error handling you can pass the rows in error to the update step. Preliminary tests have shown this strategy of performing upserts to be three times faster in some situations (with a low updates to inserts ratio)."
Metadata Injection Support (7.x and later)
All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.
7 Comments
user-1770a
In PostreSQL DB, when you put a number in commit size, if error occures, and you catch exception in a file, you will get generic error:
"ERROR: current transaction is aborted, commands ignored until end of transaction block"
If you leave commit size empty, you will get SQL error.
It seems it has something to do with implicit/explicit begin-commit in JDBC.
user-7d1f5
I believe some clarification is needed on the behaviour of the "update field" column. If a field is flagged as 'N' will it only be not updated/inserted OR will it also not be taken into consideration to decide whether the record has changed compare to a possible matching row found in the table.
user-bc2f5
When doing a large load/update (50,000+ rows) into a database, would it speed up or slow down the inserts/updates if the "Commit size" field was smaller? Example, I have a large, 50,000 record process running that does a lookup, insert, lookup, insert, and then update. The commit size is set to 100 so it is doing a commit after each 100 rows output to the DB. If I lowered the commit size to, say, 50, would it speed up the process? I'm thinking if it uses less memory to store the uncommitted rows, it might???
Jens Bleuel
Bryan, the commit is handled by the database. Just try it, if highering the number speeds up, it is depending on a lot of factors. Please use the forum for these types of questions.
user-706b3
> I believe some clarification is needed on the behaviour of the "update field" column. If a field is flagged as 'N' will
> it only be not updated/inserted OR will it also not be taken into consideration to decide whether the record has
> changed compare to a possible matching row found in the table.
> Comment: Posted by Jan Aertsen at Apr 25, 2010 07:14
This is an important consideration which has not been answered. If I mark an "update field" as "N", will that field not be updated when the row is updated due to change in another field which is marked "Y"?
user-053fd
Hi folks,
I looked in the source: https://github.com/pentaho/pentaho-kettle/blob/master/engine/src/org/pentaho/di/trans/steps/insertupdate/InsertUpdate.java
And here's what I see:
If a field is not in the update fields list, it will not be inserted or updated.
If a field is in the updates list, but is marked as 'N', then it is not checked to see if the record needs an update, nor is it updated if another field is marked 'Y' and has changed.
user-8ffe8
For anyone working with SQL Server, you may want to know that the comparison against GUID types mapped to Strings is done in a case-sensitive fashion. So, if you are updating a GUID value in the target table, make sure that the stream value is all caps. SQL Server will return GUIDs that way if you look them up, but if you add them into your stream as a constant you will need to make sure they are all caps.