Hitachi Vantara Pentaho Community Wiki
Child pages
  • Extraction Patterns

Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  • Push - kettle batch located at source system and pushes data to ETL staging area
  • Pull - kettle batch located in ETL server pulling data into the ETL staging area
Pull

...

Pattern 1: Full extract with output truncate

...

The kettle script consist of an input step and an output step.

Output step is set to truncate table.

...

Pattern 2: Full extract with sql script

...

The kettle script consist of an input step and an output step plus a sql script that is not connected.

The stand-alone sql script is a special step that will be executed before any trans steps is run.

When the table is small this is okay.

If you are doing a major extract, do not place the sql script in the trans.

The sql script will cause locking when it takes some time to execute or is locked from accessing or writing to the table.

...

Pattern 3: Full extract job with 2 transformation steps

...

The kettle script comprise of a job and a extract transformation.

Inside the job is a sql script step.

The difference from Pattern 2 is that the sql script is now in the job not in the trans.

I have encountered problems with sql script in trans.

...

Pattern 4: Extraction transformation with an error step after the output step

...

When you have error handling turned on and an output text file on the error step, the output step will exit elegantly dumping error lines with error codes to the error output text file.

...

Pattern 5: Incremental extraction