Hitachi Vantara Pentaho Community Wiki

Extraction of data from source systems

The complexity of an extraction batch depends heavily on the environment:

  • Is the source system mission-critical?
  • Can the source system sustain a long-running query?
  • Is the source system located on the local LAN or in the cloud?
  • Is the source system under continuous access?

There are two main scenarios:

  • Push - the Kettle batch runs on the source system and pushes data to the ETL staging area
  • Pull - the Kettle batch runs on the ETL server and pulls data into the ETL staging area

Pattern 1: Full extract with output truncate

The Kettle transformation consists of an input step and an output step.

The output step has its "Truncate table" option enabled, so the target table is emptied before each load.
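As a rough illustration of Pattern 1 outside Kettle, the sketch below uses Python's built-in `sqlite3` module, with two in-memory databases standing in for the source system and the staging area. It is not the Kettle API; the table name `customers`, its columns, and the function name `full_extract_truncate` are hypothetical. SQLite has no `TRUNCATE` statement, so an unqualified `DELETE` plays that role.

```python
import sqlite3


def full_extract_truncate(source: sqlite3.Connection, staging: sqlite3.Connection) -> int:
    """Copy every row of the hypothetical `customers` table from source to staging.

    The staging table is emptied first, mimicking an output step with
    "Truncate table" enabled. Returns the number of rows loaded.
    """
    # Input step: read the full source table.
    rows = source.execute("SELECT id, name FROM customers").fetchall()
    # One transaction, so the clear and the load commit (or roll back) together.
    with staging:
        # SQLite has no TRUNCATE; DELETE without WHERE empties the table.
        staging.execute("DELETE FROM customers")
        # Output step: bulk insert the extracted rows.
        staging.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    src = sqlite3.connect(":memory:")
    stg = sqlite3.connect(":memory:")
    for db in (src, stg):
        db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    src.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
    src.commit()
    stg.execute("INSERT INTO customers VALUES (99, 'stale row')")
    stg.commit()
    # Running the extract twice is safe: each run starts from an empty target.
    full_extract_truncate(src, stg)
    print(full_extract_truncate(src, stg))  # prints 2
    print(stg.execute("SELECT * FROM customers ORDER BY id").fetchall())
```

Because the target is emptied on every run, the batch is repeatable without duplicating rows, at the cost of re-reading the whole source table each time.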

Pattern 2: Full extract with SQL script

The Kettle transformation consists of an input step and an output step, plus an SQL script step that is not connected to them by hops.

Pattern 3: Full extract
