Hitachi Vantara Pentaho Community Wiki


About this page

Note: This Wiki space is in active development

This space is dedicated to Pentaho Data Integration (aka Kettle) topics: concepts, best practices, and solutions.

It is more practically oriented, whereas the basic reference documentation is more detailed and descriptive.

Main categories

  • Planning (e.g. Sizing questions, Multi-Tenancy)
  • Administration (e.g. Installation, Configuration, Multi-Tenancy)
  • Operations (Lifecycle Management, Monitoring, Logging, Exception Handling, Restartability)
  • Documentation (Auto-Documentation, Data-Lineage, Process Documentation, References, Dependencies)
  • Connecting with 3rd Party Applications (e.g. Web services, ERP, CRM systems)
  • Special database issues and experiences
  • Big Data (e.g. Hadoop)
  • Clustering (Basic clustering, fail-over, load balancing, recoverability)
  • Performance Considerations
  • Change Data Capture (CDC)
  • Real-Time Concepts
  • Data Quality, Data Profiling, Deduplication (e.g. Master Data Management: MDM, Customer Data Integration: CDI)
  • Special File Processing (e.g. EDI(FACT), ASC X12, HL7 healthcare, large and complex XML files, hierarchical and multiple field formats)
  • Dynamic ETL (Meta-Data driven ETL, How to change the ETL process and fields dynamically depending on the processed content)
  • QA, Automated Testing
  • Special Job Topics (e.g. launching job entries in parallel, looping)
  • Special Transformation Topics (e.g. Error handling, tricky row and column handling)
