
About this page

Note: This Wiki space is in active development

This space is dedicated to Pentaho Data Integration (also known as Kettle) topics covering concepts, best practices, and solutions.

It is more practically oriented, whereas the basic reference documentation is more detailed and descriptive.

Main categories

  • Planning (e.g. Sizing questions, Multi-Tenancy)
  • Administration (e.g. Installation, Configuration, Multi-Tenancy)
  • Operations (Lifecycle Management, Monitoring, Logging, Exception Handling, Restartability)
  • Documentation (Auto-Documentation, Data-Lineage, Process Documentation, References, Dependencies)
  • Connecting with 3rd-Party Applications (e.g. web services, ERP and CRM systems)
  • Special database issues and experiences
  • Big Data (e.g. Hadoop)
  • Clustering (Basic clustering, failover, load balancing, recoverability)
  • Performance Considerations
  • Change Data Capture (CDC)
  • Real-Time Concepts
  • Data Quality, Data Profiling, Deduplication (e.g. Master Data Management: MDM, Customer Data Integration: CDI)
  • Special File Processing (e.g. EDI(FACT), ASC X12, HL7 healthcare, large and complex XML files, hierarchical and multiple field formats)
  • Dynamic ETL (metadata-driven ETL: how to change the ETL process and fields dynamically depending on the processed content)
  • QA, Automated Testing
  • Special Job Topics (e.g. launching job entries in parallel, looping)
  • Special Transformation Topics (e.g. Error handling, tricky row and column handling)

This documentation is maintained by the Pentaho community. Members are encouraged to create new pages in the appropriate spaces, or to edit existing pages that need to be corrected or updated.

Please do not leave comments asking for help on Wiki pages; such comments will be deleted. Use the forums instead.
