The purpose of this document is to provide a detailed view of the software components that together make up the Pentaho open source software suite, and of the overall architecture they form.

At a high level, the software components can be categorized in two ways. By form, the detailed list below is organized into third party libraries and components that Pentaho has needed to fork and maintain, common libraries and projects that are used in general ways, pillars that are core business analytics or data integration elements, tools that allow access to pillars, and plugins across the pillars that provide additional functionality. By architectural purpose, the same components fall into four general areas: information delivery, data movement, analytics, and platform services. Each project below is categorized in both ways to give a multi-faceted view of the overall architecture of Pentaho.

Cross Cutting Architectures and Use Cases

This section discusses high level cross cutting software architectures and use cases.

Version Control

At this time, Pentaho uses a combination of SVN and Git to manage its source code. Here are some related articles:

Metadata Definitions

As we continue to build a community of projects, it's important that they share terminology and common metadata. Here is the beginning of an effort to capture shared metadata for use across all Pentaho projects:

Detailed Software Listing

This detailed software listing is organized in the general order in which software components are dependent on one another, although it should not be used as the official build order of Pentaho.

First Pass:

  High Level Description

  Source Path
  Architectural Owner
  Architectural Area
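As a rough illustration only, the per-project fields listed above could be captured in a small record, together with the organizational forms named in the introduction. This is a minimal sketch, not an existing Pentaho API: the names `Form`, `Component`, and `group_by_form` are hypothetical, and the sample entry reuses the Kettle VFS values given later in this listing, with its source path left exactly as it appears there.

```python
from dataclasses import dataclass
from enum import Enum


class Form(Enum):
    """Organizational forms named in the introduction (hypothetical enum)."""
    THIRD_PARTY_FORK = "third party maintained fork"
    COMMON = "common library or project"
    PILLAR = "pillar"
    TOOL = "tool"
    PLUGIN = "plugin"


@dataclass(frozen=True)
class Component:
    """One entry in the detailed software listing (hypothetical record)."""
    name: str
    description: str          # High Level Description
    source_path: str          # Source Path
    owner: str                # Architectural Owner
    area: str                 # Architectural Area, kept as free text
    form: Form


def group_by_form(components):
    """Return a dict mapping each Form to the names of its components."""
    grouped = {}
    for component in components:
        grouped.setdefault(component.form, []).append(component.name)
    return grouped


# Sample entry using the values this listing gives for Kettle VFS.
kettle_vfs = Component(
    name="Kettle VFS",
    description="Maintained fork of Apache Commons VFS",
    source_path="svn://",     # incomplete in the listing; left as-is
    owner="Matt Casters",
    area="Data Management / Integration",
    form=Form.THIRD_PARTY_FORK,
)
```

Keeping the architectural area as free text avoids forcing a listing value such as "Data Management / Integration" into one of the four general areas before that mapping has been agreed.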

Third Party Maintained Forks

It is Pentaho's intention to avoid forking and maintaining third party open source software, but on a few occasions it has been necessary. The following third party forks are currently maintained by Pentaho and included in the product.

 Kettle-VFS (Fork of Apache VFS) (MattC)
 Hive JDBC (Will)
 Pentaho OFC4J (Will)


Kettle VFS is a maintained fork of Apache Commons VFS.

Source Path: svn://

Architectural Owner: Matt Casters

Architectural Area: Data Management / Integration

Hive JDBC Driver

Due to the dynamic nature of Hadoop, we currently maintain our own Hive JDBC driver implementation.

 Hive JDBC (Will)

 Pentaho OFC4J (Will)

Common Components