Hitachi Vantara Pentaho Community Wiki


The purpose of this wiki page and the team behind it is to get a handle on the proliferation of SVN folders and of CI and release projects.

Drivers

  1. The ramp-up cost for platform development is too high, partly because of the large number of Eclipse projects. It feels unwieldy and complex.
  2. Regarding platform plugins, we do not want every new feature to end up in a plugin when it really belongs in the core platform.
  3. The rich dependency structure and sheer number of projects in the CI environment make it difficult to maintain.
  4. Releasing the platform requires orchestrating the release builds of many projects and modules. We do not want to increase complexity arbitrarily by adding new projects where they need not exist.

Team Members

Marc Batchelor - Chief Engineer, Pentaho
James Dixon - CTO, Pentaho
Doug Moran - VP Community, Pentaho
Will Gorman - Lead Engineer, Pentaho
Aaron Phillips - Engineer, Pentaho
Thomas Morgner - Chief Architect (Reporting), Pentaho

Definitions

  • Module
    A module is a folder under source code control that is built to create one or more artifacts.
  • Project
    A project is a collection of modules that live in source control under a single "trunk" folder. It is distinguished from a module in that its modules are delivered and versioned together.
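To make the distinction concrete, here is a minimal Python sketch of the two definitions as data types. It assumes, purely for illustration, that each artifact is a jar named `<artifact>-<version>.jar`; the class names, fields, and the `release_artifacts` method are hypothetical and do not correspond to any existing Pentaho tooling. What it shows is that a module only describes what it builds, while the version belongs to the project and is applied to all of its modules at once.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Module:
    """A folder under source control that is built into one or more artifacts."""
    svn_path: str
    artifacts: tuple[str, ...]


@dataclass
class Project:
    """A collection of modules under one trunk folder, delivered and
    versioned together."""
    name: str
    version: str
    modules: list[Module] = field(default_factory=list)

    def release_artifacts(self) -> list[str]:
        # Modules carry no version of their own: every artifact of every
        # module is stamped with the single project version.
        return [
            f"{artifact}-{self.version}.jar"
            for module in self.modules
            for artifact in module.artifacts
        ]
```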


We need to ensure that our new policy does not leave us in a worse place. We also want to keep in mind the reasons why we have chosen in the past to either consolidate or modularize projects. There are valid reasons for both, and we should make those reasons known.

Guidelines for the new policy:

  • supports the declarative nature of dependency management
  • reduces community development cost (can we be more specific?)
  • reduces Pentaho development cost (can we be more specific?)
  • must continue to deter dependency creep (a motivator for modularization)

The decision on how far to modularize has a technical impact on:

  • Release environment - The cost of adding projects or modules grows linearly with the number added. The release environment offers a lot of flexibility in how a release job is configured; for example, one job could build several modules.
  • CI environment - An increase in the number of modules typically has a non-linear cost in the CI environment. Unlike the release environment, we do not typically group modules in a single job, but assign a unique job to the smallest unit of source that produces a jar (i.e. .
  • Development environment - Eclipse project configuration


  • An approach to managing modules in Eclipse: Consider that an Eclipse project (little p) maps to neither a Project (big P) nor a Module. An Eclipse project then represents a view of one or more projects and/or modules. Given that assertion, we could present a much cleaner view into platform development than we have in the past. For example, a developer might see only 5 Eclipse projects instead of 15, where each of the 5 Eclipse projects manages 3 modules. We could make this work by having platform SVN "master" folders like we do with plugin-actions. Plugin-actions would be composed of a number of self-contained modules, each managing its own dependencies via its own ivy.xml file. The master project, plugin-actions, could then have a generated ivy.xml that includes the various module ivy.xml files to give Eclipse IvyDE developers a usable project.
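One way the generated master ivy.xml could work is sketched below in Python. It is a minimal sketch under stated assumptions, not a description of how plugin-actions is actually built: it reads each module's ivy.xml, takes the union of the `<dependency>` elements, drops dependencies on the sibling modules themselves (their source is already in the same Eclipse project), and emits a single `<ivy-module>` for IvyDE to resolve. The function name `merge_ivy_files` is hypothetical, and the "first declaration wins" rule for duplicate dependencies is an assumption; a real generator would need an explicit policy for conflicting revisions. Each `<dependency>` is copied as-is, so any `conf` mappings it carries would also need matching `<configurations>` in the master file, which this sketch does not generate. Ivy's own `<extends>` element (Ivy 2.2 and later) may be an alternative worth evaluating.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def merge_ivy_files(organisation: str, module: str,
                    module_ivy_xml: list[str]) -> str:
    """Return a master ivy.xml whose dependencies are the union of the
    external dependencies declared by the given module ivy.xml files."""
    parsed = [ET.fromstring(text) for text in module_ivy_xml]

    # (org, name) of every aggregated module. Dependencies between these
    # modules are satisfied by source in the same Eclipse project, so they
    # are left out of the master file.
    local = set()
    for root in parsed:
        info = root.find("info")
        if info is not None:
            local.add((info.get("organisation"), info.get("module")))

    merged: dict[tuple[str, str], ET.Element] = {}
    for root in parsed:
        info = root.find("info")
        own_org = info.get("organisation", "") if info is not None else ""
        deps = root.find("dependencies")
        if deps is None:
            continue
        for dep in deps.findall("dependency"):
            # In Ivy, a missing "org" defaults to the declaring module's
            # organisation; make it explicit before copying the element.
            if dep.get("org") is None:
                dep.set("org", own_org)
            key = (dep.get("org"), dep.get("name", ""))
            if key in local:
                continue
            # Assumption: the first declaration of a dependency wins.
            merged.setdefault(key, dep)

    master = ET.Element("ivy-module", version="2.0")
    ET.SubElement(master, "info", organisation=organisation, module=module)
    deps_out = ET.SubElement(master, "dependencies")
    for key in sorted(merged):
        deps_out.append(merged[key])
    return ET.tostring(master, encoding="unicode")
```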

Modularization: Pros & Cons

Pros

  • Minimizes dependency creep
  • Provides smaller chunks to work with
  • Enforces clean interfaces between modules
  • Makes dependencies easier to track

Cons

  • More complex to set up and
  • More expensive to maintain the build
  • Hudson does not handle modularized projects very well
  • Eclipse does not handle modularized projects well


  • In addition to the module builds that we currently have, I would propose a project-level build that resolves and compiles each module under it (in order to identify circular dependencies) and produces one jar file for the project. For example, bi-platform-v2 (open) would have 20-25 modules, each capable of producing its own artifacts, but the bi-platform-v2 project would have a build that produces one jar file containing each module's compiled code.
  • It may already help to have an easy way to set up the development environment for developers: a master script that checks out the project into a well-defined directory structure with all IDE files set up, so that you can start working immediately.
  • Question: Are we just having a semantic discussion here about modularization? The fact is that we need independent source trees to work together to comprise a "project". I don't know if we are asking the right question. "How far should we go with modularization?" doesn't seem like the right question; I think modularization is inevitable. Isn't the real question how this modularization will manifest (e.g. in many Eclipse projects)? I realize I'm going a little backwards here, but I think I need to understand what the problem actually is, and the problem is not that we are too modular or too consolidated. I started the Drivers section above to help us get a handle on the pain points and the problems we need to solve. -AP
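The project-level build proposed above needs to compile the modules in dependency order and to stop when it finds a circular dependency. A minimal Python sketch of that ordering step, using Kahn's topological sort, is below. It assumes the module dependency graph has already been extracted (for instance from each module's ivy.xml) into a dictionary mapping a module name to the names of the modules it depends on; `build_order` and `CircularDependencyError` are hypothetical names, and the actual compile and jar-packaging steps are not shown.

```python
from __future__ import annotations

from collections import deque


class CircularDependencyError(Exception):
    """Raised when the modules cannot be ordered because of a cycle."""


def build_order(deps: dict[str, set[str]]) -> list[str]:
    """Return modules ordered so each appears after everything it depends
    on (Kahn's topological sort). Raises CircularDependencyError on a cycle."""
    # A dependency that is not itself a key is treated as a module with
    # no dependencies of its own.
    modules = set(deps) | {d for ds in deps.values() for d in ds}
    remaining = {m: set(deps.get(m, ())) for m in modules}
    dependents: dict[str, set[str]] = {m: set() for m in modules}
    for m, ds in remaining.items():
        for d in ds:
            dependents[d].add(m)

    # Start with modules that depend on nothing; sorting keeps output stable.
    ready = deque(sorted(m for m, ds in remaining.items() if not ds))
    order: list[str] = []
    while ready:
        m = ready.popleft()
        order.append(m)
        for dependent in sorted(dependents[m]):
            remaining[dependent].discard(m)
            if not remaining[dependent]:
                ready.append(dependent)

    if len(order) != len(modules):
        # Modules still waiting are in a cycle or depend on one.
        stuck = sorted(m for m, ds in remaining.items() if ds)
        raise CircularDependencyError(f"cannot order {stuck}: circular dependency")
    return order
```

With `{"engine-core": {"api"}, "repository": {"api", "engine-core"}, "api": set()}` (hypothetical module names) the function returns `["api", "engine-core", "repository"]`; changing `"api"` to depend on `{"repository"}` makes it raise `CircularDependencyError` instead.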
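The master checkout script suggested above could start as simply as the following Python sketch, assuming the `svn` command-line client is installed. The repository URL, the SVN paths, and the local directory names in the layout are placeholders, and `checkout_commands` and `run_checkout` are hypothetical helpers; generating the IDE files is not shown. By default it only prints the commands, so the directory layout can be reviewed before anything is checked out.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def checkout_commands(repo_root: str, workspace: Path,
                      layout: dict[str, str]) -> list[list[str]]:
    """Return one `svn checkout URL PATH` command per entry in `layout`,
    which maps a path inside the repository to a directory under
    `workspace`. Entries are sorted so the commands come out in a
    stable order."""
    commands = []
    for svn_path, local_dir in sorted(layout.items()):
        url = f"{repo_root.rstrip('/')}/{svn_path.strip('/')}"
        commands.append(["svn", "checkout", url, str(workspace / local_dir)])
    return commands


def run_checkout(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping on the
    first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder repository and layout; replace with the real ones.
    plan = checkout_commands(
        "https://svn.example.org/repo",
        Path.home() / "pentaho-workspace",
        {"bi-platform-v2/trunk": "platform",
         "plugin-actions/trunk": "plugin-actions"},
    )
    run_checkout(plan)
```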