Hitachi Vantara Pentaho Community Wiki

The next development release of Weka (3.7.2), code-named "Weka Lite," will move away from a single monolithic executable jar file to a modular, package-based system. Although Weka's single jar file is only ~6 MB in size, it has become bloated in terms of the number of algorithms and options available. The plan is to have a stripped-down "core" jar file that contains all the infrastructure plus a handful of the best-known algorithms from each of the main learning categories. All other algorithms will be available to the user as downloads via a package management system.

The main benefits of this approach are twofold. From the user's perspective, Weka is less overwhelming (in terms of what is available initially) and easier to get started with. From the Weka maintainers' perspective, maintenance becomes less of a burden because it is made explicit which packages are external contributions and which come from the Weka team. Community members seeking help with an algorithm can either ask on the Weka forums (Pentaho or the Weka mailing list) or contact the author of the package in question.

Packages in the new Weka Lite will be hosted by either the Weka team (for internal code) or the author (for contributed code). The Weka team will maintain a repository of metadata on all the available packages (not unlike the CRAN system used for the R statistical software). Contributors will need to provide an up-to-date metadata file for any packages that they wish to contribute to Weka. Both command-line and graphical package management clients will be available. The package management system will subsume the existing plugin mechanisms in Weka (visualization plugins in the Explorer and the Knowledge Flow's plugin system). To reduce library duplication, packages in the new Weka Lite will be able to depend on other packages (as well as on a given version of the core system). The package management software will take care of resolving dependencies and detecting conflicts. This approach makes it possible for contributors to Weka to easily make use of external libraries. In the past we have avoided the use of external libraries because of the added complication they introduce to the maintenance, installation and use of Weka. Under the new system, it will be the responsibility of the contributor to make sure that their package(s) stay compatible with changes to external libraries (if used).
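
To make the idea of dependency resolution and conflict detection more concrete, here is a minimal sketch in Python. It is not Weka's package manager, and it does not reflect Weka's actual metadata format: the `PackageInfo` fields, the `resolve` function, the `DependencyError` class and the package names are all hypothetical, chosen only for illustration. It assumes that each package declares the names of the packages it depends on and the lowest core version it supports. It returns an install order with dependencies first, and reports unknown packages, dependency cycles and core-version mismatches.

```python
from dataclasses import dataclass


class DependencyError(Exception):
    """Raised when a package cannot be installed as requested."""


@dataclass(frozen=True)
class PackageInfo:
    # Hypothetical metadata record; these field names are illustrative
    # and are not taken from Weka's actual metadata format.
    name: str
    min_core: tuple = (3, 7, 2)  # lowest core version the package supports
    depends: tuple = ()          # names of packages that must be installed first


def resolve(target, repo, core_version):
    """Return an install order for `target`, with dependencies listed first."""
    order = []        # packages in the order they should be installed
    visiting = set()  # packages on the current dependency path (cycle check)
    done = set()      # packages already placed in `order`

    def visit(name):
        if name in done:
            return
        if name not in repo:
            raise DependencyError(f"unknown package: {name}")
        if name in visiting:
            raise DependencyError(f"dependency cycle involving: {name}")
        info = repo[name]
        if core_version < info.min_core:
            raise DependencyError(
                f"{name} needs core >= {info.min_core}, have {core_version}"
            )
        visiting.add(name)
        for dep in info.depends:
            visit(dep)
        visiting.discard(name)
        done.add(name)
        order.append(name)

    visit(target)
    return order


# Example repository with made-up package names.
repo = {
    "mathLib": PackageInfo("mathLib"),
    "kernels": PackageInfo("kernels", depends=("mathLib",)),
    "svmExtras": PackageInfo(
        "svmExtras", min_core=(3, 7, 3), depends=("kernels", "mathLib")
    ),
}

print(resolve("kernels", repo, core_version=(3, 7, 2)))  # ['mathLib', 'kernels']
```

In this sketch, asking for `svmExtras` against core version `(3, 7, 2)` raises `DependencyError`, because that made-up package declares a minimum core version of `(3, 7, 3)`. A real package manager would also need to handle version constraints between packages, which this sketch leaves out.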


