Hitachi Vantara Pentaho Community Wiki

Cassandra is a sparse, column-oriented NoSQL database for handling large data volumes. A new package for Weka 3.7.5 or later adds Cassandra connectors in the form of a "loader" and a "saver". Their functionality has been ported and adapted from the Cassandra input and output steps of the Kettle ETL tool.
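Because Cassandra rows are sparse, different rows can carry different sets of columns, whereas a Weka dataset has a fixed set of attributes. The sketch below shows one way such rows could be flattened into a table. It takes the union of column names as the attribute list and marks each column a row lacks as missing (null). It is a hypothetical illustration that assumes rows arrive as plain Java maps. The class and method names (`SparseRowsToTable`, `attributeNames`, `toDense`) are made up, and the CassandraLoader does not necessarily work this way.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: flatten sparse, map-shaped rows into a fixed-attribute table.
public class SparseRowsToTable {

    /** Union of all column names, in first-seen order, used as the attribute list. */
    static List<String> attributeNames(List<Map<String, Object>> rows) {
        Set<String> names = new LinkedHashSet<>();
        for (Map<String, Object> row : rows) {
            names.addAll(row.keySet());
        }
        return new ArrayList<>(names);
    }

    /** One dense value array per row; a column the row lacks becomes null (missing). */
    static List<Object[]> toDense(List<Map<String, Object>> rows, List<String> attrs) {
        List<Object[]> table = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Object[] values = new Object[attrs.size()];
            for (int i = 0; i < attrs.size(); i++) {
                values[i] = row.get(attrs.get(i)); // null if the row has no such column
            }
            table.add(values);
        }
        return table;
    }

    public static void main(String[] args) {
        Map<String, Object> r1 = new LinkedHashMap<>();
        r1.put("title", "Corn prices rise");
        r1.put("topic", "corn");
        Map<String, Object> r2 = new LinkedHashMap<>();
        r2.put("title", "Wheat exports");
        r2.put("body", "Exports grew");

        List<Map<String, Object>> rows = List.of(r1, r2);
        List<String> attrs = attributeNames(rows);
        System.out.println(attrs); // prints [title, topic, body]
        for (Object[] values : toDense(rows, attrs)) {
            System.out.println(Arrays.toString(values));
        }
        // prints [Corn prices rise, corn, null]
        // prints [Wheat exports, null, Exports grew]
    }
}
```

Taking the union of column names keeps every value but can produce very wide, mostly-missing tables when rows have many distinct columns. A real connector might instead restrict the attributes to a declared schema.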

The Cassandra connectivity for Weka is provided by the "cassandraConverters" package, which can be installed from the built-in package manager. Weka 3.7.5 supports basic GUI configuration of the CassandraLoader and CassandraSaver. The enhanced GUI configuration (shown in the screenshot below) requires some small changes to core Weka. Until Weka 3.7.6 is released, it is available by using a nightly snapshot of Weka 3.7 together with the cassandraConverters package.

The following screenshot shows the configuration and query dialog for the CassandraLoader. The flow it belongs to is a streaming text mining example that loads Reuters text documents in order to learn the topic "Corn". The SGDText classifier operates directly on the String attributes containing the raw text and learns a linear support vector machine for classification.
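SGDText learns by stochastic gradient descent, updating the model one instance at a time, which suits a streaming flow. The following is a minimal sketch of that idea, not Weka's SGDText code. It trains a linear SVM (hinge loss with L2 regularization) incrementally on raw strings, using a deliberately simple tokenizer that lower-cases the text and splits on non-alphanumeric characters. The class name `StreamingTextSvm`, the tokenizer, the toy documents, and the learning-rate and regularization values are all assumptions for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: incremental linear SVM (hinge loss + L2) trained by SGD on raw text.
public class StreamingTextSvm {
    private final Map<String, Double> weights = new HashMap<>();
    private double bias = 0.0;
    private final double learningRate;
    private final double lambda; // L2 regularization strength

    public StreamingTextSvm(double learningRate, double lambda) {
        this.learningRate = learningRate;
        this.lambda = lambda;
    }

    /** Bag of words: lower-case, split on non-alphanumerics, count each term. */
    static Map<String, Integer> tokenize(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!tok.isEmpty()) {
                counts.merge(tok, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Decision value w.x + b for a term-count vector x. */
    private double dot(Map<String, Integer> x) {
        double s = bias;
        for (Map.Entry<String, Integer> e : x.entrySet()) {
            s += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return s;
    }

    /** Positive score predicts the positive class, negative the other class. */
    public double score(String text) {
        return dot(tokenize(text));
    }

    /** One SGD step on lambda/2*||w||^2 + max(0, 1 - y(w.x + b)), with y in {-1, +1}. */
    public void update(String text, int y) {
        Map<String, Integer> x = tokenize(text);
        double margin = y * dot(x);
        // Gradient of the L2 term: shrink every weight (the bias is not regularized).
        double shrink = 1.0 - learningRate * lambda;
        weights.replaceAll((term, w) -> w * shrink);
        if (margin < 1.0) {
            // Hinge loss is active: move w and b toward y * x.
            for (Map.Entry<String, Integer> e : x.entrySet()) {
                weights.merge(e.getKey(), learningRate * y * e.getValue(), Double::sum);
            }
            bias += learningRate * y;
        }
    }

    public static void main(String[] args) {
        List<String> docs = List.of(
                "corn harvest yields rise",
                "oil prices fall",
                "corn prices up on export demand",
                "bank shares gain");
        int[] labels = {1, -1, 1, -1}; // +1 = topic "corn", -1 = any other topic
        StreamingTextSvm svm = new StreamingTextSvm(0.1, 0.01);
        for (int epoch = 0; epoch < 20; epoch++) {
            for (int i = 0; i < docs.size(); i++) {
                svm.update(docs.get(i), labels[i]); // one instance at a time, as in a stream
            }
        }
        System.out.println("corn doc score: " + svm.score("corn exports surge"));
        System.out.println("oil doc score:  " + svm.score("oil shares slip"));
    }
}
```

Shrinking every weight on each step costs time proportional to the vocabulary size. Implementations can avoid this, for example by keeping a single global scale factor, but the direct form keeps the update rule easy to read.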

This documentation is maintained by the Pentaho community, and members are encouraged to create new pages in the appropriate spaces, or edit existing pages that need to be corrected or updated.

