Cassandra Source and Sink in Weka

Cassandra is a sparse, column-oriented NoSQL database for handling large data volumes. A new package for Weka 3.7.5 or later adds connectors for Cassandra in the form of a "loader" and "saver". The functionality for these has been ported and adapted from the Cassandra input and output steps for the Kettle ETL tool.

The Cassandra connectivity for Weka is provided by the "cassandraConverters" package, which can be installed from Weka's built-in package manager. Basic GUI configuration of the CassandraLoader and CassandraSaver is available in Weka 3.7.5. The enhanced GUI configuration (shown in the screenshot below) requires some small changes to core Weka; until Weka 3.7.6 is released, it is available by using a nightly snapshot of Weka 3.7 together with the cassandraConverters package.

The following screenshot shows the configuration and query dialog for the CassandraLoader. The flow is used in a streaming text-mining example that loads Reuters text documents in order to learn the topic "Corn". The SGDText classifier operates directly on the String attributes containing the raw text and learns a linear support vector machine for classification.
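To make the idea of learning a linear support vector machine directly from streamed raw text more concrete, here is a minimal, self-contained Java sketch. It is not Weka's SGDText implementation and does not use the Weka or Cassandra APIs. The class name StreamingTextSvm, the tokenizer, the toy documents, and the learning rate and regularization values are all assumptions chosen for illustration. It only shows the general technique: one stochastic gradient descent step on the hinge loss per incoming document, with words used as sparse features.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Illustrative streaming linear SVM over raw text (hinge loss + SGD). Hypothetical; not Weka's SGDText. */
class StreamingTextSvm {
    private final Map<String, Double> weights = new HashMap<>();
    private double bias = 0.0;
    private final double learningRate; // assumed value, chosen for the toy data
    private final double lambda;       // assumed L2 regularization strength

    StreamingTextSvm(double learningRate, double lambda) {
        this.learningRate = learningRate;
        this.lambda = lambda;
    }

    /** Lowercases the text, splits on non-letters, and counts term frequencies. */
    static Map<String, Double> tokenize(String text) {
        Map<String, Double> counts = new HashMap<>();
        for (String tok : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!tok.isEmpty()) {
                counts.merge(tok, 1.0, Double::sum);
            }
        }
        return counts;
    }

    /** Linear decision value: bias plus the weighted sum of term counts. */
    double score(String text) {
        double s = bias;
        for (Map.Entry<String, Double> e : tokenize(text).entrySet()) {
            s += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return s;
    }

    /** One SGD step on a single document; label must be +1 or -1. */
    void update(String text, int label) {
        double margin = label * score(text);
        // L2 regularization: shrink every existing weight slightly.
        weights.replaceAll((term, w) -> w * (1.0 - learningRate * lambda));
        // Hinge loss has a non-zero gradient only when the margin is below 1.
        if (margin < 1.0) {
            for (Map.Entry<String, Double> e : tokenize(text).entrySet()) {
                weights.merge(e.getKey(), learningRate * label * e.getValue(), Double::sum);
            }
            bias += learningRate * label;
        }
    }

    boolean predict(String text) {
        return score(text) > 0.0;
    }

    /** Trains on a tiny made-up stream: +1 means "about corn", -1 means "other topic". */
    static StreamingTextSvm trainDemo() {
        String[] docs = {
            "corn prices rise as corn harvest falls",
            "oil exports drop in gulf",
            "farmers plant more corn this season",
            "bank raises interest rates"
        };
        int[] labels = {1, -1, 1, -1};
        StreamingTextSvm svm = new StreamingTextSvm(0.1, 0.01);
        for (int epoch = 0; epoch < 10; epoch++) {
            for (int i = 0; i < docs.length; i++) {
                svm.update(docs[i], labels[i]); // documents are seen one at a time
            }
        }
        return svm;
    }

    public static void main(String[] args) {
        StreamingTextSvm svm = trainDemo();
        System.out.println("corn harvest report -> " + svm.predict("corn harvest report"));
        System.out.println("bank interest rates oil -> " + svm.predict("bank interest rates oil"));
    }
}
```

Because each document updates the model and is then discarded, the full corpus never has to be held in memory, which is what makes this style of learner a natural fit for rows streamed from a database source.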
