Hadoop
- Configuring Pentaho for your Hadoop Distro and Version — How to set up and configure Kettle for your specific Hadoop distribution.
- Loading Data into a Hadoop Cluster — How to load data into HDFS (the Hadoop Distributed File System), Hive, and HBase.
- Loading Data into HDFS — How to use a PDI job to move a file into HDFS.
- Loading Data into Hive — How to use a PDI job to load a data file into a Hive table.
- Loading Data into HBase — How to use a PDI transformation that sources data from a flat file and writes to an HBase table.
- Transforming Data within a Hadoop Cluster — How to transform data within the Hadoop cluster using Pentaho MapReduce, Hive, and Pig.
- Using Pentaho MapReduce to Parse Weblog Data — How to use Pentaho MapReduce to convert raw weblog data into parsed, delimited records.
- Using Pentaho MapReduce to Generate an Aggregate Dataset — How to use Pentaho MapReduce to transform and summarize detailed data into an aggregate dataset.
- Transforming Data within Hive — How to read data from a Hive table, transform it, and write it to a Hive table within the workflow of a PDI job.
- Transforming Data with Pig — How to invoke a Pig script from a PDI job.
- Extracting Data from the Hadoop Cluster — How to extract data from Hadoop using HDFS, Hive, and HBase.
- Extracting Data from HDFS to Load an RDBMS — How to use a PDI transformation to extract data from HDFS and load it into an RDBMS table.
- Extracting Data from Hive to Load an RDBMS — How to use a PDI transformation to extract data from Hive and load it into an RDBMS table.
- Extracting Data from HBase to Load an RDBMS — How to use a PDI transformation to extract data from HBase and load it into an RDBMS table.
- Extracting Data from Snappy Compressed Files — How to configure client-side PDI so that files compressed using the Snappy codec can be decompressed using the Hadoop file input or Text file input step.
- Reporting on Data in Hadoop — How to report on data that is resident within the Hadoop cluster.
- Reporting on HDFS File Data — How to create a report that sources data from an HDFS file.
- Reporting on HBase Data — How to create a report that sources data from HBase.
- Reporting on Hive Data — How to create a report that sources data from Hive.
- Unit Test Pentaho MapReduce Transformation — How to unit test the mapper and reducer transformations that make up a Pentaho MapReduce job.
- Simple Chrome Extension to browse HDFS volumes — How to add a Chrome Omnibox extension to support HDFS browsing.
- Advanced Pentaho MapReduce — Advanced how-tos for developing Pentaho MapReduce.
- Using Compression with Pentaho MapReduce — How to use compression with Pentaho MapReduce.
- Using a Custom Partitioner in Pentaho MapReduce — How to use a custom partitioner in Pentaho MapReduce.
- Using a Custom Input or Output Format in Pentaho MapReduce — How to use a custom Input or Output Format in Pentaho MapReduce.
- Processing HBase data in Pentaho MapReduce using TableInputFormat — How to use HBase TableInputFormat in Pentaho MapReduce.
MapR
- Loading Data into a MapR Cluster — How to load data into CLDB (MapR’s distributed file system), Hive and HBase.
- Loading Data into CLDB — How to use a PDI job to move a file into CLDB.
- Loading Data into MapR Hive — How to use a PDI job to load a data file into a Hive table.
- Loading Data into MapR HBase — How to use a PDI transformation that sources data from a flat file and writes to an HBase table.
- Transforming Data within a MapR Cluster — How to use the massively parallel, fault-tolerant MapR processing engine to transform data resident in the cluster.
- Using Pentaho MapReduce to Parse Weblog Data in MapR — How to use Pentaho MapReduce to convert raw weblog data into parsed, delimited records.
- Using Pentaho MapReduce to Generate an Aggregate Dataset in MapR — How to use Pentaho MapReduce to transform and summarize detailed data into an aggregate dataset.
- Transforming Data within Hive in MapR — How to read data from a Hive table, transform it, and write it to a Hive table within the workflow of a PDI job.
- Transforming Data with Pig in MapR — How to invoke a Pig script from a PDI job.
- Extracting Data from the MapR Cluster — How to extract data from the MapR cluster and load it into an RDBMS table.
- Extracting Data from CLDB to Load an RDBMS — How to use a PDI transformation to extract data from MapR CLDB and load it into an RDBMS table.
- Extracting Data from Hive to Load an RDBMS in MapR — How to use a PDI transformation to extract data from Hive and load it into an RDBMS table.
- Extracting Data from HBase to Load an RDBMS in MapR — How to use a PDI transformation to extract data from HBase and load it into an RDBMS table.
- Reporting on Data in the MapR Cluster — How to report on data that is resident within the MapR cluster.
- Reporting on CLDB File Data — How to create a report that sources data from a MapR CLDB file.
- Reporting on HBase Data in MapR — How to create a report that sources data from HBase.
- Reporting on Hive Data in MapR — How to create a report that sources data from Hive.
Cassandra
- Write Data To Cassandra — How to read data from a data source (flat file) and write it to a column family in Cassandra using a graphical tool.
- How To Read Data From Cassandra — How to read data from a column family in Cassandra using a graphical tool.
- How To Create a Report with Cassandra — How to create a report that uses data from a column family in Cassandra using graphical tools.
MongoDB
- Write Data To MongoDB — How to read data from a data source (flat file) and write it to a collection in MongoDB.
- Read Data From MongoDB — How to read data from a collection in MongoDB.
- Create a Report with MongoDB — How to create a report that uses data from a collection in MongoDB.
- Create a Parameterized Report with MongoDB — How to create a parameterized report that uses data from a collection in MongoDB.
Instaview
- Google Analytics Instaview Sample template — Sample Instaview template for use with Google Analytics.
- Instaview Prompting
- MongoDB Instaview Sample template — Sample Instaview template for use with MongoDB.



