This is a sample Instaview template for Amazon Redshift that can be easily added to your installation. It demonstrates how to access data stored in Amazon's Redshift data warehouse in the cloud, visualize the data immediately, and save it offline for access when you are not connected to Redshift. The template includes a data set that can be loaded into Redshift for demo purposes.
To follow along with this how-to guide, you will need a desktop installation of Pentaho Data Integration with Instaview.
You will also need an Amazon Redshift account to run this template. Connection details can be found by logging into your AWS account here: http://aws.amazon.com
How to load Foodmart data on Redshift:
- Download the Foodmart data files: Click Foodmart-Redshift-Load.zip to download the data and save it locally.
- Log in to Redshift on Amazon Web Services.
This guide assumes that you have already created a Redshift cluster (see the Amazon Redshift documentation for instructions) and a bucket on Amazon S3 (see the Amazon Web Services documentation for instructions). Navigate to your bucket on S3.
- Click the Actions menu and choose Upload. Select all files ending in .txt in the foodmart-Redshift directory and upload them to your bucket on S3. These are the data files for the tables in the Foodmart DW.
- In PDI, open the "Redshift SQL createscript" transformation.
- In the Execute SQL script step, configure the connection for your Redshift cluster. For the Hostname field, enter the value shown in the Endpoint field under Cluster Database Properties on the info page for the Redshift cluster. Save and run the transformation. This will create the tables on the Redshift cluster.
- In PDI, open the "Redshift SQL copyscript" transformation.
- In the Execute SQL script step, configure the connection to the Redshift cluster. In the SQL script, replace every occurrence of <S3_Bucket_Name>, <aws_access_key_id>, and <aws_secret_access_key_id> with the correct values for your AWS account.
- Save and run the transformation. This will copy the data from the S3 bucket to the Redshift cluster.
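To make the copy step more concrete, the following Python sketch builds the kind of Redshift COPY statement that such a script would contain after the placeholders are filled in. It is an illustration only, not the script shipped with the template: the function names are hypothetical, naming each table after its .txt file is an assumption, and the pipe delimiter (Redshift's default) is assumed rather than confirmed for the Foodmart files.

```python
from pathlib import PurePosixPath


def build_copy_statement(table, bucket, key, access_key_id, secret_access_key, delimiter="|"):
    """Return a Redshift COPY statement that loads one S3 object into one table."""
    # Redshift expects both keys in a single semicolon-separated credentials string.
    credentials = (
        f"aws_access_key_id={access_key_id};"
        f"aws_secret_access_key={secret_access_key}"
    )
    return (
        f"COPY {table} "
        f"FROM 's3://{bucket}/{key}' "
        f"CREDENTIALS '{credentials}' "
        f"DELIMITER '{delimiter}';"
    )


def statements_for_files(file_names, bucket, access_key_id, secret_access_key):
    """Build one COPY statement per .txt file (assumption: table name == file stem)."""
    statements = []
    for name in sorted(file_names):
        path = PurePosixPath(name)
        if path.suffix != ".txt":
            continue  # only the .txt data files are uploaded and loaded
        statements.append(
            build_copy_statement(path.stem, bucket, path.name, access_key_id, secret_access_key)
        )
    return statements


if __name__ == "__main__":
    # Placeholder values; substitute your own bucket name and AWS keys.
    for sql in statements_for_files(
        ["customer.txt", "readme.md"], "my-bucket", "MY_ACCESS_KEY_ID", "MY_SECRET_KEY"
    ):
        print(sql)
```

Printing the statements before running them is a quick way to check that the bucket name and keys were substituted everywhere.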
Now that the data is in the right place, all we need to do is drop the template into the correct folder.
- Get the Template: Click Redshift.ktr to download the template and save it to your Instaview template Samples folder. To find this folder, navigate to the directory where Pentaho Data Integration is installed. From there, navigate to:
- Get the Icon: Click Redshift.png to download the icon and save it in the same folder as in the previous step.
- Switch to Instaview: Start Pentaho Data Integration (if it is already running, you do not need to restart it) and select the "Instaview" perspective.
- Try the sample: From the Instaview Welcome Screen:
- Another dialog box will appear, allowing you to establish your connection to Redshift. Enter the connection details you received from Amazon and confirm the connection by clicking "Test".
- Click "OK".
- The SQL query shown in the dialog box is written for the Foodmart data set loaded in the earlier steps.
- Click "OK" to run the template.
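When filling in the connection dialog, it can help to separate the host name from the port, since the endpoint may be displayed as a single host:port string depending on where you copy it from. The sketch below shows one way to split it; the function name and the example endpoint are hypothetical, and 5439 is used as the fallback because it is Redshift's default port.

```python
DEFAULT_REDSHIFT_PORT = 5439  # Redshift's default port; your cluster may use another


def split_endpoint(endpoint, default_port=DEFAULT_REDSHIFT_PORT):
    """Split 'host' or 'host:port' into a (host, port) tuple."""
    endpoint = endpoint.strip()
    host, sep, port_text = endpoint.rpartition(":")
    if not sep:
        # No port in the string: keep the whole value as the host.
        return endpoint, default_port
    return host, int(port_text)


if __name__ == "__main__":
    # Hypothetical endpoint, for illustration only.
    host, port = split_endpoint("examplecluster.abc123.us-east-1.redshift.amazonaws.com:5439")
    print("Hostname:", host)
    print("Port:", port)
```

The host goes into the Hostname field and the port into the port field of the connection dialog.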
In this guide you learned how to install and use the Redshift template. Redshift is a cloud-based data warehouse, so establishing a connection depends on your having set up a Redshift cluster. Once the connection is made, the template can be used with the Foodmart data, or you can use your own data from Redshift by changing the SQL query in the input dialog.
Other guides in this series cover sorting and grouping MongoDB data, creating reports, combining data from MongoDB with data from other sources, accessing and enhancing Google Analytics data, and using the Twitter Template to analyze Twitter feeds.