Hitachi Vantara Pentaho Community Wiki
Pentaho Data Integration Beginner's Guide - Second Edition

Language: English
Author: Maria Carina Roldan
Paperback: 502 pages [ 235mm x 191mm ]
Release Date: October 2013
Publisher: Packt Publishing
ISBN: 1782165045


  • Manipulate your data by exploring, transforming, validating, and integrating it
  • Learn to migrate data between applications
  • Explore several features of Pentaho Data Integration 5.0
  • Connect to any database engine, explore the databases, and perform all kinds of operations on them


This book teaches by example. It walks you through every aspect of Pentaho Data Integration, giving systematic instructions in a friendly style so that you can learn in front of your computer while playing with the tool. The extensive use of drawings and screenshots makes learning Pentaho Data Integration easy. Throughout the book, you will find numerous tips and helpful hints that you will not find anywhere else.

What you will learn from this book

  • Install and get started with Pentaho Data Integration
  • Get started with MySQL
  • Learn the ins and outs of Spoon, the graphical designer tool
  • Transform data in several ways, such as performing simple and complex calculations, cleaning, counting, de-duplicating, filtering, and ordering
  • Learn to get data from all kinds of data sources, such as plain files, Excel spreadsheets, databases, and XML files, then preview it and send it back to the same or different destinations
  • Discover how to read and parse unstructured files
  • Embed Java and JavaScript code in your Pentaho Data Integration transformations to enrich the treatment of data
  • Use Pentaho Data Integration to perform CRUD (create, read, update, and delete) operations on databases
  • Learn the basic concepts of data warehousing
  • Populate a data warehouse with Pentaho Data Integration, including loading slowly changing dimensions, junk dimensions, time dimensions, and more
  • Implement business processes by scheduling tasks, checking conditions, organizing files and folders, running daily processes, handling errors, and so on, in a way that meets your requirements

Table of Contents (full version)

  • Chapter 1 - Getting Started with Pentaho Data Integration
  • Chapter 2 - Getting Started with Transformations
  • Chapter 3 - Basic Data Manipulation
  • Chapter 4 - Controlling the Flow of Data
  • Chapter 5 - Transforming Your Data with JavaScript Code and the JavaScript Step
  • Chapter 6 - Transforming the Rowset
  • Chapter 7 - Validating Data and Handling Errors
  • Chapter 8 - Working with Databases
  • Chapter 9 - Performing Advanced Operations with Databases
  • Chapter 10 - Creating Basic Task Flow
  • Chapter 11 - Creating Advanced Transformations and Jobs
  • Chapter 12 - Developing and Implementing a Simple Datamart (sample chapter)
  • Chapter 13 - Taking it Further
  • Appendix A - Working with Repositories
  • Appendix B - Pan and Kitchen: Launching Transformations and Jobs from the Command Line
  • Appendix C - Quick Reference: Steps and Job Entries
  • Appendix D - Spoon Shortcuts
  • Appendix E - Introducing PDI 5 Features
  • Appendix F - Pop Quiz Answers