Hitachi Vantara Pentaho Community Wiki


Language: English
Author: Sergio Ramazzina
eBook: 68 pages
Release Date: July 2013
Publisher: Packt Publishing
ISBN: 184969690X





Approach

Filled with practical, step-by-step instructions and clear explanations for the most important and useful tasks. A practical guide with easy-to-follow recipes that help developers quickly and effectively collect data from disparate sources such as databases, files, and applications, and turn it into a unified format that is accessible and relevant to end users.

Details
Pentaho Data Integration (PDI) is a modern, powerful, and easy-to-use ETL system that lets you develop ETL processes simply. Through extensive descriptions and a good set of samples, you gain the experience and skills you need to run processes from the command line or schedule them.

Instant Pentaho Data Integration Kitchen How-to will help you understand the correct way to work with the PDI command-line tools.

We start with a recap of how transformations and jobs are designed using Spoon, then configure memory requirements so that your processes run properly from the command line, and then move through a set of recipes that show the different ways to start PDI processes.

We dive into the flags that control the logging system, which specify the logging output and the log verbosity. The focus throughout is on giving you the knowledge you need to run ETL processes with the command-line tools easily and proficiently.
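
As a rough illustration of the logging flags mentioned above, the following sketch assembles a Kitchen command line in Python. It is not taken from the book. The `-file`, `-level`, and `-logfile` options and the listed log levels are Kitchen's documented ones, while the helper name `build_kitchen_command`, the `./kitchen.sh` location, and the job and log paths are hypothetical placeholders.

```python
import shlex

# Log levels accepted by Kitchen's -level option, least to most verbose.
KITCHEN_LOG_LEVELS = (
    "Nothing", "Error", "Minimal", "Basic", "Detailed", "Debug", "Rowlevel",
)


def build_kitchen_command(job_file, level="Basic", logfile=None,
                          kitchen="./kitchen.sh"):
    """Return the argument list for running a job file with Kitchen.

    `kitchen` is an assumed path to the launcher script; adjust it to
    wherever PDI is installed.
    """
    if level not in KITCHEN_LOG_LEVELS:
        raise ValueError(f"unknown log level: {level!r}")
    args = [kitchen, f"-file={job_file}", f"-level={level}"]
    if logfile is not None:
        # Write the log to a file in addition to the console output.
        args.append(f"-logfile={logfile}")
    return args


# Hypothetical job and log paths, for illustration only.
cmd = build_kitchen_command(
    "/opt/etl/load_sales.kjb",
    level="Detailed",
    logfile="/tmp/load_sales.log",
)
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd)`. Kitchen reports success with exit code 0, so checking the `returncode` of the completed process is one simple way to get feedback from a run.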

What you will learn from this book

  • Understand how to configure memory requirements
  • Discover the PDI repository structure from the command line
  • Explore how to start jobs stored on the filesystem or packed in an archive file
  • Schedule PDI processes on Linux and Windows
  • Master the art of configuring log levels and logging output
  • Start jobs from the repository
  • Get feedback from your process execution