Hitachi Vantara Pentaho Community Wiki
Kettle and Karaf Initialization

Introduction

In 6.0 the initialization of Kettle was complicated by the introduction of OSGI (Karaf) as a mechanism for plugin deployment. This document outlines the interplay between the initialization of the two. We’ll also look at some of the errors you’ll see if things go wrong inside OSGI, with tips on how to resolve them.

This document is based on a presentation available here: Presentation

Core Kettle Initialization

In order to use Kettle within your Java application it must be initialized. This process loads all environment variables and plugins and then, most importantly for this topic, calls the KettleLifecycleListeners to let them know that Kettle has been initialized.

These listeners are called one at a time. “Native” listeners are called before those supplied by plugins. Beyond that rule there’s no guarantee of any ordering, so it is not safe to assume that one LifecycleListener will be executed before another.
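The ordering rule can be sketched in a few lines. This is only an illustration under stated assumptions: `SketchInitListener` and `SketchNotifier` are hypothetical stand-ins, not Kettle classes, and the sketch happens to use registration order within each group, which the real system does not promise.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a lifecycle listener (illustration only).
interface SketchInitListener {
    void onEnvironmentInit();
}

// Hypothetical notifier: all native listeners run before any plugin listener.
class SketchNotifier {
    private final List<SketchInitListener> nativeListeners = new ArrayList<>();
    private final List<SketchInitListener> pluginListeners = new ArrayList<>();

    void addNative(SketchInitListener l) { nativeListeners.add(l); }
    void addPlugin(SketchInitListener l) { pluginListeners.add(l); }

    // Listeners are called one at a time on the calling thread.
    void fireInit() {
        for (SketchInitListener l : nativeListeners) l.onEnvironmentInit();
        for (SketchInitListener l : pluginListeners) l.onEnvironmentInit();
    }
}

class OrderingDemo {
    static List<String> run() {
        List<String> calls = new ArrayList<>();
        SketchNotifier notifier = new SketchNotifier();
        // Registered plugin-first on purpose: the native listener still runs first.
        notifier.addPlugin(() -> calls.add("plugin"));
        notifier.addNative(() -> calls.add("native"));
        notifier.fireInit();
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [native, plugin]
    }
}
```

A listener that depends on another listener’s work therefore needs its own coordination; it cannot rely on call order alone.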

KettleLifecycleSupport

You’ll notice in the diagram above that KettleEnvironment relies on another class, KettleLifecycleSupport, to actually notify the listeners. This component keeps track of all KettleLifecycleListeners installed within the system by registering a plugin-type listener with Kettle’s PluginRegistry.

Role of KettleLifecycleListeners

These listeners can perform whatever tasks are required for Kettle to function. They are often provided by plugins, where they’re responsible for setting up the systems the plugin needs.

Methods

  • onEnvironmentInit()

This method is called when Kettle initializes.

  • onEnvironmentShutdown()

Called when Kettle shuts down. There’s no guarantee that this method will be called if the JVM shuts down abnormally.
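As a rough sketch of how a plugin might use the two methods, the listener below builds a resource on init and releases it on shutdown. `SketchLifecycleListener` is a local stand-in that mirrors only the two method names listed above; the real interface’s exact signature (declared exceptions, for example) is not reproduced here, and `LookupCacheListener` is purely hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Local stand-in mirroring the two methods described above (illustration only).
interface SketchLifecycleListener {
    void onEnvironmentInit();
    void onEnvironmentShutdown();
}

// Hypothetical plugin listener that owns a small cache for the plugin's lifetime.
class LookupCacheListener implements SketchLifecycleListener {
    private Map<String, String> cache; // null until init has run

    @Override
    public void onEnvironmentInit() {
        // Set up whatever the plugin needs in order to function.
        cache = new HashMap<>();
        cache.put("status", "ready");
    }

    @Override
    public void onEnvironmentShutdown() {
        // Best-effort cleanup only: this may never run on an abnormal JVM exit.
        if (cache != null) {
            cache.clear();
            cache = null;
        }
    }

    boolean isReady() {
        return cache != null;
    }
}
```

Because shutdown is not guaranteed, anything that must survive a crash (flushed files, released locks) is better handled outside this callback.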

Weaknesses identified with this system

Kettle’s Lifecycle system worked fine in less complicated times. Over the years we’ve added more and more capabilities as plugins. The synchronous initialization of each has led to increased startup time. Also, the fact that it only supports a single init event has made it more difficult to manage the initialization of the system as the complexity and number of plugins have grown.

Asynchronous Initialization

To work around this, some of the listeners do the most expensive part of their initialization in separate threads. This strategy leads to a false initialized state: Kettle reports that it is initialized while plugin setup is still running. This is fine for optional capabilities such as AgileBI. However, if plugins are supplying components potentially needed to execute Transformations and Jobs, such as Steps, Entries, DatabaseMetas, etc., environments executing those right at startup (Pan, Kitchen) will see failures.
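The false initialized state is easy to reproduce in miniature. In the sketch below, all names are hypothetical and a latch stands in for the slow part of plugin setup, so the timing is deterministic rather than realistic.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical listener that returns from init immediately and finishes later.
class AsyncInitListener {
    private final CountDownLatch slowWorkGate = new CountDownLatch(1);
    private final CountDownLatch finished = new CountDownLatch(1);
    private volatile boolean stepsRegistered = false;

    // Returns at once; the expensive work continues on another thread.
    void onEnvironmentInit() {
        Thread worker = new Thread(() -> {
            try {
                slowWorkGate.await();      // stands in for slow plugin setup
                stepsRegistered = true;    // e.g. Steps become available here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                finished.countDown();
            }
        }, "async-init");
        worker.setDaemon(true);
        worker.start();
    }

    boolean stepsRegistered() { return stepsRegistered; }

    // Test hook: lets the slow work complete and waits for it.
    void finishSlowWork() throws InterruptedException {
        slowWorkGate.countDown();
        finished.await();
    }
}

class FalseInitDemo {
    static boolean[] run() {
        try {
            AsyncInitListener listener = new AsyncInitListener();
            listener.onEnvironmentInit();              // init "completes" here
            boolean atStartup = listener.stepsRegistered();
            listener.finishSlowWork();
            boolean later = listener.stepsRegistered();
            return new boolean[] { atStartup, later };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

A caller that runs a Transformation between the two reads is in the position Pan and Kitchen find themselves in: initialization has returned, but the components are not there yet.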

Limited Lifecycle Phases

The other problem with the existing system is that it only supports two lifecycle events (init, shutdown). This is not really sufficient for an application that is becoming increasingly modular, with major systems provided by plugins. We would like to eventually have many more events such as: config, pre-init, init, post-init, ready, shutdown, destroy.

OSGI Complicating matters further

The introduction of OSGI as a major, and the preferred, vehicle for providing Kettle Plugin types and capabilities in general brought even more challenges to managing Kettle’s initialization. The main reason for this is that inside OSGI things can, and are encouraged to, initialize asynchronously. Features -> Bundles -> Blueprint Containers all start up in separate threads at indeterminate times.

This forced us to build an actor which could block Kettle initialization until it was certain that everything supplied by OSGI had fully initialized (or failed, more on that later).

PhasedLifecycleManager

In considering all of these issues and the new challenges with OSGI, we decided to design a new Lifecycle system, PhasedLifecycleManager, which could address these needs and serve to unify the disparate lifecycle systems (kettle, platform, reporting, etc.) in the future.

This new manager is designed to support asynchronous listeners natively. Each listener is passed an Event object when the phase of the manager is changed. The manager will block further phase changes until all listeners have accepted the event.

Unlike the old system, the new one supports adding as many events (phases) as are needed. Each phase event can carry an associated EventObject to provide more information to the listeners about the particular phase the system is moving to.
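The accept-and-block contract described above can be sketched with a latch. This is a minimal model under assumptions, not the actual PhasedLifecycleManager: every class name here is hypothetical, and details such as when the phase value is updated are choices made for the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical event handed to one listener for one phase change.
class SketchPhaseEvent {
    final int phase;
    private final CountDownLatch pending;
    private final AtomicBoolean accepted = new AtomicBoolean(false);

    SketchPhaseEvent(int phase, CountDownLatch pending) {
        this.phase = phase;
        this.pending = pending;
    }

    // May be called from any thread; repeated calls are ignored.
    void accept() {
        if (accepted.compareAndSet(false, true)) pending.countDown();
    }
}

interface SketchPhaseListener {
    void onPhaseChange(SketchPhaseEvent event);
}

class SketchPhasedManager {
    private final List<SketchPhaseListener> listeners = new CopyOnWriteArrayList<>();
    private volatile int phase = 0;

    void addListener(SketchPhaseListener l) { listeners.add(l); }
    int getPhase() { return phase; }

    // Blocks until every listener has accepted; synchronized so that
    // no other phase change can start in the meantime.
    synchronized void setPhase(int newPhase) throws InterruptedException {
        List<SketchPhaseListener> snapshot = new ArrayList<>(listeners);
        CountDownLatch pending = new CountDownLatch(snapshot.size());
        for (SketchPhaseListener l : snapshot) {
            l.onPhaseChange(new SketchPhaseEvent(newPhase, pending));
        }
        pending.await();
        phase = newPhase; // sketch choice: publish the phase once all have accepted
    }
}

class PhasedDemo {
    static boolean run() {
        SketchPhasedManager manager = new SketchPhasedManager();
        AtomicBoolean asyncWorkDone = new AtomicBoolean(false);
        manager.addListener(SketchPhaseEvent::accept); // accepts synchronously
        manager.addListener(e -> new Thread(() -> {    // accepts from another thread
            asyncWorkDone.set(true);
            e.accept();
        }).start());
        try {
            manager.setPhase(1);
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(ex);
        }
        return asyncWorkDone.get() && manager.getPhase() == 1;
    }
}
```

The point of handing each listener its own event is that a listener can return immediately, keep working elsewhere, and still hold the phase change open until it calls accept().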

Integration of PhasedLifecycleManager with Kettle’s initialization

Migrating to the new PhasedLifecycleManager was deemed to be too expensive, disruptive and risky for 6.0. Instead we decided to put it in place between Kettle’s LifecycleSystem and the OSGI environment. This is done by adapting the Kettle events to phases set on the new manager.

KettleLifecycleAdapter

The PDI-OSGI-Bridge provides a “native” KettleLifecycleListener which instantiates a KettlePhasedLifecycleManager and adapts the init() and shutdown() events to phases. Anyone from the OSGI side should be adding a phaseListener to this manager instead of providing a legacy KettleLifecycleListener. [Link to Example Code]

As you can see from the diagram, when onEnvironmentInit() is called from Kettle, the KettlePhasedLifecycleManager is advanced to the init phase (1). The call which made this phase change will block until all PhasedLifecycleListeners have accepted the phase change event.
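The adaptation itself is small. The sketch below assumes a manager whose setPhase call blocks until all phase listeners have accepted; `PhaseManagerSketch` and `LifecycleAdapterSketch` are hypothetical names, and the numeric value used for the shutdown phase is an assumption, since only the init phase (1) is given above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal manager contract: setPhase blocks until listeners accept.
interface PhaseManagerSketch {
    void setPhase(int phase);
}

// Hypothetical adapter: translates the two legacy callbacks into phase changes.
class LifecycleAdapterSketch {
    static final int INIT = 1;     // the init phase, as described above
    static final int SHUTDOWN = 2; // assumed value, not taken from the source

    private final PhaseManagerSketch manager;

    LifecycleAdapterSketch(PhaseManagerSketch manager) {
        this.manager = manager;
    }

    // Called by Kettle; returns only once the manager has completed the change.
    public void onEnvironmentInit() {
        manager.setPhase(INIT);
    }

    public void onEnvironmentShutdown() {
        manager.setPhase(SHUTDOWN);
    }
}

class AdapterDemo {
    static List<Integer> run() {
        List<Integer> phases = new ArrayList<>();
        // A recording manager is enough to show the event-to-phase mapping.
        LifecycleAdapterSketch adapter = new LifecycleAdapterSketch(phases::add);
        adapter.onEnvironmentInit();
        adapter.onEnvironmentShutdown();
        return phases;
    }
}
```

Since Kettle calls its listeners synchronously, the blocking setPhase call is what holds up the rest of Kettle’s initialization.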

KettleLifecycleListener

In reality, we only have one phased listener for 6.0, KettleLifeCycleListener. This listener, added by the PDI-OSGI-Bridge, is responsible for watching and waiting for all OSGI Features and Bundles to install properly before accepting the init event.

BundleContext

The internals of this listener are a little complicated. It cannot begin inspecting the OSGI environment until the BundleContext is provided to it. This may happen before or after Kettle’s initialization is called.

Timeout

When the “init” Phase Change event comes in, it starts a timeout thread. This is here just in case the BundleContext is never set. On timeout we accept the init event, letting Kettle continue on in hopes that it’ll work.
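A timeout of this kind might look like the following sketch. It uses a scheduled executor rather than a hand-rolled thread, which is a substitution made for brevity; `InitEventSketch` and `TimeoutSketch` are hypothetical, and the timeout value is whatever the caller passes in.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical init event: whoever accepts first wins, later calls are ignored.
class InitEventSketch {
    private final AtomicBoolean accepted = new AtomicBoolean(false);
    private final CountDownLatch done = new CountDownLatch(1);
    private volatile String acceptedBy;

    boolean accept(String who) {
        if (accepted.compareAndSet(false, true)) {
            acceptedBy = who;
            done.countDown();
            return true;
        }
        return false;
    }

    String awaitAcceptedBy() throws InterruptedException {
        done.await();
        return acceptedBy;
    }
}

class TimeoutSketch {
    // Simulates an init event arriving when no BundleContext is ever provided.
    static String run(long timeoutMillis) {
        InitEventSketch event = new InitEventSketch();
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try {
            // Nothing else will accept in this run, so only the timeout can.
            timer.schedule(() -> { event.accept("timeout"); },
                    timeoutMillis, TimeUnit.MILLISECONDS);
            return event.awaitAcceptedBy();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            timer.shutdownNow();
        }
    }
}
```

Making accept() idempotent matters here: if the normal path finishes just as the timer fires, the second call is harmlessly ignored.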

Starting the “Watcher” Thread

When both the BundleContext is set and the init event has been passed in, the Listener starts a new Thread. This thread first waits for all Karaf Features that are configured to install at startup to become installed. If that succeeds it then waits for all Blueprint Containers in the installed bundles to be started.

If at any time either of these two checks produces an error, it will be logged and the init event will be accepted, letting Kettle continue initialization.
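The control flow of the watcher can be sketched as follows. This is a model under assumptions, not the bridge’s code: the two checks are passed in as hypothetical callbacks, a plain Object stands in for the BundleContext, and a list stands in for the log.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical check that may fail, e.g. "wait for Features" or "wait for Blueprints".
interface SketchCheck {
    void run() throws Exception;
}

class OsgiWatcherSketch {
    private final SketchCheck featuresCheck;
    private final SketchCheck blueprintsCheck;
    private final List<String> log;
    private Object bundleContext; // placeholder for the OSGI BundleContext
    private Runnable acceptInit;  // set once the init event has arrived
    private Thread watcher;

    OsgiWatcherSketch(SketchCheck featuresCheck, SketchCheck blueprintsCheck, List<String> log) {
        this.featuresCheck = featuresCheck;
        this.blueprintsCheck = blueprintsCheck;
        this.log = log;
    }

    // These two calls may arrive in either order.
    synchronized void setBundleContext(Object ctx) { bundleContext = ctx; maybeStart(); }
    synchronized void onInit(Runnable accept) { acceptInit = accept; maybeStart(); }

    // Starts the watcher exactly once, and only when both inputs are present.
    private void maybeStart() {
        if (bundleContext == null || acceptInit == null || watcher != null) return;
        watcher = new Thread(this::watch, "osgi-watcher");
        watcher.start();
    }

    private void watch() {
        try {
            featuresCheck.run();   // first: Features installed at startup
            blueprintsCheck.run(); // then: Blueprint Containers started
        } catch (Exception e) {
            log.add("error: " + e.getMessage());
        } finally {
            acceptInit.run();      // success or failure, Kettle continues
        }
    }

    void awaitWatcher() {
        Thread t;
        synchronized (this) { t = watcher; }
        if (t == null) return;
        try { t.join(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
    }
}

class WatcherDemo {
    static List<String> run() {
        List<String> trace = new CopyOnWriteArrayList<>();
        OsgiWatcherSketch w = new OsgiWatcherSketch(
                () -> { throw new IllegalStateException("feature X not installed"); },
                () -> trace.add("blueprints checked"),
                trace);
        w.onInit(() -> trace.add("init accepted")); // init arrives first in this run
        w.setBundleContext(new Object());
        w.awaitWatcher();
        return trace;
    }
}
```

In this run the first check fails, so the second check is skipped, the error is recorded, and the init event is still accepted.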
