Hitachi Vantara Pentaho Community Wiki

The Problem

During execution of a transformation in the AEL engine, jar files can originate from several different places:

  1. Jars in data-integration/lib
  2. Jars from spark-install/jars
  3. Jars from the Hadoop classpath
  4. AEL jars in Karaf
  5. Jars from Kettle plugins (OSGi and otherwise)

The library versions found in these locations can and do conflict. Such conflicts can cause general problems where Spark libraries clash with Hadoop libraries [1], and they can also create AEL-specific problems [2].

Several bugs have already arisen from library conflicts, both within AEL code and in Spark generally; see the References section of this page.

OSGi is valuable precisely because it addresses this sort of problem, and fortunately most AEL execution happens within Karaf. We remain vulnerable, however, in two places:

  1. Execution that occurs outside of the engine and does not leverage Karaf (such as SparkWebSocketMain).
  2. The set of packages specified by org.osgi.framework.system.packages.extra (in karaf/etc/custom.properties), that is, the set of packages exposed from the framework classloader.
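
To see which packages fall into the second category, the property value can be read and reduced to bare package names. The following is a minimal sketch, not part of AEL: the class name SystemPackagesExtra and the sample property text (including its version and uses attributes) are assumptions made for illustration. It relies only on java.util.Properties, which handles the trailing-backslash line continuations typically used in custom.properties, and it splits on commas outside of quotes so that quoted attribute values are not broken apart.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper, for illustration only.
final class SystemPackagesExtra {
    static final String KEY = "org.osgi.framework.system.packages.extra";

    /** Returns the bare package names, dropping attributes such as ;version="...". */
    static List<String> packageNames(String propertiesText) {
        Properties props = new Properties();
        try {
            // Properties.load joins lines that end in a backslash.
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty(KEY, "");
        List<String> names = new ArrayList<>();
        StringBuilder clause = new StringBuilder();
        boolean inQuotes = false;
        for (char c : value.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            if (c == ',' && !inQuotes) {
                // A comma outside quotes ends one export clause.
                addName(clause, names);
                clause.setLength(0);
            } else {
                clause.append(c);
            }
        }
        addName(clause, names);
        return names;
    }

    private static void addName(StringBuilder clause, List<String> names) {
        // Everything after the first ';' is an attribute or directive.
        String name = clause.toString().split(";", 2)[0].trim();
        if (!name.isEmpty()) {
            names.add(name);
        }
    }

    public static void main(String[] args) {
        // Sample text is made up; a real run would read karaf/etc/custom.properties.
        String sample = String.join("\n",
            "org.osgi.framework.system.packages.extra = \\",
            "  org.slf4j;version=\"1.7.7\", \\",
            "  org.apache.http;uses:=\"org.example.a,org.example.b\", \\",
            "  org.apache.commons.pool");
        packageNames(sample).forEach(System.out::println);
    }
}
```

Note that OSGi exports are per package, so a listed package does not imply that its subpackages are exposed as well.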

As of Pentaho 8.0, running AEL with Spark 2.1.0, the following 28 libraries are present in conflicting versions in data-integration/lib and spark-install/jars:

PDI 8.0                             Spark 2.1.0
----------------------------------  ------------------------------
activation-1.1.jar                  activation-1.1.1.jar
antlr-complete-3.5.2.jar            antlr-2.7.7.jar
commons-beanutils-1.9.3.jar         commons-beanutils-1.7.0.jar
commons-configuration-1.9.jar       commons-configuration-1.6.jar
commons-io-2.2.jar                  commons-io-2.4.jar
commons-lang3-3.0.jar               commons-lang3-3.5.jar
commons-net-1.4.1.jar               commons-net-2.2.jar
commons-pool-1.5.7.jar              commons-pool-1.5.4.jar
derby-10.2.1.6.jar                  derby-10.12.1.1.jar
eigenbase-properties-1.1.2.jar      eigenbase-properties-1.1.5.jar
httpclient-4.5.3.jar                httpclient-4.5.2.jar
httpcore-4.4.6.jar                  httpcore-4.4.4.jar
jackson-annotations-2.3.3.jar       jackson-annotations-2.6.5.jar
jackson-core-2.3.3.jar              jackson-core-2.6.5.jar
jackson-core-asl-1.9.2.jar          jackson-core-asl-1.9.13.jar
jackson-databind-2.3.3.jar          jackson-databind-2.6.5.jar
jackson-jaxrs-1.9.2.jar             jackson-jaxrs-1.9.13.jar
jackson-mapper-asl-1.9.2.jar        jackson-mapper-asl-1.9.13.jar
jackson-xc-1.9.3.jar                jackson-xc-1.9.13.jar
janino-2.5.16.jar                   janino-3.0.0.jar
jersey-client-1.19.1.jar            jersey-client-2.22.2.jar
jersey-server-1.19.1.jar            jersey-server-2.22.2.jar
jetty-util-8.1.15.v20140411.jar     jetty-util-6.1.26.jar
joda-time-1.6.jar                   joda-time-2.9.3.jar
slf4j-api-1.7.7.jar                 slf4j-api-1.7.16.jar
slf4j-log4j12-1.7.7.jar             slf4j-log4j12-1.7.16.jar
snappy-java-1.1.0.jar               snappy-java-1.1.2.6.jar
validation-api-1.0.0.GA.jar         validation-api-1.1.0.Final.jar
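
A comparison like the one above can be regenerated for other PDI and Spark versions by splitting each jar file name into an artifact name and a version and reporting artifacts whose versions differ. The sketch below is one possible way to do that and is not the tool used to produce this list; the class name JarConflictFinder and the file-name pattern are assumptions. The pattern treats the first hyphen followed by a digit as the start of the version, which works for the names listed here but will not pair artifacts whose names differ, such as antlr-complete and antlr.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class JarConflictFinder {
    // Artifact name, a hyphen, a version that starts with a digit, then ".jar".
    private static final Pattern JAR = Pattern.compile("^(.+?)-(\\d[^/]*)\\.jar$");

    /** Maps artifact name to version for every file name that matches the pattern. */
    static Map<String, String> versionsByArtifact(List<String> jarFileNames) {
        Map<String, String> versions = new TreeMap<>();
        for (String fileName : jarFileNames) {
            Matcher m = JAR.matcher(fileName);
            if (m.matches()) {
                versions.put(m.group(1), m.group(2));
            }
        }
        return versions;
    }

    /** Lists artifacts present on both sides with different versions. */
    static List<String> conflicts(List<String> pdiJars, List<String> sparkJars) {
        Map<String, String> pdi = versionsByArtifact(pdiJars);
        Map<String, String> spark = versionsByArtifact(sparkJars);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, String> entry : pdi.entrySet()) {
            String sparkVersion = spark.get(entry.getKey());
            if (sparkVersion != null && !sparkVersion.equals(entry.getValue())) {
                result.add(entry.getKey() + ": " + entry.getValue() + " vs " + sparkVersion);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // A few file names taken from the list above.
        List<String> pdi = Arrays.asList(
            "commons-io-2.2.jar", "httpclient-4.5.3.jar", "slf4j-log4j12-1.7.7.jar");
        List<String> spark = Arrays.asList(
            "commons-io-2.4.jar", "httpclient-4.5.2.jar", "slf4j-log4j12-1.7.16.jar");
        conflicts(pdi, spark).forEach(System.out::println);
    }
}
```

In practice the two input lists would come from directory listings of data-integration/lib and spark-install/jars rather than from hard-coded names.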

Of these conflicting libraries, the packages exposed from the framework classloader boil down to the following:

Code Block
com.sun.jersey.api.client
org.apache.commons.configuration
org.apache.commons.pool
org.apache.commons.pool.impl
org.apache.http
org.apache.http.client.utils
org.slf4j

Because these packages are provided via the framework classloader and are loaded from an indeterminate library version, there is an inherent risk of undesired and unpredictable behavior.
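
When investigating a suspected conflict, it can help to check which jar a class was actually loaded from at runtime. The sketch below uses the standard Class.getProtectionDomain() and CodeSource.getLocation() calls for this; the class name ClassOrigin is an assumption, and the sketch is a generic JVM diagnostic rather than anything AEL provides. Classes loaded by the bootstrap classloader have no code source, so that case is reported separately.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper, for illustration only.
final class ClassOrigin {
    /** Describes where a class was loaded from and by which classloader. */
    static String describe(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
            ? "(bootstrap or unknown)"
            : source.getLocation().toString();
        return type.getName() + " <- " + location + " via " + type.getClassLoader();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Pass fully qualified class names, for example org.slf4j.Logger.
        for (String className : args) {
            System.out.println(describe(Class.forName(className)));
        }
    }
}
```

The main method only reflects the classpath of a plain JVM. Inside Karaf, describe would need to be called from bundle code on a class that the bundle itself resolved, since each bundle has its own classloader.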

Risk Mitigation

  1. Test specific Spark and Hadoop versions and recommend sticking to that set.
  2. Minimize usage of classes within the above packages.  Update the list of potentially conflicting packages as new releases come out.
  3. Wherever possible, leverage classes injected via Blueprint within AEL.
  4. For any packages retrieved via the framework classloader, avoid using libraries that overlap with Hadoop or Spark libraries.

References

[1] https://markobigdata.com/2016/08/01/apache-spark-2-0-0-installation-and-configuration

https://www.hackingnote.com/en/spark/trouble-shooting/NoClassDefFoundError-ClientConfig/

[2] http://jira.pentaho.com/browse/BACKLOG-17911

http://jira.pentaho.com/browse/BACKLOG-19292

DEPRECATED - See AEL and Spark Library Conflicts (Community Space)