General tips

AEL issues can be tricky to isolate due to the complexity of distributed execution.  Errors can originate from many sources: Spoon, the AEL daemon, the Spark driver, or the cluster itself.

Understanding where an issue originates is crucial to resolving it.

Inspect Errors

Look at the errors returned to Spoon by the transformation execution.  They will often give a strong clue about what's happening.  If not, also check the daemon logs for errors during execution.

Simplify

If the default error and logging information is not helpful, the next step is to simplify.  For example, if you're setting up AEL and no transformations seem to run, start with the simplest possible daemon configuration: Spark in local mode, with Kerberos and SSL disabled.

Such a configuration eliminates a number of possible errors.  If transformations succeed with these simpler options, add the more advanced options back one at a time to isolate the component causing issues.

If some but not all transformations execute on AEL, simplify the transformation instead: remove steps until something runs.

Logging

Each component produces its own logs.  Check the Spoon logs, the daemon logs, and, when running on a cluster, the Spark and YARN application logs.

F.A.Q.

I see a message about NoSuchFieldError: TOKEN_KIND

For example,

 Exception in thread "main" java.lang.NoSuchFieldError: TOKEN_KIND
  at org.apache.hadoop.crypto.key.kms.KMSClientProvider$KMSTokenRenewer.handleKind(KMSClientProvider.java:162)
  at org.apache.hadoop.security.token.Token.getRenewer(Token.java:351)
  at org.apache.hadoop.security.token.Token.renew(Token.java:377)
  at org.apache.spark.deploy.yarn.security.HadoopFSCredentialProvider$$anonfun$getTokenRenewalInterval$1$$anonfun$5$$anonfun$apply$1.apply$mcJ$sp(HadoopFSCredentialProvider

This can happen if the versions of the Hadoop libraries picked up by the Spark process are inconsistent with one another.

When the daemon is running on a Hadoop cluster, the SPARK_DIST_CLASSPATH environment variable will be set to point to the Hadoop libraries installed on the cluster. If the installed Spark distribution also bundles its own Hadoop libraries, the two can conflict.

If using Apache Spark, verify that the distribution in use is the "with user-provided Apache Hadoop" build (https://d3kbcqa49mib13.cloudfront.net/spark-2.2.0-bin-without-hadoop.tgz)
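With the "Hadoop free" build, Spark must be told where the cluster's Hadoop libraries live so only one set of Hadoop jars ends up on the classpath. A minimal sketch, following Spark's documented approach (adjust the file path to your installation):

```shell
# conf/spark-env.sh -- point the "Hadoop free" Spark build at the
# cluster's own Hadoop libraries via the hadoop CLI
export SPARK_DIST_CLASSPATH=$(hadoop classpath)
```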

I see a message with "Verify app properties" in the daemon:

 2017-10-17 14:28:50.810  INFO 15597 --- [launcher-proc-1] o.a.s.launcher.app.SparkWebSocketMain    : Verify app properties.
 2017-10-17 14:28:50.810  INFO 15597 --- [launcher-proc-1] o.a.s.launcher.app.SparkWebSocketMain    :
 -- Args passed to Spark:  ArgHandler{driverSecurityPrincipal=HTTP/devcdh57n1.pentahoqa.com:53000,
 driverSecurityKeytab=/home/devuser/http-devcdh57n1.pentahoqa.com-53000.keytab,
 requestId='2706639b-2451-4782-b1f4-6540ce5629a7', daemonURL='ws://localhost:53000/execution',
 proxyingUser=devuser, proxyKeytab=/home/devuser/devuser.keytab}

The Spark main helpfully logs the arguments it was invoked with, including the driverSecurityPrincipal.  Since that principal is defined, AEL is likely configured with a Kerberos-secured daemon.  The daemon URL specifies localhost, however, and connecting to a Kerberos-secured service requires using the FQDN for service authentication.
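A quick way to check is to compare the daemon URL against the machine's fully-qualified name. A sketch, with the port taken from the log excerpt above:

```shell
# Kerberos service authentication matches against the HTTP/<fqdn> principal,
# so the daemon URL must use the fully-qualified domain name, not localhost.
fqdn=$(hostname -f)
echo "ws://${fqdn}:53000/execution"
```

The printed URL is the form the run configuration should use instead of `ws://localhost:53000/execution`.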

When using SSL, I see "Handshake response not received." in Spoon.

Verify that the certificate is trusted by the client.  See http://wiki.pentaho.com/display/EPAM/Testing+SSL+with+AEL, specifically the sections on creating a trusted certificate and importing it into the Java keystore.

ClassNotFound involving jersey classes

See http://wiki.pentaho.com/display/COM/AEL+and+Spark+Library+Conflicts

I get "javax.websocket.DeploymentException: Connection failed" when I try to run a transformation on AEL Spark from PDI.

Make sure that the AEL Daemon is running and that you have specified the correct hostname and port in the run configuration.

You can check this by opening the URL for the AEL Daemon in a browser, e.g. https://localhost:53000. This should display an error page saying something like "This application has no explicit mapping for /error, so you are seeing this as a fallback.".
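A command-line alternative to the browser check, assuming the default host and port from above (a bash sketch using the `/dev/tcp` pseudo-device):

```shell
# Check whether anything is listening where Spoon expects the AEL daemon
host=localhost
port=53000
if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
  echo "daemon port ${port} is open on ${host}"
else
  echo "cannot reach ${host}:${port} -- verify the daemon is running"
fi
```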

I get "Yarn application has already ended!  It might have been killed or unable to launch application master."

One common root cause of this error is that the AEL Daemon is unable to launch the Spark application.  Double-check the configuration files.  If running in a secured environment, make sure that the user submitting the Spark application exists on the cluster.  For example, if you run as `suzy`, then `suzy` must be a user on each node in the cluster.  LDAP is a mechanism that can help keep accounts in sync across systems.
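A quick check to run on each node, using `suzy` from the example above as the submitting user:

```shell
# Does the submitting user's account exist on this node?
if id suzy >/dev/null 2>&1; then
  echo "user suzy exists on $(hostname)"
else
  echo "user suzy is missing on $(hostname) -- create the account or sync via LDAP"
fi
```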

I get a "Broken pipe" exception.

Make sure that the output was created correctly. This error points to a problem in the communication between PDI and the Daemon; further investigation is needed.

I see no values in the Input and Output columns in Step Metrics in PDI.

This is not yet implemented; AEL Spark only populates the Read and Written columns. The difference is that Read and Written count rows exchanged with other steps in the transformation, while Input and Output count rows read from or written to external sources such as files or databases.

I get an error when trying to compile the AEL Daemon: DaemonMainTest.testGreetingEndpoint » IllegalState Failed to load ApplicationC...

Port 53000 may already be in use. You can define a different port for the tests so they do not use 53000; that way, you can still run the tests while the daemon is running.

Try adding `ael.unencrypted.port=52000` to `application-test.properties`. The test server will then start on 52000, while the application continues to use 53000.
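To confirm that the port is indeed taken before changing anything, a quick check (tool availability varies by platform; `ss` is common on Linux, `lsof` on macOS):

```shell
# List any listener on port 53000; falls through to "free" if nothing is found
ss -ltn 2>/dev/null | grep ':53000' \
  || lsof -nP -iTCP:53000 -sTCP:LISTEN 2>/dev/null \
  || echo "port 53000 is free"
```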