Using the Knowledge Flow Plugin
1 Introduction

The Knowledge Flow plugin is an enterprise edition tool that allows entire data mining processes to be run as part of a Kettle (PDI) ETL transformation. There are a number of use cases for combining ETL and data mining, such as:

  • Automated batch training/refreshing of predictive models
  • Including data mining results in reports
  • Access to data mining data pre-processing techniques in ETL transformations

Training/refreshing of predictive models is the application described in this document and, when combined with the Weka Scoring plugin for deploying predictive models, can provide a fully automated predictive analytics solution.

2 Requirements

The Knowledge Flow plugin requires Kettle 3.1 or higher and Weka 3.6 or higher. Due to SWT-AWT problems under Mac OS X, OS X users will require the Eclipse Cocoa 64-bit SWT libraries (version 3.5) in order to use the plugin. These libraries can easily be dropped in to replace the ones included in the Kettle Mac application (Kettle.app/Contents/Resources/Java/libswt/osx).

3 Installation

Before starting Kettle's Spoon UI, the Knowledge Flow Kettle plugin must be installed in either the plugins/steps directory of your Kettle distribution or in $HOME/.kettle/plugins/steps. Unpack the Knowledge Flow archive and copy the contents of the KFDeploy directory to a new subdirectory of $HOME/.kettle/plugins/steps. Copy the "weka.jar" file from your Weka distribution to the same subdirectory of $HOME/.kettle/plugins/steps.

The Knowledge Flow Kettle plugin also requires a small plugin to be installed in the Weka Knowledge Flow application. This plugin provides a special data source component for the Weka Knowledge Flow that accepts incoming data sets from Kettle. Copy the contents of the "KettleInject" directory to a subdirectory in $HOME/.knowledgeFlow/plugins. If the $HOME/.knowledgeFlow/plugins directory does not exist, you will need to create it manually.

Once installed correctly, you will find the Kettle Knowledge Flow step in the "Transform" folder in the Spoon user interface.
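The copy steps above can be sketched as shell commands. This is only a sketch: the subdirectory names and the paths to the unpacked archive and to your Weka distribution are illustrative, not mandated by the plugin.

```shell
# Kettle side: create a step-plugin subdirectory and copy the
# unpacked KFDeploy contents plus weka.jar into it
mkdir -p "$HOME/.kettle/plugins/steps/knowledgeflow"
cp -r KFDeploy/* "$HOME/.kettle/plugins/steps/knowledgeflow/"
cp /path/to/weka/weka.jar "$HOME/.kettle/plugins/steps/knowledgeflow/"

# Weka side: install the KettleInject data source component
# (create the plugins directory manually if it does not exist)
mkdir -p "$HOME/.knowledgeFlow/plugins/kettleinject"
cp -r KettleInject/* "$HOME/.knowledgeFlow/plugins/kettleinject/"
```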
4 Using the Knowledge Flow Plugin

As a simple example, we will use the Knowledge Flow step to create and export a predictive model for the "pendigits.csv" data set (docs/data/pendigits.csv). This data set is also used in the "Using the Weka Scoring Plugin" documentation.

4.1 Create a Simple Transformation

First construct a simple Kettle transformation that links a CSV input step to the Knowledge Flow step. Next configure the input step to load the "pendigits.csv" file. Make sure that the Delimiter text box contains a "," and then click "Get Fields" to make the CSV input step analyze a few lines of the file and determine the types of the fields.

All the fields in the "pendigits.csv" file are integers. However, the problem is a discrete classification task and Weka will need the "class" field to be declared as a nominal attribute. In the CSV input step's configuration dialog, change the type of the "class" field from "Integer" to "String."

4.2 Configuring the Knowledge Flow Kettle Step

The Knowledge Flow step's configuration dialog is made up of three tabs (although only two are visible initially when the dialog is first opened). The first tab, "KnowledgeFlow file," enables existing Knowledge Flow flow definition files to be loaded or imported from disk. It also allows you to configure how the incoming data from the transformation is connected to the Knowledge Flow process and how to deal with the output. If a flow definition is loaded, then the definition file will be loaded (sourced) from disk every time that the transformation is executed. If, on the other hand, the flow definition file is imported, it will be stored in either the transformation's XML configuration file (.ktr file) or the repository (if one is being used).

A third option is to design a new Knowledge Flow process from scratch using the embedded Knowledge Flow editor. In this case the new flow definition will be stored in the .ktr file/repository. This is the approach we will take for the purposes of demonstration. Clicking the "Show embedded KnowledgeFlow editor" button will cause a new "KnowledgeFlow" tab to appear on the dialog.

Note: You may need to enlarge the Knowledge Flow step's dialog in order to fully see the embedded editor.

To begin with, we will need an entry point into the data mining process for data from the Kettle transformation. Select the "Plugins" tab of the embedded editor and place a "KettleInject" step onto the layout canvas. If there is no "Plugins" tab visible, or there is no "KettleInject" step available from the "Plugins" tab, you will need to review the installation process described earlier.

Next, connect a "TrainingSetMaker" step to the "KettleInject" step by right clicking over "KettleInject" and selecting "dataSet" from the list of connections.

Now add a logistic regression classifier to the flow and connect it by right clicking over "TrainingSetMaker" and selecting "trainingSet" from the list of connections.

Next, add a "SerializedModelSaver" step and connect it by right clicking over "Logistic" and selecting "batchClassifier" from the list of connections.
Now configure the "SerializedModelSaver" to specify a location to save the trained model to. Either double click the icon or right click over it and select "Configure..." from the pop-up menu. If you are using Weka version 3.7.x, the Knowledge Flow supports environment variables, and Kettle's internal environment variables are available. In the screenshot below, we are saving the trained classifier to ${Internal.Transformation.Filename.Directory} - this is the directory that the Kettle transformation has been saved to (note: this only makes sense if a repository is not being used). You can always specify an absolute path to a directory on your file system, and, in fact, this is necessary if you are using Weka version 3.6.x.
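Once the transformation has run, the model file written by "SerializedModelSaver" can be reloaded from other Java code via Weka's SerializationHelper. A minimal sketch follows; it assumes weka.jar is on the classpath, and the file name "Logistic.model" is purely illustrative (use whatever prefix and directory you configured in the step's dialog).

```java
import weka.classifiers.Classifier;
import weka.core.SerializationHelper;

public class LoadModel {
    public static void main(String[] args) throws Exception {
        // Deserialize the classifier saved by the SerializedModelSaver step.
        // If the step was configured to also save the dataset header, use
        // SerializationHelper.readAll(...) and pick the objects apart instead.
        Classifier model =
                (Classifier) SerializationHelper.read("Logistic.model");
        System.out.println(model);
    }
}
```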