Hitachi Vantara Pentaho Community Wiki
Vertica Bulk Loader

Description

This step allows you to load data into a Vertica database table. It uses the VerticaCopyStream class in the Vertica JDBC driver (documented on the Vertica site in the Programmer's Guide > Vertica Client Libraries). Using the VerticaCopyStream class with a COPY statement is the highest performance method offered by Vertica to stream bulk data into a table.
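Conceptually, the step opens a COPY statement on the target table and streams the incoming rows into it. A rough sketch of the kind of statement involved is below; the table, columns, and delimiter are placeholders, and the exact statement PDI generates may differ:

```sql
-- Illustrative only: the step builds a statement of roughly this shape
-- and streams the row data into it via VerticaCopyStream.
COPY public.sales (sale_id, sale_date, amount)
FROM STDIN
NO COMMIT;
```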

Prerequisites

  • The Vertica JDBC driver must be in the PDI classpath.
  • You must be connecting to a Vertica cluster version 5 or higher.  Both Vertica CE and Vertica EE are supported.
  • The host machine running the transformation using this step must be able to connect to all nodes of the Vertica cluster via TCP.
  • The Vertica user configured in the connection must either be a superuser, or have USAGE privilege on the schema and INSERT privilege on the table.
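For a non-superuser, the required privileges can be granted as follows (schema, table, and user names here are placeholders):

```sql
-- Minimum privileges needed by the bulk loader for a non-superuser
GRANT USAGE ON SCHEMA public TO etl_user;
GRANT INSERT ON TABLE public.sales TO etl_user;
```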

Options

Step name

The name of this step as it appears in the transformation workspace.

Connection

Drop-down menu for selecting the appropriate Vertica database connection. Note: the list is populated only from database connections that have already been defined; if no database connection has been defined, the list will be blank. You must select a connection of type "Vertica 5+".

Target schema

The name of the schema containing the table in which the data will be inserted.

Target table

Name of the target table.

Specify database fields

If this option is selected, you can choose which input fields are inserted into the table and map each one to a column name in the table.
If this option is not selected, every input field is inserted, using the field name as the column name.

Insert direct to ROS

This option determines whether the data is loaded using the WOS (option not selected) or the ROS (option selected).
Vertica supports three data load methods; however, only WOS and ROS are supported by this bulk loader.  Please see the Vertica Administrator's Guide > Bulk Loading Data > Choosing a Load Method for details on these options.

  • WOS (AUTO) -- Inserting into the WOS is well suited to smaller bulk loads (< 100MB); if Vertica's WOS fills up, the COPY will seamlessly continue loading directly into ROS containers.
  • ROS (DIRECT) -- Inserting into the ROS is well suited to larger bulk loads; used appropriately, it imposes less overhead on the cluster and leads to better query performance.  Using ROS for many loads of smaller data sets creates many ROS containers that Vertica must later merge, and can lead to cluster errors if the number of ROS containers exceeds what the cluster's configuration allows.
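The two modes correspond to COPY's load-method hints. A minimal sketch, with a placeholder table name:

```sql
-- WOS/AUTO load (spills to ROS containers if the WOS fills up):
COPY public.sales FROM STDIN AUTO;

-- ROS/DIRECT load (bypasses the WOS; best for large batches):
COPY public.sales FROM STDIN DIRECT;
```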

Abort on error

If this option is selected, any row rejected by Vertica causes an error that aborts the entire transformation, potentially rolling back the current transaction.
If this option is not selected, the rejected rows are captured by Vertica in the log files indicated below and the transformation continues, resulting in a partial load of the data.

Exceptions log file

Specifies the filename or absolute path for the load exceptions file.  In Vertica 6 and higher, the path can refer to a storage location.  Please see the Vertica SQL Reference Manual > SQL Statements > COPY for details on the required privileges.

Rejected data log file

Specifies the filename or absolute path in which to write rejected rows.  In Vertica 6 and higher, the path can refer to a storage location.  Please see the Vertica SQL Reference Manual > SQL Statements > COPY for details on the required privileges.
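These two options map onto COPY's EXCEPTIONS and REJECTED DATA clauses. A sketch, with placeholder table and file paths:

```sql
-- Capture load errors to files instead of aborting the load:
COPY public.sales FROM STDIN
EXCEPTIONS '/data/logs/sales_exceptions.log'
REJECTED DATA '/data/logs/sales_rejected.log';
```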

Stream name

If specified, this string is used as the COPY load-stream identifier, which can be seen when monitoring load streams via the LOAD_STREAMS system table.  The default stream name is <table_name>_STDIN.
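A load can be monitored by querying the LOAD_STREAMS system table for its stream name; 'sales_STDIN' below is a placeholder following the default naming pattern:

```sql
-- Check progress and rejection counts for a named load stream
SELECT stream_name, accepted_row_count, rejected_row_count
FROM load_streams
WHERE stream_name = 'sales_STDIN';
```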

Database fields

If the "Specify database fields" option is selected, this table is enabled.  With the "Get fields" button you can load the fields from the stream. With the "Enter field mapping" button you can edit the mapping from the stream to the table fields when the table already exists.

Table field

Name of the column in the table to store this field.

Stream field

Name of the field in the stream.

SQL button

Press the SQL button to review and optionally execute the CREATE TABLE and CREATE PROJECTION statements that match the data coming into this step.  Please note that the projection is created with no segmentation or partitioning, which results in a full copy of the data residing on every node of the Vertica cluster.  This is useful for testing, but the Vertica database administrator should revise the table's projections before the table is used in production.
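The generated DDL has roughly the following shape; the table, columns, and projection name here are illustrative placeholders, not the step's exact output:

```sql
-- Target table matching the incoming stream fields
CREATE TABLE public.sales (
    sale_id   INTEGER,
    sale_date DATE,
    amount    NUMERIC(10,2)
);

-- Unsegmented projection: a full copy of the data on every node,
-- fine for testing but usually revised by a DBA for production.
CREATE PROJECTION public.sales_p AS
SELECT * FROM public.sales
UNSEGMENTED ALL NODES;
```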

Metadata Injection Support (7.x and later)

All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.
