Hitachi Vantara Pentaho Community Wiki

Step name

Name of this step as it appears in the transformation workspace

Host name(s) or IP address(es)

Indicates the network name or address of the MongoDB instance or instances. You can input multiple host names or IP addresses, separated by a comma. You can also specify a different port number for each host name by separating the host name and port number with a colon, and separating each combination of host name and port number with a comma. For example, to include the host name and port number for two different MongoDB instances, you would input localhost1:27017,localhost2:27018 and leave the Port field empty.


Port

Indicates the port number of the MongoDB instance or instances. Use this field to specify a default port if no ports are given as part of the Host name(s) or IP address(es) field.
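The way the host and port fields combine can be sketched in a few lines of Python. This is only an illustration of the format described above, not the step's actual parsing code: the function name `parse_hosts` is hypothetical, the fallback to 27017 (MongoDB's standard port) when both fields omit a port is an assumption, and IPv6 literals are not handled.

```python
def parse_hosts(hosts_field, port_field=""):
    """Split a comma-separated host list into (host, port) pairs."""
    # Assumption: fall back to 27017 when neither the host entry
    # nor the Port field supplies a port number.
    default_port = int(port_field) if port_field.strip() else 27017
    pairs = []
    for entry in hosts_field.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore stray commas
        host, sep, port = entry.partition(":")
        # A port given with the host takes precedence over the default.
        pairs.append((host, int(port) if sep and port else default_port))
    return pairs
```

For example, `parse_hosts("localhost1:27017,localhost2:27018")` yields one pair per instance, while `parse_hosts("a,b:27018", "27020")` applies the Port field only to host `a`.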

Use all replica set members/mongos

Differentiates between a replica set containing one node and a stand-alone single Mongo host. If there is a replica set, and it contains more than one host, then the Java driver discovers all hosts automatically. It is good practice to list more than one replica set host in the hosts field so that the driver has a better chance of connecting successfully if one is down.


Username

Indicates the user name required to access the database. If you want to use Kerberos authentication, enter the Kerberos principal in this field. If you do not know the principal, contact your system administrator. The principal is the unique identity to which Kerberos assigns tickets. When you enter the principal as the username, format it like this: <primary>/<instance>@<KERBEROS_REALM>. <primary> is typically the name of the user. If the primary is a host, the primary is typically the word host. <instance> qualifies the primary. Sometimes, if the primary is a user, the instance is the username of the database administrator. <KERBEROS_REALM> is the Kerberos realm (domain name). Note that the <KERBEROS_REALM> is case sensitive. Here is an example of a correctly formatted Kerberos principal username: joe/admin@CORPORATION.COM.
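To make the principal format concrete, the following Python sketch splits a principal string into its three parts. The function name `parse_principal` is hypothetical and the validation is deliberately minimal; it only illustrates the primary/instance@REALM layout described above and is not part of Pentaho or any Kerberos library.

```python
def parse_principal(principal):
    """Split 'primary/instance@REALM' into its components."""
    name, sep, realm = principal.rpartition("@")
    if not sep or not name or not realm:
        raise ValueError("expected <primary>/<instance>@<KERBEROS_REALM>")
    # The instance part is optional in this sketch.
    primary, _, instance = name.partition("/")
    # The realm is returned unchanged because it is case sensitive.
    return {"primary": primary, "instance": instance or None, "realm": realm}
```

Calling `parse_principal("joe/admin@CORPORATION.COM")` separates the user `joe`, the instance `admin`, and the realm `CORPORATION.COM`.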


Password

Indicates the password associated with the provided Username. If you are using Kerberos authentication, you do not need to enter the password.

Authenticate using Kerberos

Indicates whether to use the Kerberos service to manage the authentication process. If you choose this option, read Use Kerberos Authentication to Provide Spoon Users Access to MongoDB for configuration information.

Connection timeout

Designates how long to wait for a connection to a database (in milliseconds) before terminating the connection attempt. Leave blank to never terminate the connection.

Socket timeout

Designates how long to wait for a write operation (in milliseconds) before terminating the operation. Leave blank to never terminate the operation.





Database

Name of the database to write data to. Click Get DBs to populate the drop-down menu with a list of databases on the server.


Collection

Name of the collection to write data to. Click Get collections to populate the drop-down menu with a list of collections within the database.

Batch insert size

Sets the batch size for fast bulk insert operations. If left blank, the default size is 100 rows.
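As a rough illustration of what a batch size means, the Python sketch below groups incoming rows into batches before they would be handed to a bulk insert. The helper name `batches` is hypothetical, and the default of 100 simply mirrors the default stated above; the step's internal batching logic may differ.

```python
from itertools import islice

def batches(rows, batch_size=100):
    """Yield lists of at most batch_size rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return  # input exhausted
        yield batch
```

With 250 incoming rows and the default size, this produces two full batches and one final batch of 50.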

Truncate collection

Deletes any existing data in the target collection before inserting begins.


Upsert

Changes the write mode from insert to upsert. An upsert updates the first document matched in the target collection or, if no document matches, inserts a new document into the target collection according to the incoming fields specified in the Mongo document fields tab.


Multi-update

Updates all matching documents, rather than just the first.
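The difference between insert, upsert, and updating all matches can be modeled with a small in-memory Python sketch. It is a simplification under stated assumptions: the collection is a plain list of dictionaries, matching is by exact field equality, and an update merges the incoming fields into the matched document. The function name `write_row` and its parameters are hypothetical, and the effect of the Modifier update option is not modeled.

```python
def write_row(collection, query, fields, upsert=False, multi_update=False):
    """Apply one incoming row to an in-memory list of documents."""
    if not upsert:
        # Plain insert mode: every row becomes a new document.
        collection.append(dict(fields))
        return "inserted"
    matches = [doc for doc in collection
               if all(doc.get(k) == v for k, v in query.items())]
    if not matches:
        # Upsert with no match falls back to an insert.
        collection.append(dict(fields))
        return "inserted"
    # Update only the first match unless all matches are requested.
    targets = matches if multi_update else matches[:1]
    for doc in targets:
        doc.update(fields)
    return f"updated {len(targets)}"
```

With two documents matching the query, an upsert changes only the first one, and enabling `multi_update` changes both.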

Modifier update

Enables modifier operators to be used to modify individual fields within matching documents. To set the Modifier operation, see the Mongo document fields tab.

Write concern (w option)

Sets the minimum number of servers that must succeed for a write operation. A value of -1 disables all acknowledgment of write operation errors. A value of 0 disables basic acknowledgment of write operations, but returns information about socket exceptions and networking errors. A value of 1 provides acknowledgment of write operations on the primary node. A value greater than 1 waits for successful write operations on the specified number of nodes, including the primary.
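The meaning of each w value can be summarized as a small lookup, shown here as a Python sketch. The function name `describe_write_concern` is hypothetical and the returned strings only paraphrase the description above; nothing here calls the MongoDB driver.

```python
def describe_write_concern(w):
    """Return a short description of a numeric w value."""
    if w < -1:
        raise ValueError("w must be -1 or greater")
    if w == -1:
        return "no acknowledgment; write errors are ignored"
    if w == 0:
        return "no write acknowledgment; socket and network errors reported"
    if w == 1:
        return "acknowledged by the primary"
    return f"acknowledged by {w} nodes, including the primary"
```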

w Time out

Designates how long to wait for a response to write operations (in milliseconds) before terminating the operation. Leave blank to never terminate.

Journaled writes

Writes the operation to the journal first, and afterward to the core data files. This ensures that the write operation is durable and can survive a shutdown.

Read preference

Indicates which node to read from first: Primary, Primary preferred, Secondary, Secondary preferred, or Nearest.

Number of retries for write operations

Indicates the number of times that a write operation is attempted.

Delay, in seconds, between retry attempts

Indicates the number of seconds between write operation retry attempts.
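The two retry settings above interact as in the following Python sketch, which treats the retry count as the total number of attempts, matching the wording above. The function `write_with_retries` and its `sleep` parameter (injectable so the delay can be observed without waiting) are hypothetical names for illustration, not part of the step.

```python
import time

def write_with_retries(write_fn, retries, delay_seconds, sleep=time.sleep):
    """Call write_fn up to `retries` times, pausing between failed attempts."""
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return write_fn()
        except Exception as exc:
            last_error = exc
            if attempt < retries:
                sleep(delay_seconds)  # wait before the next attempt
    # All attempts failed: surface the last error to the caller.
    raise last_error
```

A write that fails twice and then succeeds, with five attempts allowed and a 10 second delay, pauses twice and returns on the third attempt.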





The order of this field in the list.

Index fields

Specifies a single index (using one field) or a compound index (using multiple fields). The . (dot) notation is used to specify a path to a field to use in the index. This path can be optionally postfixed by a direction indicator, :1 for ascending or :-1 for descending. Compound indexes are specified by a comma-separated list of paths.
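The index field syntax can be illustrated with a short Python sketch that turns a specification string into a list of (path, direction) pairs. The function name `parse_index_spec` and the sample field names are hypothetical, and treating a missing direction as ascending is an assumption of this sketch rather than documented behavior of the step.

```python
def parse_index_spec(spec):
    """Parse 'a.b:1,c:-1' into [('a.b', 1), ('c', -1)]."""
    keys = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # ignore stray commas
        path, sep, direction = part.rpartition(":")
        if not sep:
            # Assumption: no direction indicator means ascending.
            path, direction = part, "1"
        if direction not in ("1", "-1"):
            raise ValueError(f"bad direction in {part!r}")
        keys.append((path, int(direction)))
    return keys
```

A single entry describes a single index, and several comma-separated entries describe a compound index, for example `parse_index_spec("customer.name:1,order.date:-1")`.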

Index opp

Specifies whether the index is created or dropped.


Unique

Indicates whether the index should reject documents that have a duplicate value for the indexed field.


Sparse

Indicates whether the index should contain only entries for those documents that have a value in the indexed field.

Show indexes

Displays the index information available.

Further Reading
See the Big Data MongoDB Tutorials, or the MongoDB Output section of the Pentaho Wiki, for scenario-based examples of working with MongoDB and Pentaho.