Hitachi Vantara Pentaho Community Wiki
Read Data From MongoDB

How to read data from a collection in MongoDB. By the end of this guide you should understand how data can be read from MongoDB and written to many different destinations. The sample data describes the flow of visitors to a web site.



In order to follow along with this how-to guide you will need the following:


A single-node local cluster is sufficient for these exercises but a larger and/or remote configuration will work as well. You will need to know the address and port that MongoDB is running on and have a user id and password for the server (if applicable).
These guides were developed using MongoDB version 2.0.2. MongoDB downloads are available from the MongoDB web site.


A desktop installation of the Kettle design tool called 'Spoon'. Download here.


To follow this guide you need to have a populated MongoDB collection. If you do not have any data in MongoDB yet you can use the Write Data To MongoDB guide to add some data to your MongoDB installation. The instructions in this guide assume that the demo data set is available in a collection called PageSuccessions in a database called Demo.
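Before starting, it may help to see the shape of the documents this guide assumes are in the PageSuccessions collection. The exact field names come from the fields extracted later in this guide (url, NextURL, Count); the values below are hypothetical sample data, not the actual demo data set.

```python
import json

# Hypothetical example of one document in the Demo.PageSuccessions
# collection. Field names (url, NextURL, Count) are taken from the
# fields this guide extracts later; the values are made up.
sample = '{"url": "-firstpage-", "NextURL": "/products", "Count": 42}'

doc = json.loads(sample)
print(doc["url"], doc["NextURL"], doc["Count"])
```

Each document records how many times visitors moved from one page (url) to another (NextURL).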

Step-By-Step Instructions


Start MongoDB if it is not running.

Create a Data Transformation

Start Spoon on your desktop. Once it is running choose 'File' -> 'New' -> 'Transformation' from the menu system or click on the 'New file' icon on the toolbar and choose the 'Transformation' option.

Speed Tip

You can download the completed Kettle transformation read_from_mongodb.ktr.

  1. Add a MongoDB Input Step: We are going to read data from MongoDB, so expand the 'Big Data' section of the Design palette and drag a 'MongoDb Input' step onto the transformation canvas.
  2. Edit the MongoDb Input Step: Double-click on the 'MongoDb Input' step to edit its properties. Enter this information:
    1. Host name, Port, Authentication user and password: the connection information for your MongoDB installation.
    2. Database: 'Demo' or another database if you want.
    3. Collection: 'PageSuccessions'
    4. Query expression: { "$query" : { "url" : "-firstpage-" }, "$orderby" : { "Count" : -1 } }

      Click 'OK' to close the 'MongoDB Input' window.
  3. Preview the Data: Click on the Preview toolbar button (the green arrow with the magnifying glass) or right-click on the step and choose 'Preview'. The 'Transformation debug dialog' will open. Click on 'Quick Launch'. You should see the data returned by the 'MongoDB Input' step.

    Congratulations! You've read data from MongoDB. Close the preview window.
  4. Add a JSON Input Step: The data from the 'MongoDB Input' step has a JSON document in each row. To work with the data in these documents we need to use a 'JSON Input' step to extract the fields that we are interested in. Expand the 'Input' section and drag a 'JSON Input' step onto the canvas.
  5. Connect the MongoDB and JSON Steps: Hover the mouse over the 'MongoDb Input' step and a tooltip will appear. Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Json Input' step.
  6. Edit the JSON Step: Double-click on the 'Json Input' step to edit its properties. On the 'File' tab choose 'json' from the 'Get source from field' dropdown list.

    On the 'Fields' tab, define a field for each of the url, NextURL, and Count values in the JSON documents. (Note that the Path is case sensitive.)

    Click 'OK' to close the 'Json input' window.
  7. Preview the JSON Input: Preview the 'Json Input' step as we did above with the 'MongoDb Input' step. You will see the JSON documents as before, but if you scroll the table to the right you will see new fields for URL, NextURL, and Count.
  8. Add an Output Step: Expand the 'Output' section of the design palette. You can see that there are different output options – files, databases, and applications. There are more output options in the 'Bulk loading' section. For this example we will write to a text file, but you can experiment with other output destinations if you want. Drag a 'Text file output' step from the palette onto the canvas. Connect the 'Json Input' step to the 'Text file output' step, choosing the 'Main output of step' option.
  9. Edit the Text File Output Step: Double-click on the 'Text file output' step to edit its properties. Click on the 'Browse' button and select a destination for the file.
  10. Define the Output Fields: Click on the 'Fields' tab, then click on the 'Get Fields' button. The table of fields will be populated based on the metadata of the fields coming out of the 'Json Input' step. Click on the 'json' row to highlight it and press the delete key or right-click on the 'json' row and choose 'Delete selected lines'.

    Click on 'OK' to close the 'Text file output' window.
  11. Save the Transformation: Choose 'File' -> 'Save as...' from the menu system. Save the transformation as 'read_from_mongodb.ktr' into a folder of your choice.
  12. Run the Transformation: Choose 'Action' -> 'Run' from the menu system or click on the green run button on the transformation toolbar. An 'Execute a transformation' window will open. Click on the 'Launch' button. An 'Execution Results' panel will open at the bottom of the PDI window and it will show you the progress of the transformation as it runs. After a few seconds the transformation should finish successfully.
    If any errors occurred the transformation step that failed will be highlighted in red and you can use the 'Logging' tab to view error messages.

    Check The Results

  13. If your transformation ran successfully you can open the text file you created to see the data written there.
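The logic of the steps above can be sketched in plain Python. This is not how PDI works internally, just a simulation of the same pipeline: the query expression filters on url and sorts by Count descending ($orderby with -1), the JSON fields are extracted, and the rows are written out as delimited text. The sample documents are hypothetical, and an in-memory list stands in for the live MongoDB collection so the sketch runs without a server.

```python
import csv
import io

# In-memory stand-in for the Demo.PageSuccessions collection
# (a live setup would fetch these documents from MongoDB instead).
docs = [
    {"url": "-firstpage-", "NextURL": "/products", "Count": 7},
    {"url": "/products",   "NextURL": "/checkout", "Count": 3},
    {"url": "-firstpage-", "NextURL": "/about",    "Count": 12},
]

# Equivalent of the MongoDb Input query expression:
# { "$query" : { "url" : "-firstpage-" }, "$orderby" : { "Count" : -1 } }
rows = sorted(
    (d for d in docs if d["url"] == "-firstpage-"),
    key=lambda d: d["Count"],
    reverse=True,  # -1 in $orderby means descending
)

# Equivalent of the Text file output step: write the extracted fields.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["url", "NextURL", "Count"],
                        delimiter=";")
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

The filter-then-sort corresponds to the 'MongoDb Input' step, the field extraction to the 'Json Input' step, and the delimited writer to the 'Text file output' step.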


During this guide you learned how to read data from a MongoDB collection and write it to a text file using PDI's graphical design tool. You can use this procedure to read data from MongoDB and write it to many different destinations.
Other guides in this series cover how to sort and group MongoDB data, create reports, and combine data from MongoDB with data from other sources.