Hitachi Vantara Pentaho Community Wiki

...

  1. Open the Hive Shell: Open the Hive shell so you can manually create a Hive table by entering 'hive' at the command line.

  2. Create the Table in Hive: You need a Hive table to load the data into, so enter the following in the Hive shell.
    Code Block
    create table weblogs (
        client_ip         string,
        full_request_date string,
        day               string,
        month             string,
        month_num         int,
        year              string,
        hour              string,
        minute            string,
        second            string,
        timezone          string,
        http_verb         string,
        uri               string,
        http_status_code  string,
        bytes_returned    string,
        referrer          string,
        user_agent        string)
    row format delimited
    fields terminated by '\t';
    
  3. Close the Hive Shell: You are done with the Hive Shell for now, so close it by entering 'quit;' in the Hive Shell.

  4. Load the Table: Load the Hive table by running the following command at the operating system command line:
    Code Block
    hadoop fs -cp /weblogs/parse/part-00000 /user/hive/warehouse/weblogs/
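The load step can also be wrapped in a small script that checks for the source file first and lists the table directory afterwards. The following is a minimal sketch, assuming the paths used in this guide and that the `hadoop` command is on the PATH; the function name `load_weblogs` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: paths are the ones used in this guide; adjust for your cluster.
SRC=/weblogs/parse/part-00000
DEST=/user/hive/warehouse/weblogs/

# load_weblogs is a hypothetical helper name, not part of Hadoop or Hive.
load_weblogs() {
    # 'hadoop fs -test -e' exits with 0 when the path exists in HDFS.
    if ! hadoop fs -test -e "$SRC"; then
        echo "source file $SRC not found in HDFS" >&2
        return 1
    fi
    # Copy the parsed log file into the table's directory, then list it.
    hadoop fs -cp "$SRC" "$DEST" && hadoop fs -ls "$DEST"
}
```

After sourcing the script, call `load_weblogs`. A listing that shows part-00000 under /user/hive/warehouse/weblogs/ suggests the copy succeeded.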
    
    

...

Create a Hive Database Connection

...

  1. Connection Name: Enter 'Hive'
  2. Connection Type: Select 'Hadoop Hive'
  3. Host Name and Port Number: Enter your connection information. For local single-node clusters, use 'localhost' and port '10000'.
  4. Database Name: Enter 'Default'
    Click 'Test' to test the connection.
    If the test is successful, click 'OK' to close the Database Connection window.
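If the connection test fails, it can help to confirm that something is listening on the Hive port before revisiting the PDI settings. The following is a minimal sketch, assuming a netcat (`nc`) build that supports the `-z` option is installed, and using the 'localhost' and '10000' values from step 3 as defaults; the function name `hive_port_open` is hypothetical.

```shell
#!/usr/bin/env bash
# hive_port_open is a hypothetical helper name.
# Defaults mirror the single-node values in this guide.
hive_port_open() {
    local host="${1:-localhost}"
    local port="${2:-10000}"
    # -z: only check whether the port accepts connections, send no data.
    # Assumes an nc build that supports -z.
    if nc -z "$host" "$port"; then
        echo "Hive port $port on $host is reachable"
    else
        echo "nothing is listening on $host:$port" >&2
        return 1
    fi
}
```

Call `hive_port_open` with no arguments for a local single-node cluster, or pass a host name and port for a remote one.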

Create a Job to Aggregate Web Log Data into a Hive Table

...

  1. Open the Hive Shell: Open the Hive shell by entering 'hive' at the command line so you can query the aggregated table.
  2. Query Hive for Data: Verify the data has been loaded into Hive by querying the weblogs_agg table.
    Code Block
    select * from weblogs_agg limit 10;
    
    


  3. Close the Hive Shell: You are done with the Hive Shell for now, so close it by entering 'quit;' in the Hive Shell.
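The same check can be run without opening an interactive shell, since the `hive` command accepts a query string through its `-e` option. The following is a minimal sketch, assuming `hive` is on the PATH; the function name `preview_table` is hypothetical.

```shell
#!/usr/bin/env bash
# preview_table is a hypothetical helper name.
preview_table() {
    # Default to the aggregate table queried in this guide.
    local table="${1:-weblogs_agg}"
    # 'hive -e' runs the quoted HiveQL and exits, so no 'quit;' is needed.
    hive -e "select * from ${table} limit 10;"
}
```

Calling `preview_table` prints up to ten rows from weblogs_agg; `preview_table weblogs` does the same for the source table.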

...

In this guide you learned how to transform data within Hive as part of a PDI job flow.
