Hitachi Vantara Pentaho Community Wiki

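Step description

This step reads data from a file whose fields are separated by a delimiter. It has fewer features overall than the Text File Input step, but it offers a few special capabilities:

  • NIO -- Native system calls are used to read the file, which gives better performance. However, only local files can be read at present; VFS is not supported.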
  • Parallel running -- If you configure this step to run in multiple copies or in clustered mode, and you enable parallel running, each copy reads a separate block of a single file. This lets you distribute the file reading across several threads, or even several slave nodes in a clustered transformation.
  • Lazy conversion -- If you read many fields from the file and many of those fields are not manipulated, but merely passed through the transformation to land in some other text file or a database, lazy conversion can prevent Kettle from performing unnecessary work on those fields, such as converting them into strings, dates, or numbers.


The table below describes the options available for the CSV Input step:



Step name

Name of the step.

Filename field (data from previous step)

Specify the name of the CSV file to read from, or select the field name that will contain the filename(s) to read from.
The field option, together with the option to include the filename in the output, is enabled only if this step receives data from a previous step.


Delimiter

Specify the file delimiter character used in the target file.


Enclosure

Specify the enclosure character used in the target file.
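To illustrate how the delimiter and enclosure characters work together, here is a minimal sketch that uses Python's standard csv module rather than Kettle's own parser. The function name parse_delimited and the sample data are hypothetical and exist only for this illustration.

```python
import csv
import io

def parse_delimited(text, delimiter=";", enclosure='"'):
    """Split each line of text into fields, honoring the enclosure character."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=enclosure)
    return [row for row in reader]

# Hypothetical sample: the second field of the data row contains the delimiter.
sample = 'id;name\n1;"Smith; John"\n'
rows = parse_delimited(sample)

# Because the value is wrapped in the enclosure character, the delimiter
# inside it does not start a new field.
for row in rows:
    print(row)
```

Without the enclosure, the data row would be split into three fields instead of two.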

NIO buffer size

This is the size of the read buffer, that is, the number of bytes read from disk at one time.
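The effect of a read buffer can be sketched as follows. This is an illustration in Python, not the step's Java NIO code; the function names and the buffer sizes used here are assumptions chosen for the example.

```python
import io

def read_in_chunks(stream, buffer_size=50000):
    """Yield successive blocks of at most buffer_size bytes from a binary stream."""
    while True:
        block = stream.read(buffer_size)
        if not block:
            break
        yield block

def count_lines(stream, buffer_size=50000):
    """Count newline characters while reading one buffer at a time."""
    return sum(block.count(b"\n") for block in read_in_chunks(stream, buffer_size))

# An in-memory stream stands in for a file on disk.
data = io.BytesIO(b"a;b\n1;2\n3;4\n")
print(count_lines(data, buffer_size=4))
```

A larger buffer means fewer read operations per file at the cost of more memory per step copy.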

Lazy conversion

The lazy conversion algorithm tries to avoid unnecessary data type conversions and, where that is possible, can result in significant performance improvements. The typical example is reading from a text file and writing back to a text file.
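The idea behind lazy conversion can be sketched in a few lines. This is a simplified illustration, not Kettle's implementation: the LazyField class and the copy_line helper are hypothetical names, and UTF-8 is assumed as the encoding.

```python
class LazyField:
    """Holds the raw bytes of a field and converts them only on first access."""

    def __init__(self, raw, converter):
        self.raw = raw
        self.converted = False
        self._converter = converter
        self._value = None

    def value(self):
        # Decode and convert once, then reuse the cached result.
        if not self.converted:
            self._value = self._converter(self.raw.decode("utf-8"))
            self.converted = True
        return self._value

def copy_line(raw_line, delimiter=b";"):
    """Pass a line through unchanged: the raw bytes are written back as-is."""
    fields = [LazyField(part, str) for part in raw_line.split(delimiter)]
    return delimiter.join(field.raw for field in fields), fields

out, fields = copy_line(b"42;2020-01-01")
print(out, [field.converted for field in fields])
```

In the pass-through case no field is ever converted; a conversion happens only when a later step asks for the typed value, for example LazyField(b"42", int).value().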


Header row present?

Enable this option if the target file contains a header row containing column names.

Add filename to result

Adds the CSV filename(s) read to the result of this transformation. A unique list of these filenames is kept in memory and can be used by the next job entry in a job, for example in another transformation.

Row number field (optional)

The name of the Integer field that will contain the row number in the output of this step.

Running in parallel?

Check this box if you will have multiple instances of this step running (step copies) and you want each instance to read a separate part of the CSV file(s).
When reading multiple files, the total size of all files is used to split the workload. In that case, make sure that ALL step copies receive all the files that need to be read; otherwise the parallel algorithm will not work correctly.

WARNING: For technical reasons, parallel reading of CSV files is only supported on files that don't have fields with line breaks or carriage returns in them.
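One common way to split a file between step copies is to give each copy a byte range and let it read only the lines that begin inside that range. The sketch below shows this technique for a single file; it is an assumption about how such a split can work, not a description of Kettle's actual algorithm, and block_range and read_block are hypothetical names.

```python
import io

def block_range(total_size, copy_nr, copies):
    """Return the [start, end) byte range assigned to one step copy."""
    block = total_size // copies
    start = copy_nr * block
    end = total_size if copy_nr == copies - 1 else start + block
    return start, end

def read_block(stream, start, end):
    """Return the complete lines that begin inside [start, end)."""
    if start > 0:
        # Step back one byte and read to the next line break, so that the
        # stream is positioned at the first line starting at or after start.
        stream.seek(start - 1)
        stream.readline()
    else:
        stream.seek(0)
    lines = []
    while stream.tell() < end:
        line = stream.readline()
        if not line:
            break
        lines.append(line.rstrip(b"\r\n"))
    return lines

data = b"a;1\nb;2\nc;3\nd;4\ne;5\n"
copies = 3
for copy_nr in range(copies):
    start, end = block_range(len(data), copy_nr, copies)
    print(copy_nr, read_block(io.BytesIO(data), start, end))
```

The sketch treats every line break as the end of a row. A field that contains a line break would be cut in two at a block boundary, which is consistent with the restriction stated in the warning above.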

File encoding

Specify the encoding of the file being read.

Fields table

This table contains an ordered list of fields to be read from the target file.

Preview button

Click to preview the data coming from the target file.

Get Fields button

Click to return a list of fields from the target file based on the current settings (e.g. Delimiter, Enclosure). All fields identified will be added to the Fields table.
