The Reservoir Sampling step lets you sample a fixed number of rows from an incoming data stream when the total number of incoming rows is not known in advance. The step uses uniform sampling: all incoming rows have an equal chance of being selected. It is particularly useful in conjunction with the ARFF Output step, to generate a suitably sized data set for use with WEKA. The Reservoir Sampling step uses Algorithm R as described by Jeffrey Vitter.
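
The idea behind Algorithm R can be illustrated with a short Python sketch. This is not the step's actual implementation; the function name `reservoir_sample` and its parameters are hypothetical and chosen only for illustration. The first `k` rows fill the reservoir, and each later row replaces a randomly chosen slot with a probability that shrinks as more rows are seen.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Illustrative Algorithm R: keep k rows from a stream of unknown length.

    Hypothetical helper, not part of any product API. Assumes k > 0.
    """
    if k <= 0:
        raise ValueError("k must be positive in this sketch")
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            # The first k rows always go into the reservoir.
            reservoir.append(row)
        else:
            # Pick a slot among the i + 1 rows seen so far (both ends inclusive).
            j = rng.randint(0, i)
            if j < k:
                # Row i survives with probability k / (i + 1).
                reservoir[j] = row
    return reservoir


# Usage: sample 5 rows from a stream whose length is never inspected up front.
sample = reservoir_sample(iter(range(1000)), 5, random.Random(42))
```

Because the stream is consumed one row at a time, memory use depends only on the sample size and not on the number of incoming rows.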

Option | Description |
---|---|
Step name | The name of this step as it appears in the transformation workspace. |
Sample size | Select how many rows to sample from an incoming stream. Setting a value of 0 will cause all rows to be sampled; setting a negative value will block all rows. |
Random seed | Choose a seed for the random number generator. Repeating a transformation with a different value for the seed will result in a different random sample being chosen. |
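
The behaviour of the Sample size and Random seed options can be modelled with a small Python sketch. It is an approximation of the documented behaviour under stated assumptions, not the step's own code: `sample_step`, `sample_size`, and `seed` are hypothetical names, and Python's random number generator will not reproduce the samples the actual step selects.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def sample_step(rows: Iterable[T], sample_size: int, seed: int) -> List[T]:
    """Hypothetical model of the documented option semantics."""
    if sample_size < 0:
        # A negative sample size blocks all rows.
        return []
    if sample_size == 0:
        # A sample size of 0 passes every row through.
        return list(rows)
    rng = random.Random(seed)  # the same seed gives the same sample
    reservoir: List[T] = []
    for i, row in enumerate(rows):
        if i < sample_size:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < sample_size:
                reservoir[j] = row
    return reservoir


all_rows = sample_step(range(5), 0, seed=1)      # every row is kept
no_rows = sample_step(range(5), -1, seed=1)      # nothing is kept
first_run = sample_step(range(100), 3, seed=1)
second_run = sample_step(range(100), 3, seed=1)  # identical to first_run
```

Keeping the seed fixed makes a run repeatable, which helps when the sampled data set has to be regenerated for comparison.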

Vitter, J. S. Random Sampling with a Reservoir. ACM Transactions on Mathematical Software, Vol. 11, No. 1, March 1985. Pages 37-57.