sampleRDD {SparkR}    R Documentation

Return an RDD that is a sampled subset of the given RDD.

Description

Equivalent to ‘sample()’ in Spark. The function is renamed because its signature is inconsistent with that of the ‘sample()’ function in R's base package.

Usage

sampleRDD(rdd, withReplacement, fraction, seed)

## S4 method for signature 'RDD,logical,numeric,integer'
sampleRDD(rdd, withReplacement,
  fraction, seed)

Arguments

rdd

The RDD to sample elements from

withReplacement

Logical: TRUE to sample with replacement, FALSE to sample without replacement

fraction

The target fraction of elements to sample (approximate; the size of the result is not exact)

seed

Integer seed for the random number generator
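
Details

The effect of the arguments can be illustrated locally in base R, without a Spark context. The sketch below is not SparkR's implementation: 'sample_sketch' is a hypothetical helper, and it assumes that sampling without replacement keeps each element independently with probability 'fraction', while sampling with replacement emits each element a Poisson-distributed number of times with mean 'fraction'. Under these assumptions the result size only approximates fraction * length(x).

```r
# Hypothetical local helper, not part of SparkR: mimics the assumed
# sampling semantics on an ordinary R vector.
sample_sketch <- function(x, withReplacement, fraction, seed) {
  set.seed(seed)
  if (withReplacement) {
    # Assumed Poisson sampling: each element appears 0, 1, 2, ... times
    counts <- rpois(length(x), lambda = fraction)
    rep(x, times = counts)
  } else {
    # Assumed Bernoulli sampling: keep each element with probability `fraction`
    x[runif(length(x)) < fraction]
  }
}

no_repl   <- sample_sketch(1:10, FALSE, 0.5, 1618L)  # distinct elements only
with_repl <- sample_sketch(1:10, TRUE, 0.5, 9L)      # duplicates are possible
```

Calling the helper again with the same seed reproduces the same sample, which is the role the 'seed' argument is expected to play here.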

Examples

## Not run: 
##D sc <- sparkR.init()
##D rdd <- parallelize(sc, 1:10, 10L) # 10 slices, so each num is in its own split
##D collect(sampleRDD(rdd, FALSE, 0.5, 1618L)) # ~5 distinct elements
##D collect(sampleRDD(rdd, TRUE, 0.5, 9L)) # ~5 elements possibly with duplicates
## End(Not run)

[Package SparkR version 0.1 Index]