sampleRDD {SparkR} | R Documentation
Return a sampled subset of the given RDD. This is the same as ‘sample()’ in Spark; it is renamed because its signature conflicts with the ‘sample()’ function in R's base package.
sampleRDD(rdd, withReplacement, fraction, seed)

## S4 method for signature 'RDD,logical,numeric,integer'
sampleRDD(rdd, withReplacement, fraction, seed)
rdd | The RDD to sample elements from
withReplacement | Logical; whether to sample with replacement
fraction | Numeric; the approximate fraction of elements to sample (the result size is not exact)
seed | Integer; seed for the random number generator
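To illustrate why the sample size only approximately matches the target fraction, here is a minimal base-R sketch on a local vector. It assumes per-element random draws: a Bernoulli trial for each element when sampling without replacement, and a Poisson-distributed repeat count when sampling with replacement. The helper `sample_fraction` is hypothetical and is not part of SparkR; it is not presented as the package's implementation.

```r
# Hypothetical helper (not a SparkR function): sample a local vector by
# drawing independently for each element, so the result size varies.
sample_fraction <- function(x, withReplacement, fraction, seed) {
  set.seed(seed)
  if (withReplacement) {
    # Assumption: each element is repeated a Poisson(fraction) number of times,
    # so an element can appear zero, one, or several times.
    counts <- rpois(length(x), lambda = fraction)
    rep(x, times = counts)
  } else {
    # Assumption: each element is kept independently with probability `fraction`,
    # so no element can appear more than once.
    x[runif(length(x)) < fraction]
  }
}

no_dups   <- sample_fraction(1:10, FALSE, 0.5, 1618L)  # distinct elements only
with_dups <- sample_fraction(1:10, TRUE, 0.5, 9L)      # duplicates are possible
```

Because every element is decided independently, the expected result length is `fraction * length(x)`, but any single run can return more or fewer elements. Reusing the same seed reproduces the same sample.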
## Not run:
##D sc <- sparkR.init()
##D rdd <- parallelize(sc, 1:10, 10L) # ensure each num is in its own split
##D collect(sampleRDD(rdd, FALSE, 0.5, 1618L)) # ~5 distinct elements
##D collect(sampleRDD(rdd, TRUE, 0.5, 9L)) # ~5 elements possibly with duplicates
## End(Not run)