distinct {SparkR}

R Documentation

Removes duplicates from an RDD.

This function returns a new RDD containing the distinct elements of the given RDD. It behaves the same as ‘distinct()’ in Spark.

distinct(rdd, numPartitions)

## S4 method for signature 'RDD,missingOrInteger'
distinct(rdd, numPartitions)

`rdd`	The RDD to remove duplicates from.
`numPartitions`	Number of partitions to create, given as an integer. May be omitted.

## Not run: 
sc <- sparkR.init()
rdd <- parallelize(sc, c(1,2,2,3,3,3))
sort(unlist(collect(distinct(rdd)))) # c(1, 2, 3)
## End(Not run)
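
The following is a minimal base-R sketch of the semantics described above, runnable without a Spark installation. It is an illustration only and not SparkR's implementation: `local_distinct` is a hypothetical helper, and the hashing scheme is an assumption chosen for simplicity. Equal elements are routed to the same partition, and duplicates are then dropped within each partition.

```r
# Hypothetical helper (not part of SparkR): a local model of distinct().
local_distinct <- function(elements, numPartitions = 2L) {
  # Build a text key per element so that equal elements share a key.
  keys <- vapply(elements,
                 function(x) paste(deparse(x), collapse = ""),
                 character(1))
  # Toy hash: sum of code points modulo the partition count.
  # Equal keys always map to the same partition index (0-based).
  idx <- vapply(keys,
                function(k) sum(utf8ToInt(k)) %% numPartitions,
                numeric(1),
                USE.NAMES = FALSE)
  # Group elements by partition; empty partitions are kept as empty lists.
  partitions <- split(elements,
                      factor(idx, levels = seq_len(numPartitions) - 1L))
  # Drop duplicates inside each partition, then concatenate the partitions.
  unlist(lapply(partitions, unique), recursive = FALSE, use.names = FALSE)
}

result <- local_distinct(list(1, 2, 2, 3, 3, 3), numPartitions = 2L)
sort(unlist(result)) # c(1, 2, 3)
```

As with the RDD example, the order of the returned elements depends on partitioning, which is why the result is sorted before inspection.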

[Package SparkR version 0.1 Index]