distinct {SparkR} | R Documentation |
This function returns a new RDD containing the distinct elements of the given RDD. It behaves the same as ‘distinct()’ in Spark.
distinct(rdd, numPartitions)

## S4 method for signature 'RDD,missingOrInteger'
distinct(rdd, numPartitions)
rdd | The RDD to remove duplicates from. |
numPartitions | Number of partitions to create. |
## Not run:
sc <- sparkR.init()
rdd <- parallelize(sc, c(1,2,2,3,3,3))
sort(unlist(collect(distinct(rdd)))) # c(1, 2, 3)
## End(Not run)
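
The example above needs a running Spark context. As a rough local illustration of the same idea, the sketch below emulates the behaviour in base R without Spark. The helper `local_distinct` and its argument `num_partitions` are hypothetical names introduced here, not part of SparkR, and the bucketing rule is only a stand-in for whatever partitioning Spark actually applies.

```r
# Hypothetical helper, not part of SparkR: emulates distinct() on a local vector.
local_distinct <- function(x, num_partitions = 2L) {
  # Give every value a bucket id so that equal values always share a bucket
  # (an assumed stand-in for partitioning by key).
  bucket <- match(x, unique(x)) %% num_partitions
  # Split the values by bucket and drop duplicates inside each bucket.
  parts <- lapply(split(x, bucket), unique)
  unname(parts)
}

parts <- local_distinct(c(1, 2, 2, 3, 3, 3), num_partitions = 2L)
result <- sort(unlist(parts))
print(result)  # 1 2 3
```

Because equal values land in the same bucket, removing duplicates per bucket is enough to remove them overall. The order of the combined elements depends on the bucketing, which is why both this sketch and the example above sort the result before comparing it.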