distinct {SparkR}    R Documentation

Remove duplicate elements from an RDD

Description

Returns a new RDD containing only the distinct elements of the given RDD, so that each value appears once in the result. Equivalent to ‘distinct()’ in Spark.

Usage

distinct(rdd, numPartitions)

## S4 method for signature 'RDD,missingOrInteger'
distinct(rdd, numPartitions)

Arguments

rdd

The RDD to remove duplicates from.

numPartitions

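Number of partitions to create in the resulting RDD. Must be an integer if supplied; it may be omitted, as the 'missingOrInteger' signature indicates.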

Examples

## Not run: 
##D sc <- sparkR.init()
##D rdd <- parallelize(sc, c(1,2,2,3,3,3))
##D sort(unlist(collect(distinct(rdd)))) # c(1, 2, 3)
## End(Not run)
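
The example above omits numPartitions. The sketch below passes it explicitly. It assumes a working local Spark installation with the SparkR package loaded, and it uses the integer literal 2L because the method signature requires an integer; the value 2 itself is arbitrary and chosen only for illustration.

```r
library(SparkR)

# Start a Spark context, as in the example above.
sc <- sparkR.init()

# Six elements, three distinct values.
rdd <- parallelize(sc, c(1, 2, 2, 3, 3, 3))

# Request two partitions for the de-duplicated RDD.
# 2L (not 2) is used so the argument is an integer.
deduped <- distinct(rdd, 2L)

# collect() returns a list in no guaranteed order, so flatten and sort.
result <- sort(unlist(collect(deduped)))
print(result) # c(1, 2, 3)
```

The number of partitions should not change which elements are returned, only how the result is split across partitions.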

[Package SparkR version 0.1 Index]