partitionBy {SparkR} | R Documentation |
This function operates on RDDs in which every element is of the form list(K, V) or c(K, V). For each element, the partition function is applied to the key K to compute a hash value, and the element is assigned to a partition based on that value.
partitionBy(rdd, numPartitions, ...)

## S4 method for signature 'RDD,integer'
partitionBy(rdd, numPartitions, partitionFunc = hashCode)
rdd | The RDD to partition. Each element should be of the form list(K, V) or c(K, V). |
numPartitions | Number of partitions to create. Pass an integer (for example 2L), since the method is defined for signature 'RDD,integer'. |
... | Other optional arguments to partitionBy. |
partitionFunc | The partition function to use. Defaults to hashCode if not provided. |
An RDD partitioned using the specified partitioner.
## Not run:
sc <- sparkR.init()
pairs <- list(list(1, 2), list(1.1, 3), list(1, 4))
rdd <- parallelize(sc, pairs)
parts <- partitionBy(rdd, 2L)
collectPartition(parts, 0L) # First partition should contain list(1, 2) and list(1, 4)
## End(Not run)
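
The description above says that elements are placed in partitions according to a hash value. The following is a minimal sketch in plain R (no Spark required) of how such an assignment might work. It assumes the partition index is the hash of the key modulo numPartitions. Both toyHash and assignPartitions are hypothetical helpers written for illustration; toyHash is not SparkR's hashCode, so the bucket each key lands in may differ from what partitionBy produces.

```r
# Hypothetical stand-in for a partition function; NOT SparkR's hashCode.
toyHash <- function(key) {
  # Sum the character codes of the key's string form to get an integer.
  sum(utf8ToInt(as.character(key)))
}

# Hypothetical helper: group list(K, V) pairs into numPartitions buckets.
# Assumption: bucket index = partitionFunc(key) modulo numPartitions.
assignPartitions <- function(pairs, numPartitions, partitionFunc = toyHash) {
  buckets <- vector("list", numPartitions)
  for (pair in pairs) {
    idx <- partitionFunc(pair[[1]]) %% numPartitions  # zero-based index
    # R lists are one-based, so shift the index by one when storing.
    buckets[[idx + 1]] <- c(buckets[[idx + 1]], list(pair))
  }
  buckets
}

pairs <- list(list(1, 2), list(1.1, 3), list(1, 4))
parts <- assignPartitions(pairs, 2L)
```

Because the index depends only on the key, list(1, 2) and list(1, 4) end up in the same bucket, while list(1.1, 3) is hashed independently. A different partitionFunc changes which bucket each key maps to, but pairs with equal keys still stay together.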