R: Partition an RDD by key

partitionBy {SparkR}

R Documentation

Partition an RDD by key

Description

This function operates on RDDs in which every element is of the form list(K, V) or c(K, V). For each element, the partition function computes a hash value from the element's key K, and the RDD is partitioned according to that hash value.
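The idea can be sketched in plain R without a Spark session. The sketch below is illustrative only: `toyHash` and `assignPartition` are hypothetical helpers, not part of SparkR, and the modulo rule for turning a hash value into a partition index is an assumption about how the hash value is used.

```r
# Hypothetical stand-in for a partition function: maps a key to an integer.
# This is NOT SparkR's hashCode; it only illustrates the idea.
toyHash <- function(key) {
  # Sum the character codes of the key's string form.
  sum(utf8ToInt(as.character(key)))
}

# Assumed assignment rule: hash value modulo the number of partitions,
# giving a 0-based partition index.
assignPartition <- function(key, numPartitions, partitionFunc = toyHash) {
  as.integer(partitionFunc(key) %% numPartitions)
}

pairs <- list(list(1, 2), list(1.1, 3), list(1, 4))

# Partition index for each pair, computed from the key (first element) only.
idx <- vapply(pairs, function(p) assignPartition(p[[1]], 2L), integer(1))

# Group the pairs by partition index, mimicking a partitioned layout.
parts <- split(pairs, idx)
```

Because the index depends only on the key, the two pairs with key 1 end up in the same group regardless of their values.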

Usage

partitionBy(rdd, numPartitions, ...)

## S4 method for signature 'RDD,integer'
partitionBy(rdd, numPartitions,
  partitionFunc = hashCode)

Arguments

`rdd`	The RDD to partition. Should be an RDD where each element is list(K, V) or c(K, V).
`numPartitions`	Number of partitions to create.
`...`	Other optional arguments to partitionBy.
`partitionFunc`	The partition function to use. Defaults to hashCode if not provided.
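As a rough illustration of supplying a custom `partitionFunc`, the sketch below defines a hypothetical function, `rangePartition`, that buckets numeric keys by range. It assumes the function receives a single key and returns an integer that partitionBy then maps to a partition; the name and the threshold of 100 are invented for the example.

```r
# Hypothetical partition function: send keys below 100 to bucket 0, others to 1.
# Assumes keys are single numeric values and that the returned integer is
# mapped to a partition by partitionBy.
rangePartition <- function(key) {
  if (key < 100) 0L else 1L
}

# Local check of the function on a few keys; no Spark session is needed.
buckets <- vapply(c(5, 250, 99), rangePartition, integer(1))

# With a running SparkR session this could then be passed as partitionFunc
# (not run here):
# parts <- partitionBy(rdd, 2L, partitionFunc = rangePartition)
```

Checking the function locally on sample keys first is a cheap way to confirm it returns one integer per key before using it on an RDD.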

Value

An RDD partitioned using the specified partitioner.

Examples

## Not run: 
sc <- sparkR.init()
pairs <- list(list(1, 2), list(1.1, 3), list(1, 4))
rdd <- parallelize(sc, pairs)
parts <- partitionBy(rdd, 2L)
collectPartition(parts, 0L) # First partition should contain list(1, 2) and list(1, 4)
## End(Not run)

[Package SparkR version 0.1 Index]