partitionBy {SparkR}    R Documentation

Partition an RDD by key

Description

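This function operates on RDDs in which every element is a key-value pair of the form list(K, V) or c(K, V). For each element, the partition function is applied to the key to compute a hash value, and that value determines which partition the element is placed in. Elements with equal keys therefore end up in the same partition.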
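The idea of partitioning by a hash of the key can be sketched in plain R, without a Spark context. This is only an illustration: toyHash and assignPartitions are hypothetical helpers introduced here, toyHash is not SparkR's hashCode, and reducing the hash modulo numPartitions is an assumption about how a hash value is mapped to a partition. The bucket ids it produces will not necessarily match those of partitionBy.

```r
# Toy stand-in for a hash function (NOT SparkR's hashCode):
# sums the character codes of the key's string representation.
toyHash <- function(key) {
  sum(utf8ToInt(as.character(key)))
}

# Assign each list(K, V) pair to a bucket in 0..(numPartitions - 1)
# by hashing the key (the first element) and reducing modulo numPartitions.
assignPartitions <- function(pairs, numPartitions, partitionFunc = toyHash) {
  buckets <- vapply(pairs,
                    function(p) partitionFunc(p[[1]]) %% numPartitions,
                    numeric(1))
  # split() groups the pairs by bucket id; names are the bucket ids
  split(pairs, buckets)
}

pairs <- list(list(1, 2), list(1.1, 3), list(1, 4))
parts <- assignPartitions(pairs, 2L)

# Both pairs with key 1 land in the same bucket;
# the pair with key 1.1 lands in the other one.
str(parts)
```

Whatever hash function is used, the property that matters is that equal keys always produce the same hash value and so are always placed together.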

Usage

partitionBy(rdd, numPartitions, ...)

## S4 method for signature 'RDD,integer'
partitionBy(rdd, numPartitions,
  partitionFunc = hashCode)

Arguments

rdd

The RDD to partition. Should be an RDD where each element is list(K, V) or c(K, V).

numPartitions

Number of partitions to create.

...

Other optional arguments to partitionBy.

partitionFunc

The partition function to use. Uses a default hashCode function if not provided.
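As a rough illustration of what a custom partitionFunc might look like, the plain-R sketch below buckets numeric keys by their integer part, so that keys such as 1 and 1.1 are placed together. byIntegerPart is a hypothetical function introduced here. That the partition function receives only the key, and that its result is reduced modulo numPartitions, are assumptions of this sketch rather than documented behavior.

```r
# Hypothetical custom partition function: numeric keys are grouped
# by their integer part instead of by a hash of the exact value.
byIntegerPart <- function(key) {
  as.integer(floor(key))
}

keys <- c(1, 1.1, 2, 2.7)
numPartitions <- 2L

# Local simulation of the bucket each key would be assigned to,
# assuming the function's result is taken modulo numPartitions.
buckets <- vapply(keys,
                  function(k) byIntegerPart(k) %% numPartitions,
                  integer(1))
print(buckets)

# With a live SparkR context, the Usage section suggests the function
# would be supplied through the partitionFunc argument, e.g.
#   parts <- partitionBy(rdd, 2L, partitionFunc = byIntegerPart)
```

In this simulation keys 1 and 1.1 share a bucket, as do keys 2 and 2.7, which an exact-value hash would not guarantee.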

Value

An RDD partitioned using the specified partitioner.

Examples

## Not run: 
##D sc <- sparkR.init()
##D pairs <- list(list(1, 2), list(1.1, 3), list(1, 4))
##D rdd <- parallelize(sc, pairs)
##D parts <- partitionBy(rdd, 2L)
##D collectPartition(parts, 0L) # First partition should contain list(1, 2) and list(1, 4)
## End(Not run)

[Package SparkR version 0.1 Index]