repartition {SparkR}    R Documentation

Return a new RDD that has exactly numPartitions partitions

Description

Returns a new RDD that has exactly numPartitions partitions. This can either increase or decrease the level of parallelism of the RDD. Internally, repartition uses a shuffle to redistribute the data. If you are only decreasing the number of partitions, consider using coalesce instead, which can avoid performing a shuffle.

Usage

repartition(x, numPartitions)

## S4 method for signature 'RDD,integer'
repartition(x, numPartitions)

Arguments

x

The RDD.

numPartitions

Number of partitions to create. Must be an integer (for example, 2L), as required by the 'RDD,integer' method signature.

See Also

coalesce

Examples

## Not run: 
sc <- sparkR.init()
rdd <- parallelize(sc, list(1, 2, 3, 4, 5, 6, 7), 4L)
numPartitions(rdd)                   # 4
numPartitions(repartition(rdd, 2L))  # 2
## End(Not run)
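
## A further sketch, not part of the original examples. It contrasts
## increasing the partition count with repartition against decreasing it
## with coalesce (listed under See Also). The variable names are
## illustrative, and the two-argument call coalesce(x, numPartitions) is
## an assumption here; check the coalesce help page for its exact usage.
## Not run: 
sc <- sparkR.init()
rdd <- parallelize(sc, list(1, 2, 3, 4, 5, 6, 7), 4L)

# Increasing the partition count always goes through a shuffle.
# Note the integer literal (8L): the method is defined for 'RDD,integer'.
more <- repartition(rdd, 8L)
numPartitions(more)

# When only decreasing the partition count, coalesce may avoid the shuffle.
fewer <- coalesce(rdd, 2L)
numPartitions(fewer)
## End(Not run)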

[Package SparkR version 0.1 Index]