repartition {SparkR}    R Documentation
Return a new RDD that has exactly numPartitions partitions. Can increase or decrease the level of parallelism in this RDD. Internally, this uses a shuffle to redistribute data. If you are decreasing the number of partitions in this RDD, consider using coalesce, which can avoid performing a shuffle.
repartition(x, numPartitions)

## S4 method for signature 'RDD,integer'
repartition(x, numPartitions)
x              The RDD.
numPartitions  Number of partitions to create.
See also: coalesce
## Not run:
sc <- sparkR.init()
rdd <- parallelize(sc, list(1, 2, 3, 4, 5, 6, 7), 4L)
numPartitions(rdd) # 4
numPartitions(repartition(rdd, 2L)) # 2
## End(Not run)
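
The description's distinction between a full redistribution and coalesce can be illustrated without a Spark context. The sketch below is a toy model in base R and is not how SparkR implements either operation: a dataset is modelled as a plain list of partitions, and the helpers toy_repartition and toy_coalesce are hypothetical names introduced only for this illustration. The starting layout of the seven elements is also made up and is not meant to reflect how parallelize splits its input.

```r
# Toy model only: a "partitioned dataset" is a list of partitions,
# each partition being a list of elements. No Spark is involved.

# Hypothetical helper: redistribute every element round-robin across n
# partitions. Elements may move to any partition, which is the kind of
# data movement a shuffle performs. n may be larger or smaller than the
# current number of partitions.
toy_repartition <- function(parts, n) {
  elems <- unlist(parts, recursive = FALSE)
  idx <- (seq_along(elems) - 1L) %% n + 1L
  lapply(seq_len(n), function(p) elems[idx == p])
}

# Hypothetical helper: reduce to n partitions by merging adjacent ones.
# Each existing partition is kept whole, so no element is redistributed
# individually. Only valid when decreasing the partition count.
toy_coalesce <- function(parts, n) {
  m <- length(parts)
  stopifnot(n <= m)
  group <- ((seq_len(m) - 1L) * n) %/% m + 1L
  lapply(seq_len(n), function(g) unlist(parts[group == g], recursive = FALSE))
}

# Assumed starting layout: 7 elements in 4 partitions.
parts <- list(list(1, 2), list(3, 4), list(5, 6), list(7))

length(toy_repartition(parts, 2L))     # 2
unlist(toy_repartition(parts, 2L)[[1]]) # 1 3 5 7 (elements interleaved)
unlist(toy_coalesce(parts, 2L)[[1]])    # 1 2 3 4 (whole partitions merged)
```

In this model the round-robin version touches every element, while the merging version only concatenates existing partitions, which mirrors why coalesce can be the cheaper choice when decreasing the number of partitions.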