combineByKey {SparkR}    R Documentation
Description

Generic function to combine the elements for each key using a custom set of aggregation functions. Turns an RDD[(K, V)] into a result of type RDD[(K, C)], for a "combined type" C. Note that V and C can be different; for example, one might group an RDD of type (Int, Int) into an RDD of type (Int, Seq[Int]). Users provide three functions:
- createCombiner, which turns a V into a C (e.g., creates a one-element list)
- mergeValue, to merge a V into a C (e.g., adds it to the end of a list)
- mergeCombiners, to combine two C's into a single one (e.g., concatenates two lists)
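To make the division of labor concrete, here is a minimal plain-R sketch, with no Spark involved: the helper name local_combine_by_key and its single-partition loop are illustrative assumptions, not part of the SparkR API. It shows createCombiner initializing a combiner on the first value seen for a key and mergeValue folding each subsequent value in; mergeCombiners would be applied when merging per-partition results.

```r
# Illustrative, single-partition simulation of combineByKey (not SparkR code).
local_combine_by_key <- function(pairs, createCombiner, mergeValue) {
  acc <- list()  # per-key combiners, keyed by the stringified key
  for (p in pairs) {
    k <- as.character(p[[1]])
    v <- p[[2]]
    if (is.null(acc[[k]])) {
      acc[[k]] <- createCombiner(v)        # first value for this key: V -> C
    } else {
      acc[[k]] <- mergeValue(acc[[k]], v)  # fold another V into the existing C
    }
  }
  acc
}

# Group values into per-key lists, as in the (Int, Seq[Int]) example above.
pairs <- list(list("a", 1), list("a", 2), list("b", 3))
res <- local_combine_by_key(
  pairs,
  createCombiner = function(v) list(v),
  mergeValue     = function(comb, v) c(comb, list(v))
)
# res$a is list(1, 2); res$b is list(3)

# mergeCombiners (here, list concatenation) comes into play only when two
# partitions have each produced a combiner for the same key:
mergeCombiners <- function(c1, c2) c(c1, c2)
merged <- mergeCombiners(res$a, res$b)  # list(1, 2, 3)
```

In a real RDD the loop runs independently on each partition, which is why a separate mergeCombiners function is required in addition to mergeValue.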
Usage

combineByKey(rdd, createCombiner, mergeValue, mergeCombiners, numPartitions)

## S4 method for signature 'RDD,ANY,ANY,ANY,integer'
combineByKey(rdd, createCombiner, mergeValue, mergeCombiners, numPartitions)
Arguments

rdd
    The RDD to combine. Should be an RDD where each element is list(K, V) or c(K, V).

createCombiner
    Create a combiner (C) given a value (V).

mergeValue
    Merge the given value (V) with an existing combiner (C).

mergeCombiners
    Merge two combiners and return a new combiner.

numPartitions
    Number of partitions to create.
Value

An RDD where each element is list(K, C), where C is the combined type.
See Also

groupByKey, reduceByKey
Examples

## Not run:
sc <- sparkR.init()
pairs <- list(list(1, 2), list(1.1, 3), list(1, 4))
rdd <- parallelize(sc, pairs)
parts <- combineByKey(rdd, function(x) { x }, "+", "+", 2L)
combined <- collect(parts)
combined[[1]] # Should be a list(1, 6)
## End(Not run)