combineByKey {SparkR}R Documentation

Combine values by key

Description

Generic function to combine the elements for each key using a custom set of aggregation functions. Turns an RDD[(K, V)] into a result of type RDD[(K, C)], for a "combined type" C. Note that V and C can be different; for example, one might group an RDD of type (Int, Int) into an RDD of type (Int, Seq[Int]). Users provide three functions: createCombiner, which turns a V into a C (e.g., creates a one-element list); mergeValue, which merges a V into an existing C (e.g., appends it to the list); and mergeCombiners, which combines two C's into a single one (e.g., concatenates the two lists).
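The cooperation of the three functions can be sketched in plain R, with no Spark required. This mirrors the (Int, Seq[Int]) grouping example above; the `combine_partition` driver loop below is a hypothetical illustration of what Spark does per partition, not part of the SparkR API.

```r
# Plain-R sketch of the three roles passed to combineByKey, assuming the
# values for one key arrive split across two "partitions". Illustrative
# only; does not use Spark itself.
createCombiner <- function(v) list(v)              # V -> C: start a new list
mergeValue     <- function(acc, v) c(acc, list(v)) # fold one more V into a C
mergeCombiners <- function(a, b) c(a, b)           # merge two partial C's

# Values seen for a single key, split across two partitions:
part1 <- list(2, 4)
part2 <- list(3)

# Within a partition: seed a combiner from the first value, then fold the rest.
combine_partition <- function(vs) {
  Reduce(mergeValue, vs[-1], createCombiner(vs[[1]]))
}

# Across partitions: merge the per-partition combiners.
combined <- mergeCombiners(combine_partition(part1),
                           combine_partition(part2))
# combined is list(2, 4, 3): the key's values (V) became one combined C
```

In the Examples section below, the same roles are filled by `function(x) { x }` and `"+"`, so the combined type C is simply a running sum rather than a list.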

Usage

combineByKey(rdd, createCombiner, mergeValue, mergeCombiners, numPartitions)

## S4 method for signature 'RDD,ANY,ANY,ANY,integer'
combineByKey(rdd, createCombiner,
  mergeValue, mergeCombiners, numPartitions)

Arguments

rdd

The RDD to combine. Should be an RDD where each element is list(K, V) or c(K, V).

createCombiner

Create a combiner (C) given a value (V).

mergeValue

Merge the given value (V) with an existing combiner (C).

mergeCombiners

Merge two combiners and return a new combiner.

numPartitions

Number of partitions to create.

Value

An RDD where each element is list(K, C), where C is the combined type.

See Also

groupByKey, reduceByKey

Examples

## Not run: 
##D sc <- sparkR.init()
##D pairs <- list(list(1, 2), list(1.1, 3), list(1, 4))
##D rdd <- parallelize(sc, pairs)
##D parts <- combineByKey(rdd, function(x) { x }, "+", "+", 2L)
##D combined <- collect(parts)
##D combined[[1]] # Should be list(1, 6)
## End(Not run)

[Package SparkR version 0.1 Index]