combineByKey {SparkR}    R Documentation
Description

Generic function to combine the elements for each key using a custom set of aggregation functions. Turns an RDD[(K, V)] into a result of type RDD[(K, C)], for a "combined type" C. Note that V and C can be different; for example, one might group an RDD of type (Int, Int) into an RDD of type (Int, Seq[Int]). Users provide three functions:
- createCombiner, which turns a V into a C (e.g., creates a one-element list)
- mergeValue, to merge a V into a C (e.g., adds it to the end of a list)
- mergeCombiners, to combine two C's into a single one (e.g., concatenates two lists)
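To make the division of labor concrete, here is a minimal plain-R sketch, with no Spark involved: the helper name local_combine_by_key and its single-partition loop are illustrative assumptions, not part of the SparkR API. It shows createCombiner initializing a combiner on the first value seen for a key and mergeValue folding each subsequent value in; mergeCombiners would be applied when merging per-partition results.

```r
# Illustrative, single-partition simulation of combineByKey (not SparkR code).
local_combine_by_key <- function(pairs, createCombiner, mergeValue) {
  acc <- list()  # per-key combiners, keyed by the stringified key
  for (p in pairs) {
    k <- as.character(p[[1]])
    v <- p[[2]]
    if (is.null(acc[[k]])) {
      acc[[k]] <- createCombiner(v)        # first value for this key: V -> C
    } else {
      acc[[k]] <- mergeValue(acc[[k]], v)  # fold another V into the existing C
    }
  }
  acc
}

# Group values into per-key lists, as in the (Int, Seq[Int]) example above.
pairs <- list(list("a", 1), list("a", 2), list("b", 3))
res <- local_combine_by_key(
  pairs,
  createCombiner = function(v) list(v),
  mergeValue     = function(comb, v) c(comb, list(v))
)
# res$a is list(1, 2); res$b is list(3)

# mergeCombiners (here, list concatenation) comes into play only when two
# partitions have each produced a combiner for the same key:
mergeCombiners <- function(c1, c2) c(c1, c2)
merged <- mergeCombiners(res$a, res$b)  # list(1, 2, 3)
```

In a real RDD the loop runs independently on each partition, which is why a separate mergeCombiners function is required in addition to mergeValue.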
Usage

combineByKey(rdd, createCombiner, mergeValue, mergeCombiners, numPartitions)

## S4 method for signature 'RDD,ANY,ANY,ANY,integer'
combineByKey(rdd, createCombiner, mergeValue, mergeCombiners, numPartitions)
Arguments

rdd
    The RDD to combine. Should be an RDD where each element is list(K, V) or c(K, V).

createCombiner
    Create a combiner (C) given a value (V).

mergeValue
    Merge the given value (V) with an existing combiner (C).

mergeCombiners
    Merge two combiners and return a new combiner.

numPartitions
    Number of partitions to create.
Value

An RDD where each element is list(K, C), where C is the combined type.
See Also

groupByKey, reduceByKey
Examples

## Not run:
sc <- sparkR.init()
pairs <- list(list(1, 2), list(1.1, 3), list(1, 4))
rdd <- parallelize(sc, pairs)
parts <- combineByKey(rdd, function(x) { x }, "+", "+", 2L)
combined <- collect(parts)
combined[[1]] # Should be a list(1, 6)
## End(Not run)