aggregateByKey {SparkR}R Documentation

Aggregate a pair RDD by each key.

Description

Aggregate the values of each key in an RDD, using given combine functions and a neutral "zero value". This function can return a result type, U, that differs from the type of the values in this RDD, V. Thus, we need one operation for merging a V into a U and one operation for merging two U's. The former operation is used for merging values within a partition, and the latter is used for merging values between partitions. To avoid memory allocation, both of these functions are allowed to modify and return their first argument instead of creating a new U.
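The two-phase merge described above can be illustrated with a small sketch. This is plain Python, not SparkR: the function name `aggregate_by_key` and the list-of-partitions representation are assumptions made for illustration only, mirroring how seqOp folds each value into a per-partition accumulator and combOp then merges the partial accumulators across partitions.

```python
# Hypothetical plain-Python sketch of the two-phase merge performed by
# aggregateByKey (illustration only; not SparkR code).
def aggregate_by_key(partitions, zero_value, seq_op, comb_op):
    # Phase 1: within each partition, fold each value V into a U per key.
    partials = []
    for part in partitions:
        acc = {}
        for key, value in part:
            # Copy the zero value so accumulators are never shared.
            u = acc.get(key, list(zero_value))
            acc[key] = seq_op(u, value)
        partials.append(acc)
    # Phase 2: merge the per-partition U's for each key with comb_op.
    result = {}
    for acc in partials:
        for key, u in acc.items():
            result[key] = comb_op(result[key], u) if key in result else u
    return result

# Mirrors the R example below: the accumulator is a (sum, count) pair.
parts = [[(1, 1), (1, 2)], [(2, 3), (2, 4)]]
seq_op = lambda u, v: [u[0] + v, u[1] + 1]
comb_op = lambda u1, u2: [u1[0] + u2[0], u1[1] + u2[1]]
print(aggregate_by_key(parts, [0, 0], seq_op, comb_op))
# → {1: [3, 2], 2: [7, 2]}
```

Because combOp may run in any order as partitions finish, both functions should be associative, and combOp commutative, for the result to be deterministic.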

Usage

aggregateByKey(rdd, zeroValue, seqOp, combOp, numPartitions)

## S4 method for signature 'RDD,ANY,ANY,ANY,integer'
aggregateByKey(rdd, zeroValue, seqOp,
  combOp, numPartitions)

Arguments

rdd

An RDD.

zeroValue

A neutral "zero value".

seqOp

A function to aggregate the values of each key. It may return a different result type from the type of the values.

combOp

A function to aggregate the results of seqOp. It merges two accumulators of the result type.

numPartitions

The number of partitions in the resulting RDD.

Value

An RDD containing the aggregation result.

See Also

foldByKey, combineByKey

Examples

## Not run: 
##D sc <- sparkR.init()
##D rdd <- parallelize(sc, list(list(1, 1), list(1, 2), list(2, 3), list(2, 4)))
##D # The accumulator is a (sum, count) pair, starting from the zero value
##D zeroValue <- list(0, 0)
##D # Merge a value into an accumulator within a partition
##D seqOp <- function(x, y) { list(x[[1]] + y, x[[2]] + 1) }
##D # Merge two accumulators across partitions
##D combOp <- function(x, y) { list(x[[1]] + y[[1]], x[[2]] + y[[2]]) }
##D aggregateByKey(rdd, zeroValue, seqOp, combOp, 2L)
##D   # list(list(1, list(3, 2)), list(2, list(7, 2)))
## End(Not run)

[Package SparkR version 0.1 Index]