groupByKey {SparkR} | R Documentation
This function operates on RDDs where every element is of the form list(K, V) or c(K, V), and groups the values for each key in the RDD into a single sequence.
groupByKey(rdd, numPartitions)

## S4 method for signature 'RDD,integer'
groupByKey(rdd, numPartitions)
rdd | The RDD to group. Each element should be of the form list(K, V) or c(K, V).
numPartitions | Number of partitions to create, given as an integer (for example, 2L).
An RDD where each element is list(K, list(V)), with list(V) holding all values that share the key K.
reduceByKey
## Not run:
sc <- sparkR.init()
# Keys 1 and 1.1 are distinct, so only the first and third pairs are grouped.
pairs <- list(list(1, 2), list(1.1, 3), list(1, 4))
rdd <- parallelize(sc, pairs)
parts <- groupByKey(rdd, 2L)
grouped <- collect(parts)
grouped[[1]] # Should be a list(1, list(2, 4))
## End(Not run)
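
The grouping semantics described above can be tried without a Spark context. The following is a minimal local sketch in base R, not the SparkR implementation: `group_by_key_local` is a hypothetical helper introduced only for illustration, it ignores partitioning entirely, and it assumes keys can be told apart by their `as.character()` form. It returns groups in order of first key appearance, which a real RDD does not necessarily guarantee.

```r
# Hypothetical helper (not part of SparkR): groups list(K, V) pairs by key
# locally, producing elements of the form list(K, list(V)).
group_by_key_local <- function(pairs) {
  groups <- list()
  for (p in pairs) {
    key <- p[[1]]
    value <- p[[2]]
    # Assumption: the character form of a key identifies it uniquely.
    name <- as.character(key)
    if (is.null(groups[[name]])) {
      # First time this key is seen: start an empty value list.
      groups[[name]] <- list(key, list())
    }
    # Append the value to the list held for this key.
    n <- length(groups[[name]][[2]])
    groups[[name]][[2]][[n + 1]] <- value
  }
  # Drop the bookkeeping names so each element is a plain list(K, list(V)).
  unname(groups)
}

pairs <- list(list(1, 2), list(1.1, 3), list(1, 4))
grouped <- group_by_key_local(pairs)
grouped[[1]]  # list(1, list(2, 4))
```

Note that 1 and 1.1 are treated as different keys, so the result has two groups: one for key 1 holding the values 2 and 4, and one for key 1.1 holding the value 3.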