join {SparkR}    R Documentation

Join two RDDs

Description

join: Joins two RDDs in which every element has the form list(K, V). Both RDDs should use the same key type.

leftOuterJoin: Left-outer-joins two RDDs in which every element has the form list(K, V). Both RDDs should use the same key type.

rightOuterJoin: Right-outer-joins two RDDs in which every element has the form list(K, V). Both RDDs should use the same key type.

fullOuterJoin: Full-outer-joins two RDDs in which every element has the form list(K, V). Both RDDs should use the same key type.

Usage

join(rdd1, rdd2, numPartitions)

## S4 method for signature 'RDD,RDD,integer'
join(rdd1, rdd2, numPartitions)

leftOuterJoin(rdd1, rdd2, numPartitions)

## S4 method for signature 'RDD,RDD,integer'
leftOuterJoin(rdd1, rdd2, numPartitions)

rightOuterJoin(rdd1, rdd2, numPartitions)

## S4 method for signature 'RDD,RDD,integer'
rightOuterJoin(rdd1, rdd2, numPartitions)

fullOuterJoin(rdd1, rdd2, numPartitions)

## S4 method for signature 'RDD,RDD,integer'
fullOuterJoin(rdd1, rdd2, numPartitions)

Arguments

rdd1

An RDD to be joined. Each element must have the form list(K, V).

rdd2

An RDD to be joined. Each element must have the form list(K, V).

numPartitions

Number of partitions in the resulting RDD, given as an integer (for example, 2L).

Value

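join: A new RDD containing all pairs of elements with matching keys in the two input RDDs. For every (k, v) in rdd1 and (k, w) in rdd2 that share the key k, the result contains (k, (v, w)); keys that appear in only one of the RDDs are dropped.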
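The inner-join rule can be illustrated without a Spark cluster. The sketch below uses a hypothetical plain-R helper, local_join(), which is not part of SparkR: it treats an ordinary R list of list(K, V) pairs as a stand-in for an RDD and ignores partitioning, so it only models which pairs end up in the result.

```r
# Hypothetical helper (not a SparkR function): models join() on plain R
# lists whose elements are list(key, value). No partitioning is involved.
local_join <- function(pairs1, pairs2) {
  out <- list()
  for (p1 in pairs1) {
    for (p2 in pairs2) {
      # Emit one list(k, list(v, w)) for every combination with equal keys
      if (identical(p1[[1]], p2[[1]])) {
        out[[length(out) + 1]] <- list(p1[[1]], list(p1[[2]], p2[[2]]))
      }
    }
  }
  out
}

# Same inputs as the join example; key 2 has no partner and is dropped
pairs1 <- list(list(1, 1), list(2, 4))
pairs2 <- list(list(1, 2), list(1, 3))
res <- local_join(pairs1, pairs2)
str(res)
```

The local version returns the pairs in a fixed order; a real RDD join involves a shuffle, so the order of the output elements is not guaranteed.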

leftOuterJoin: For each element (k, v) in rdd1, the resulting RDD contains either all pairs (k, (v, w)) for (k, w) in rdd2, or the single pair (k, (v, NULL)) if no element of rdd2 has key k.
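The left-outer rule differs from the inner join only in how unmatched rdd1 elements are handled. A minimal sketch, assuming plain R lists of list(K, V) pairs in place of RDDs; local_left_outer_join() is a hypothetical illustration, not a SparkR function.

```r
# Hypothetical helper (not a SparkR function): models leftOuterJoin() on
# plain R lists whose elements are list(key, value).
local_left_outer_join <- function(pairs1, pairs2) {
  out <- list()
  for (p1 in pairs1) {
    # All elements of pairs2 that share this element's key
    matches <- Filter(function(p2) identical(p1[[1]], p2[[1]]), pairs2)
    if (length(matches) == 0) {
      # No partner in pairs2: keep the left value and pad the right with NULL
      out[[length(out) + 1]] <- list(p1[[1]], list(p1[[2]], NULL))
    } else {
      for (p2 in matches) {
        out[[length(out) + 1]] <- list(p1[[1]], list(p1[[2]], p2[[2]]))
      }
    }
  }
  out
}

# Same inputs as the leftOuterJoin example
pairs1 <- list(list(1, 1), list(2, 4))
pairs2 <- list(list(1, 2), list(1, 3))
res <- local_left_outer_join(pairs1, pairs2)
str(res)
```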

rightOuterJoin: For each element (k, w) in rdd2, the resulting RDD contains either all pairs (k, (v, w)) for (k, v) in rdd1, or the single pair (k, (NULL, w)) if no element of rdd1 has key k.
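The right-outer rule is the mirror image: iteration is driven by rdd2, and missing left values become NULL. The sketch below assumes plain R lists of list(K, V) pairs in place of RDDs; local_right_outer_join() is hypothetical and not part of SparkR.

```r
# Hypothetical helper (not a SparkR function): models rightOuterJoin() on
# plain R lists whose elements are list(key, value).
local_right_outer_join <- function(pairs1, pairs2) {
  out <- list()
  for (p2 in pairs2) {
    # All elements of pairs1 that share this element's key
    matches <- Filter(function(p1) identical(p1[[1]], p2[[1]]), pairs1)
    if (length(matches) == 0) {
      # No partner in pairs1: pad the left value with NULL
      out[[length(out) + 1]] <- list(p2[[1]], list(NULL, p2[[2]]))
    } else {
      for (p1 in matches) {
        out[[length(out) + 1]] <- list(p2[[1]], list(p1[[2]], p2[[2]]))
      }
    }
  }
  out
}

# Same inputs as the rightOuterJoin example
pairs1 <- list(list(1, 2), list(1, 3))
pairs2 <- list(list(1, 1), list(2, 4))
res <- local_right_outer_join(pairs1, pairs2)
str(res)
```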

fullOuterJoin: The resulting RDD contains all pairs (k, (v, w)) for (k, v) in rdd1 and (k, w) in rdd2 with the same key k, plus the pair (k, (v, NULL)) for each (k, v) in rdd1 whose key does not appear in rdd2, and the pair (k, (NULL, w)) for each (k, w) in rdd2 whose key does not appear in rdd1.
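A full outer join can be viewed as a left outer join followed by the rdd2 elements whose keys never occur in rdd1. A minimal sketch of that decomposition, assuming plain R lists of list(K, V) pairs in place of RDDs; local_full_outer_join() is a hypothetical helper, not a SparkR function.

```r
# Hypothetical helper (not a SparkR function): models fullOuterJoin() on
# plain R lists whose elements are list(key, value).
local_full_outer_join <- function(pairs1, pairs2) {
  # TRUE if any element of `pairs` has key k
  has_key <- function(pairs, k) {
    any(vapply(pairs, function(p) identical(p[[1]], k), logical(1)))
  }
  out <- list()
  # Left-outer part: matched pairs, or (k, (v, NULL)) when unmatched
  for (p1 in pairs1) {
    matches <- Filter(function(p2) identical(p1[[1]], p2[[1]]), pairs2)
    if (length(matches) == 0) {
      out[[length(out) + 1]] <- list(p1[[1]], list(p1[[2]], NULL))
    }
    for (p2 in matches) {
      out[[length(out) + 1]] <- list(p1[[1]], list(p1[[2]], p2[[2]]))
    }
  }
  # Remaining part: pairs2 elements whose key never occurs in pairs1
  for (p2 in pairs2) {
    if (!has_key(pairs1, p2[[1]])) {
      out[[length(out) + 1]] <- list(p2[[1]], list(NULL, p2[[2]]))
    }
  }
  out
}

# Same inputs as the fullOuterJoin example
pairs1 <- list(list(1, 2), list(1, 3), list(3, 3))
pairs2 <- list(list(1, 1), list(2, 4))
res <- local_full_outer_join(pairs1, pairs2)
str(res)
```

The local sketch lists the key-3 pair before the key-2 pair because of its two-pass structure; as with the other joins, Spark itself makes no guarantee about output order.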

Examples

## Not run: 
##D sc <- sparkR.init()
##D rdd1 <- parallelize(sc, list(list(1, 1), list(2, 4)))
##D rdd2 <- parallelize(sc, list(list(1, 2), list(1, 3)))
##D join(rdd1, rdd2, 2L) # list(list(1, list(1, 2)), list(1, list(1, 3)))
## End(Not run)
## Not run: 
##D sc <- sparkR.init()
##D rdd1 <- parallelize(sc, list(list(1, 1), list(2, 4)))
##D rdd2 <- parallelize(sc, list(list(1, 2), list(1, 3)))
##D leftOuterJoin(rdd1, rdd2, 2L)
##D # list(list(1, list(1, 2)), list(1, list(1, 3)), list(2, list(4, NULL)))
## End(Not run)
## Not run: 
##D sc <- sparkR.init()
##D rdd1 <- parallelize(sc, list(list(1, 2), list(1, 3)))
##D rdd2 <- parallelize(sc, list(list(1, 1), list(2, 4)))
##D rightOuterJoin(rdd1, rdd2, 2L)
##D # list(list(1, list(2, 1)), list(1, list(3, 1)), list(2, list(NULL, 4)))
## End(Not run)
## Not run: 
##D sc <- sparkR.init()
##D rdd1 <- parallelize(sc, list(list(1, 2), list(1, 3), list(3, 3)))
##D rdd2 <- parallelize(sc, list(list(1, 1), list(2, 4)))
##D fullOuterJoin(rdd1, rdd2, 2L) # list(list(1, list(2, 1)),
##D                               #      list(1, list(3, 1)),
##D                               #      list(2, list(NULL, 4)),
##D                               #      list(3, list(3, NULL)))
## End(Not run)

[Package SparkR version 0.1 Index]