join {SparkR}    R Documentation
join
This function joins two RDDs where every element is of the form list(K, V).
The key types of the two RDDs should be the same.
leftOuterJoin
This function left-outer-joins two RDDs where every element is of the form list(K, V).
The key types of the two RDDs should be the same.
rightOuterJoin
This function right-outer-joins two RDDs where every element is of the form list(K, V).
The key types of the two RDDs should be the same.
fullOuterJoin
This function full-outer-joins two RDDs where every element is of the form list(K, V).
The key types of the two RDDs should be the same.
join(rdd1, rdd2, numPartitions)
## S4 method for signature 'RDD,RDD,integer'
join(rdd1, rdd2, numPartitions)

leftOuterJoin(rdd1, rdd2, numPartitions)
## S4 method for signature 'RDD,RDD,integer'
leftOuterJoin(rdd1, rdd2, numPartitions)

rightOuterJoin(rdd1, rdd2, numPartitions)
## S4 method for signature 'RDD,RDD,integer'
rightOuterJoin(rdd1, rdd2, numPartitions)

fullOuterJoin(rdd1, rdd2, numPartitions)
## S4 method for signature 'RDD,RDD,integer'
fullOuterJoin(rdd1, rdd2, numPartitions)
rdd1
An RDD to be joined. Each element should be of the form list(K, V).
rdd2
An RDD to be joined. Each element should be of the form list(K, V).
numPartitions
Number of partitions to create. The method signature requires an integer, for example 2L.
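The methods are registered for the signature 'RDD,RDD,integer', so the type of numPartitions matters for dispatch. The following is a minimal sketch of that behaviour using only the base methods package, assuming standard S4 dispatch rules. The generic demoPartitions is a hypothetical name introduced for illustration and is not part of SparkR.

```r
library(methods)

# Hypothetical generic and method, defined only for this illustration.
setGeneric("demoPartitions", function(x, numPartitions) {
  standardGeneric("demoPartitions")
})
setMethod("demoPartitions", signature("list", "integer"),
          function(x, numPartitions) numPartitions)

# 2L is an integer, so the method above is selected.
ok <- demoPartitions(list(), 2L)

# A plain 2 is a double ("numeric"), which does not match "integer",
# so dispatch fails with an error that is caught here.
failed <- tryCatch({
  demoPartitions(list(), 2)
  FALSE
}, error = function(e) TRUE)
```

This is why the examples on this page pass 2L rather than 2.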
join: a new RDD containing all pairs of elements with matching keys in the two input RDDs.
leftOuterJoin: for each element (k, v) in rdd1, the resulting RDD will contain either all pairs (k, (v, w)) for (k, w) in rdd2, or the pair (k, (v, NULL)) if no element in rdd2 has key k.
rightOuterJoin: for each element (k, w) in rdd2, the resulting RDD will contain either all pairs (k, (v, w)) for (k, v) in rdd1, or the pair (k, (NULL, w)) if no element in rdd1 has key k.
fullOuterJoin: for each key k present in both RDDs, the resulting RDD will contain all pairs (k, (v, w)) for (k, v) in rdd1 and (k, w) in rdd2. For a key k with no element in rdd1, it will contain the pair (k, (NULL, w)); for a key k with no element in rdd2, it will contain the pair (k, (v, NULL)).
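The four return-value rules can be checked locally without a Spark context. The sketch below emulates them on plain R lists of list(K, V) elements. The function local_join and its how argument are hypothetical names introduced here, not part of SparkR, and the output order of the emulation is an artifact of the loops; a distributed join should not be assumed to return elements in this order.

```r
# Spark-free emulation of the join semantics on plain lists of list(K, V).
# `local_join` and `how` are hypothetical, for illustration only.
local_join <- function(x, y, how = c("inner", "left", "right", "full")) {
  how <- match.arg(how)
  out <- list()
  # Emit (k, (v, w)) for every match; keep unmatched left elements
  # as (k, (v, NULL)) for left and full outer joins.
  for (ex in x) {
    matches <- Filter(function(ey) identical(ey[[1]], ex[[1]]), y)
    if (length(matches) > 0) {
      for (ey in matches) {
        out[[length(out) + 1]] <- list(ex[[1]], list(ex[[2]], ey[[2]]))
      }
    } else if (how %in% c("left", "full")) {
      out[[length(out) + 1]] <- list(ex[[1]], list(ex[[2]], NULL))
    }
  }
  # Keep unmatched right elements as (k, (NULL, w)) for right and full
  # outer joins.
  if (how %in% c("right", "full")) {
    for (ey in y) {
      has_match <- any(vapply(x, function(ex) identical(ex[[1]], ey[[1]]),
                              logical(1)))
      if (!has_match) {
        out[[length(out) + 1]] <- list(ey[[1]], list(NULL, ey[[2]]))
      }
    }
  }
  out
}

x <- list(list(1, 2), list(1, 3), list(3, 3))
y <- list(list(1, 1), list(2, 4))

inner <- local_join(x, y, "inner")  # 2 elements: both share key 1
left  <- local_join(x, y, "left")   # 3 elements: adds list(3, list(3, NULL))
right <- local_join(x, y, "right")  # 3 elements: adds list(2, list(NULL, 4))
full  <- local_join(x, y, "full")   # 4 elements: adds both unmatched pairs
```

Keys are compared with identical(), so a key of 1 (double) and a key of 1L (integer) do not match in this sketch, which mirrors the requirement that both inputs use the same key type.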
## Not run:
##D sc <- sparkR.init()
##D rdd1 <- parallelize(sc, list(list(1, 1), list(2, 4)))
##D rdd2 <- parallelize(sc, list(list(1, 2), list(1, 3)))
##D join(rdd1, rdd2, 2L) # list(list(1, list(1, 2)), list(1, list(1, 3)))
## End(Not run)
## Not run:
##D sc <- sparkR.init()
##D rdd1 <- parallelize(sc, list(list(1, 1), list(2, 4)))
##D rdd2 <- parallelize(sc, list(list(1, 2), list(1, 3)))
##D leftOuterJoin(rdd1, rdd2, 2L)
##D # list(list(1, list(1, 2)), list(1, list(1, 3)), list(2, list(4, NULL)))
## End(Not run)
## Not run:
##D sc <- sparkR.init()
##D rdd1 <- parallelize(sc, list(list(1, 2), list(1, 3)))
##D rdd2 <- parallelize(sc, list(list(1, 1), list(2, 4)))
##D rightOuterJoin(rdd1, rdd2, 2L)
##D # list(list(1, list(2, 1)), list(1, list(3, 1)), list(2, list(NULL, 4)))
## End(Not run)
## Not run:
##D sc <- sparkR.init()
##D rdd1 <- parallelize(sc, list(list(1, 2), list(1, 3), list(3, 3)))
##D rdd2 <- parallelize(sc, list(list(1, 1), list(2, 4)))
##D fullOuterJoin(rdd1, rdd2, 2L)
##D # list(list(1, list(2, 1)),
##D #      list(1, list(3, 1)),
##D #      list(2, list(NULL, 4)),
##D #      list(3, list(3, NULL)))
## End(Not run)