zipWithUniqueId {SparkR}    R Documentation
Description

Items in the kth partition will get ids k, n+k, 2*n+k, ..., where n is the number of partitions. Gaps may therefore exist in the resulting ids, but unlike zipWithIndex, this method does not trigger a Spark job.
Usage

zipWithUniqueId(x)

## S4 method for signature 'RDD'
zipWithUniqueId(x)
Arguments

x    An RDD to be zipped.
Value

An RDD in which each item is zipped with its unique id, i.e. a list of list(item, id) pairs.
See Also

zipWithIndex
Examples

## Not run:
sc <- sparkR.init()
rdd <- parallelize(sc, list("a", "b", "c", "d", "e"), 3L)
collect(zipWithUniqueId(rdd))
# list(list("a", 0), list("b", 3), list("c", 1), list("d", 4), list("e", 2))
## End(Not run)
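
The following sketch illustrates the id numbering described above in plain base R; it is not a SparkR call, and the partitioning of the five items across three partitions is an assumption chosen to match the example output.

# Plain base R illustration of the unique-id formula (not part of the SparkR API).
# Assumed partitioning of the example's five items across n = 3 partitions:
parts <- list(c("a", "b"), c("c", "d"), c("e"))
n <- length(parts)

result <- list()
for (k in seq_along(parts) - 1) {      # k is the 0-based partition index
  items <- parts[[k + 1]]
  for (i in seq_along(items) - 1) {    # i is the 0-based position within partition k
    # the i-th item of partition k gets id k + i * n
    result[[length(result) + 1]] <- list(items[[i + 1]], k + i * n)
  }
}
result
# list(list("a", 0), list("b", 3), list("c", 1), list("d", 4), list("e", 2))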