Is there a filter function that stops once it has found the first n elements matching a predicate?
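I ask because I need to find one specific element in an RDD[(Int, Array[Double])] whose keys are unique. Filtering the whole RDD would be expensive when I only need the single element whose key I already know.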
val wantedkey = 94
val res = rdd.filter( x => x._1 == wantedkey )
Thanks for your advice.
All transformations are lazy; they are computed only when you call an action. So you can write:
val wantedkey = 94
val res = rdd.filter( x => x._1 == wantedkey ).first()
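For the general "first n matches" case in the title, a minimal sketch along the same lines is to end the chain with take(n) instead of first(). take scans partitions incrementally and stops launching further jobs once it has collected n results, so it need not touch the whole RDD. The sample data, the predicate, and the local master setting below are assumptions made up for illustration; they are not from the question.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Reuses the running context (e.g. in spark-shell), otherwise starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("take-sketch"))

// Hypothetical sample data: unique Int keys, Array[Double] values, 8 partitions.
val rdd = sc.parallelize((1 to 1000).map(k => (k, Array(k.toDouble))), 8)

val n = 3
// filter is lazy; take(n) drives the computation and stops once n matches are found.
val firstMatches = rdd.filter { case (k, _) => k % 2 == 0 }.take(n)
val matchedKeys = firstMatches.map(_._1)

// For a unique key, take(1).headOption yields None when the key is absent,
// whereas first() throws on an empty result.
val wantedkey = 94
val res = rdd.filter(_._1 == wantedkey).take(1).headOption
```

Note that if the wanted key sits in a late partition, or is missing, this can still end up scanning most of the data.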
Take a look at the lookup function in PairRDDFunctions.scala.
def lookup(key: K): Seq[V]
Return the list of values in the RDD for key key. This operation is
done efficiently if the RDD has a known partitioner by only searching
the partition that the key maps to.
Example
val a = sc.parallelize(List("dog", "tiger", "lion", "cat", "panther", "eagle"), 2)
val b = a.keyBy(_.length)
b.lookup(5)
res0: Seq[String] = WrappedArray(tiger, eagle)
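The "known partitioner" condition from the doc comment can be met by partitioning the pair RDD explicitly before calling lookup. The following is only a sketch under assumptions: the sample data, the choice of HashPartitioner with 8 partitions, and the local master setting are illustrative and not taken from the question.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Reuses the running context (e.g. in spark-shell), otherwise starts a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[2]").setAppName("lookup-sketch"))

// Hypothetical sample data: unique Int keys, Array[Double] values.
val rdd = sc.parallelize((1 to 1000).map(k => (k, Array(k.toDouble, k * 2.0))), 8)

// Give the RDD a known partitioner and cache it so the shuffle is paid only once.
val partitioned = rdd.partitionBy(new HashPartitioner(8)).cache()
partitioned.count() // materializes the cached, partitioned data

val wantedkey = 94
// With a partitioner present, lookup only searches the partition the key hashes to.
val res: Option[Array[Double]] = partitioned.lookup(wantedkey).headOption
```

partitionBy itself shuffles the whole RDD, so this tends to pay off when several lookups are made against the same data rather than a single one.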