"Value toSeq is not a member of Product with Serializable with scala.util.Either"?

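I have an RDD of text files that I need to parse. I do this by mapping a function over them that returns Either[String, Book], where Book is the structured type produced by parsing and String is the text that could not be parsed. The result is an RDD[Either[String, Book]]. I would like to have an RDD[String] and an RDD[Book], because the former should be logged and discarded, while the latter should be processed further.

My splitter is: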

implicit class EitherRDDOps[L, R](rdd: RDD[Either[L, R]]) {
    def split(): (RDD[L], RDD[R]) = {
        // toSeq on Either provides empty Seq for Right and one-element Seq for Left
        val left: RDD[L] = rdd.flatMap(_.swap.toSeq)
        val right: RDD[R] = rdd.flatMap(_.toSeq)
        (left, right)
    }
}

The splitter is called as input.map(parseBook).cache.split, where input is an RDD[String] and parseBook is a (String) => Either[String, Book].

I get the following compile errors:

value toSeq is not a member of Product with Serializable with scala.util.Either
       val left: RDD[L] = rdd.flatMap(_.swap.toSeq)
                                     ^

value toSeq is not a member of Either[L,R]
       val right: RDD[R] = rdd.flatMap(_.toSeq)
                                 ^

type mismatch;
  found   : org.apache.spark.rdd.RDD[Nothing]
  required: org.apache.spark.rdd.RDD[L]
 Note: Nothing <: L, but class RDD is invariant in type T.
 You may wish to define T as +T instead. (SLS 4.5)
       (left, right)
        ^

  found   : org.apache.spark.rdd.RDD[Nothing]
  required: org.apache.spark.rdd.RDD[R]
 Note: Nothing <: R, but class RDD is invariant in type T.
 You may wish to define T as +T instead. (SLS 4.5)
       (left, right)
              ^

But the documentation clearly lists a toSeq method on Either. Any ideas? Should I take a different approach?

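It looks like you are using a slightly older version of Scala, probably 2.11.x or similar. Either became right-biased in Scala 2.12, which is when methods such as toSeq were added directly on it; older versions do not have toSeq on Either itself: link to 2.11.8 documentation.

Try this: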

val left = rdd.filter(_.isLeft).map(_.left.get)
val right = rdd.filter(_.isRight).map(_.right.get)
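
If you would rather not pair a filter with an unchecked get, a pattern match per side does the same job and should compile on both 2.11 and 2.12. The following is only a sketch under a few assumptions: the wrapper object name EitherRDDSyntax is made up for illustration, and the ClassTag context bounds are my addition, since Spark's RDD.collect (like map and flatMap) needs a ClassTag for the result element type and the generic L and R in the question do not provide one.

```scala
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical wrapper object; implicit classes must live inside an object, class, or trait.
object EitherRDDSyntax {

  // The ClassTag context bounds are an assumption added here: RDD.collect
  // requires a ClassTag for the element type of the resulting RDD.
  implicit class EitherRDDOps[L: ClassTag, R: ClassTag](rdd: RDD[Either[L, R]]) {

    def split(): (RDD[L], RDD[R]) = {
      // RDD.collect with a partial function keeps only the elements the
      // function is defined for, so each side is filtered and unwrapped in one step.
      val left: RDD[L] = rdd.collect { case Left(l) => l }
      val right: RDD[R] = rdd.collect { case Right(r) => r }
      (left, right)
    }
  }
}
```

With import EitherRDDSyntax._ in scope, the call site from the question stays the same: val (failed, books) = input.map(parseBook).cache.split(). Each side still makes its own pass over the RDD, so the cache call before split remains worthwhile.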