Reference a Java nested class in Spark Scala

Question

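I am trying to read some data from Hadoop into an RDD in Spark using the interactive Scala shell, but I am having trouble accessing some of the classes I need to deserialize the data.

I start by importing the necessary class: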

import com.example.ClassA

That works fine. ClassA is in a jar on the 'jars' path and has ClassB as a public static nested class.

I then try to use ClassB like so:

val rawData = sc.newAPIHadoopFile(dataPath, classOf[com.example.mapreduce.input.Format[com.example.ClassA$ClassB]], classOf[org.apache.hadoop.io.LongWritable], classOf[com.example.ClassA$ClassB])

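This is slightly complicated by one of the other classes taking ClassB as a type parameter, but I think that should be fine.

When I execute this line, I get the following error: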

<console>:17: error: type ClassA$ClassB is not a member of package com.example

I have also tried using the import statement

import com.example.ClassA$ClassB

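and that also seems to be accepted.

Any advice on how I could proceed with debugging this would be appreciated.

Thanks for reading.

Update:

Changing the '$' to a '.' when referencing the nested class seems to get past this problem, although I then get the following error: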

<console>:17: error: inferred type arguments [org.apache.hadoop.io.LongWritable,com.example.ClassA.ClassB,com.example.mapreduce.input.Format[com.example.ClassA.ClassB]] do not conform to method newAPIHadoopFile's type parameter bounds [K,V,F <: org.apache.hadoop.mapreduce.InputFormat[K,V]]
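
On the '$' versus '.' point: ClassA$ClassB is the JVM binary name of a nested class, while Scala source code refers to a Java static nested class as ClassA.ClassB. Here is a minimal sketch of the difference. Since com.example.ClassA is not available here, it uses the JDK's java.util.AbstractMap.SimpleEntry as a stand-in (that substitution, and the object name NestedClassNames, are assumptions made for illustration):

```scala
import java.util.AbstractMap

object NestedClassNames {
  // Scala source syntax: Outer.Inner for a Java public static nested class.
  val viaDot: Class[_] = classOf[AbstractMap.SimpleEntry[_, _]]

  // The '$' form is the JVM binary name; it is what a reflective lookup
  // by string expects, not what a Scala type reference expects.
  val viaBinaryName: Class[_] = Class.forName("java.util.AbstractMap$SimpleEntry")

  def main(args: Array[String]): Unit = {
    println(viaDot.getName)          // the binary name, containing '$'
    println(viaDot == viaBinaryName) // both refer to the same class object
  }
}
```

Both values refer to the same class, which is why the dotted form is the one to use inside classOf[...].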

Answer 1

Note the types that newAPIHadoopFile expects:

K,V,F <: org.apache.hadoop.mapreduce.InputFormat[K,V]

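The important part here is that the generic type InputFormat is parameterized by K and V, which must be exactly the first two type parameters of the method (the key and value classes you pass in).

In your example, the third type parameter should be of the form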

F <: org.apache.hadoop.mapreduce.InputFormat[LongWritable, ClassA.ClassB]

Does your class extend FileInputFormat<LongWritable, V>?
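
To make the bound concrete, here is a small sketch that uses hypothetical stand-in types rather than the real Hadoop and Spark classes. InputFormat, LongWritable, Text, ClassA.ClassB, MatchingFormat, MismatchedFormat and readLike are all defined locally for illustration only. readLike copies just the shape of the type parameters quoted above, not the actual newAPIHadoopFile signature or behaviour.

```scala
object BoundsSketch {
  // Hypothetical stand-ins, NOT the real Hadoop or Spark types.
  abstract class InputFormat[K, V]
  class LongWritable
  class Text
  object ClassA { class ClassB } // plays the role of the Java static nested class

  // The key type is LongWritable, matching the key class passed below.
  class MatchingFormat[V] extends InputFormat[LongWritable, V]
  // The key type is Text, so pairing it with LongWritable violates the bound.
  class MismatchedFormat[V] extends InputFormat[Text, V]

  // Same type-parameter shape as the bound above: F <: InputFormat[K, V].
  def readLike[K, V, F <: InputFormat[K, V]](
      fClass: Class[F],
      kClass: Class[K],
      vClass: Class[V]): (Class[K], Class[V]) =
    (kClass, vClass)

  // Compiles: MatchingFormat[ClassA.ClassB] <: InputFormat[LongWritable, ClassA.ClassB].
  val ok: (Class[LongWritable], Class[ClassA.ClassB]) =
    readLike[LongWritable, ClassA.ClassB, MatchingFormat[ClassA.ClassB]](
      classOf[MatchingFormat[ClassA.ClassB]],
      classOf[LongWritable],
      classOf[ClassA.ClassB])

  // Would be rejected by the compiler because the bound is not satisfied:
  // readLike[LongWritable, ClassA.ClassB, MismatchedFormat[ClassA.ClassB]](
  //   classOf[MismatchedFormat[ClassA.ClassB]],
  //   classOf[LongWritable],
  //   classOf[ClassA.ClassB])
}
```

If your Format class declares a key type other than LongWritable, or a value type other than the type argument you give it, you would expect the same kind of bounds error as in the question. Checking the class's extends clause is the first thing to look at.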
