Spark adds a hidden parameter to the constructor of a Scala class
I don't know how to explain this, but Spark seems to add a hidden (implicit?) parameter to the constructor of a class. This is the code I tried in spark-shell (in a regular Scala shell the parameter list would be empty):
scala> class A {}
defined class A
scala> classOf[A].getConstructors()(0).getAnnotatedParameterTypes
res0: Array[java.lang.reflect.AnnotatedType] = Array(sun.reflect.annotation.AnnotatedTypeFactory$AnnotatedTypeBaseImpl@5ed65e4b)
Because of this parameter I cannot pass my custom InputFormat class to Spark's hadoopFile function. Any hints on what is going on here, or at least on how I can create a class with a parameterless constructor?
The behaviour seems to be the same as in a plain Scala REPL:
$ scala
Welcome to Scala 2.13.3 (Java HotSpot(TM) 64-Bit GraalVM EE 19.3.0, Java 1.8.0_231).
Type in expressions for evaluation. Or try :help.
scala> class A {}
class A
scala> classOf[A].getConstructors()(0).getAnnotatedParameterTypes
val res0: Array[java.lang.reflect.AnnotatedType] = Array(sun.reflect.annotation.AnnotatedTypeFactory$AnnotatedTypeBaseImpl@383864d5)
scala> classOf[A].getConstructors()(0).getParameters
val res1: Array[java.lang.reflect.Parameter] = Array(final $iw $outer)
The REPL makes the class nested (every line in the REPL is wrapped in an instantiation of an outer class). This adds an instance of the outer class as a parameter to the constructor ($outer is the parameter name, $iw is the outer class). You can reproduce this behaviour as follows:
class X {
  class A {}
}

object App {
  def main(args: Array[String]): Unit = {
    val x = new X
    println(classOf[x.A].getConstructors()(0).getAnnotatedParameterTypes.mkString(","))
    // sun.reflect.annotation.AnnotatedTypeFactory$AnnotatedTypeBaseImpl@2f7c7260
    println(classOf[x.A].getConstructors()(0).getParameters.mkString(","))
    // final X $outer
  }
}
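The synthetic parameter behaves like any other constructor parameter: reflective instantiation succeeds once an instance of the outer class is supplied. A standalone sketch (class and object names here are of my choosing, not from the original):

```scala
class Outer {
  class Inner
}

object OuterDemo {
  def main(args: Array[String]): Unit = {
    val outer = new Outer
    // The only constructor takes the enclosing Outer instance as its parameter.
    val ctor = classOf[outer.Inner].getConstructors()(0)
    println(ctor.getParameterCount) // 1
    // Passing the outer instance satisfies the hidden $outer parameter.
    val inner = ctor.newInstance(outer)
    println(inner.getClass.getSimpleName) // Inner
  }
}
```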
If you run the REPL with the compiler option -Xprint:typer switched on (as scala -Xprint:typer or spark-shell -Xprint:typer), you will see:
$ scala -Xprint:typer
Welcome to Scala 2.13.3 (Java HotSpot(TM) 64-Bit GraalVM EE 19.3.0, Java 1.8.0_231).
Type in expressions for evaluation. Or try :help.
scala> class A
[[syntax trees at end of typer]] // <console>
package $line3 {
  sealed class $read extends AnyRef with Serializable {
    def <init>(): $line3.$read = {
      $read.super.<init>();
      ()
    };
    sealed class $iw extends AnyRef with java.io.Serializable {
      def <init>(): $iw = {
        $iw.super.<init>();
        ()
      };
      class A extends scala.AnyRef {
        def <init>(): A = {
          A.super.<init>();
          ()
        }
      }
    };
    private[this] val $iw: $iw = new $read.this.$iw();
    <stable> <accessor> def $iw: $iw = $read.this.$iw
  };
  object $read extends scala.AnyRef with java.io.Serializable {
    def <init>(): type = {
      $read.super.<init>();
      ()
    };
    private[this] val INSTANCE: $line3.$read = new $read();
    <stable> <accessor> def INSTANCE: $line3.$read = $read.this.INSTANCE;
    <synthetic> private def writeReplace(): Object = new scala.runtime.ModuleSerializationProxy(classOf[$line3.$read$])
  }
}
class A
So this additional constructor parameter $outer can be obtained as $line3.$read.INSTANCE.$iw:
scala> classOf[A].getConstructors()(0).newInstance($line3.$read.INSTANCE.$iw)
...
val res0: Object = A@282ffbf5
Note that the encoding can change between Scala versions. For example, the spark-shell from Spark 3.0.1 (pre-built for Hadoop 3.2) uses Scala 2.12.10, and there $lineXXX.$read.INSTANCE.$iw.$iw should be used instead of $lineXXX.$read.INSTANCE.$iw:
$ spark-shell -Xprint:typer
20/11/25 16:32:16 WARN Utils: Your hostname, dmitin-HP-Pavilion-Laptop resolves to a loopback address: 127.0.1.1; using 192.168.0.103 instead (on interface wlo1)
20/11/25 16:32:16 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/11/25 16:32:16 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://192.168.0.103:4040
Spark context available as 'sc' (master = local[*], app id = local-1606314741512).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.0.1
/_/
Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit GraalVM EE 19.3.0, Java 1.8.0_231)
Type in expressions to have them evaluated.
Type :help for more information.
scala> class A
[[syntax trees at end of typer]] // <console>
package $line14 {
  sealed class $read extends AnyRef with java.io.Serializable {
    def <init>(): $line14.$read = {
      $read.super.<init>();
      ()
    };
    sealed class $iw extends AnyRef with java.io.Serializable {
      def <init>(): $read.this.$iw = {
        $iw.super.<init>();
        ()
      };
      sealed class $iw extends AnyRef with java.io.Serializable {
        def <init>(): $iw = {
          $iw.super.<init>();
          ()
        };
        class A extends scala.AnyRef {
          def <init>(): A = {
            A.super.<init>();
            ()
          }
        }
      };
      private[this] val $iw: $iw = new $iw.this.$iw();
      <stable> <accessor> def $iw: $iw = $iw.this.$iw
    };
    private[this] val $iw: $read.this.$iw = new $read.this.$iw();
    <stable> <accessor> def $iw: $read.this.$iw = $read.this.$iw
  };
  object $read extends scala.AnyRef with Serializable {
    def <init>(): $line14.$read.type = {
      $read.super.<init>();
      ()
    };
    private[this] val INSTANCE: $line14.$read = new $read();
    <stable> <accessor> def INSTANCE: $line14.$read = $read.this.INSTANCE;
    <synthetic> private def readResolve(): Object = $line14.$read
  }
}
defined class A
scala> classOf[A].getConstructors()(0).newInstance($line14.$read.INSTANCE.$iw.$iw)
...
res0: Any = A@6621ab0c
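Because these wrapper names ($read, $iw, INSTANCE) are REPL internals, code that hard-codes them is fragile across versions. A version-agnostic sketch of my own (not from the original answer) inspects the constructor's parameter types instead, to decide whether an enclosing instance is needed at all:

```scala
object CtorInspect {
  // Lists the parameter types of the first public constructor, so callers
  // can tell whether a synthetic enclosing-instance parameter is expected.
  def ctorParamTypes(cls: Class[_]): List[Class[_]] =
    cls.getConstructors()(0).getParameterTypes.toList
}

class Enclosing { class In }       // inner class: constructor expects an Enclosing
object TopLevelHolder { class In } // object-nested: constructor is parameterless

object InspectDemo {
  def main(args: Array[String]): Unit = {
    println(CtorInspect.ctorParamTypes(classOf[Enclosing#In]))      // List(class Enclosing)
    println(CtorInspect.ctorParamTypes(classOf[TopLevelHolder.In])) // List()
  }
}
```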
In Scala 2.12.6, scala -Xprint:typer produces:
$ ./scala -Xprint:typer
Welcome to Scala 2.12.6 (Java HotSpot(TM) 64-Bit GraalVM EE 19.3.0, Java 1.8.0_231).
Type in expressions for evaluation. Or try :help.
scala> class A
[[syntax trees at end of typer]] // <console>
package $line3 {
  object $read extends scala.AnyRef {
    def <init>(): $line3.$read.type = {
      $read.super.<init>();
      ()
    };
    object $iw extends scala.AnyRef {
      def <init>(): type = {
        $iw.super.<init>();
        ()
      };
      object $iw extends scala.AnyRef {
        def <init>(): type = {
          $iw.super.<init>();
          ()
        };
        class A extends scala.AnyRef {
          def <init>(): A = {
            A.super.<init>();
            ()
          }
        }
      }
    }
  }
}
defined class A
So there class A is nested in objects ($line3.$read.$iw.$iw) rather than in classes, and in that case no additional parameter is added to the constructor of A:
object X {
  class A {}
}

object App {
  def main(args: Array[String]): Unit = {
    val x = X
    println(classOf[x.A].getConstructors()(0).getAnnotatedParameterTypes.toList)
    // List()
    println(classOf[x.A].getConstructors()(0).getParameters.toList)
    // List()
  }
}
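A practical consequence for the original question (my reading; the answer above does not spell this out): a custom InputFormat meant for hadoopFile should not be defined directly on a REPL line. Keeping it as a top-level class in a separately compiled jar, or nested only inside top-level objects, leaves the constructor parameterless, which reflective instantiation requires. A minimal sketch with a hypothetical stand-in class:

```scala
object Formats {
  // Stand-in for a custom InputFormat; nesting only inside a top-level
  // object keeps the constructor free of synthetic parameters.
  class MyFormat
}

object WorkaroundDemo {
  def main(args: Array[String]): Unit = {
    val ctor = classOf[Formats.MyFormat].getConstructors()(0)
    println(ctor.getParameterCount) // 0
    // A zero-argument newInstance now succeeds.
    println(ctor.newInstance().getClass.getSimpleName) // MyFormat
  }
}
```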