Sparkling Water: out of memory when converting spark dataframe to H2o dataframe
I am trying to convert a Spark DataFrame to an H2O DataFrame.
For the Spark setup I am using:
.setMaster("local[1]")
.set("spark.driver.memory", "4g")
.set("spark.executor.memory", "4g")
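For context, the settings above would typically be assembled like this (a minimal sketch, not the poster's actual code; the app name and variable names are assumptions):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Local single-threaded Spark setup matching the settings above.
// Caveat: spark.driver.memory has no effect when set programmatically in
// local/client mode, because the driver JVM has already started by then.
val conf = new SparkConf()
  .setAppName("sparkling-water-demo") // hypothetical app name
  .setMaster("local[1]")
  .set("spark.driver.memory", "4g")
  .set("spark.executor.memory", "4g")

val spark = SparkSession.builder().config(conf).getOrCreate()
```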
I tried H2O 2.0.2 and H2O 1.6.4. I got the same error in both, at the following lines:
val trainsetH2O: H2OFrame = trainsetH
val testsetH2O: H2OFrame = testsetH
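For context, assignments like the two above usually rely on Sparkling Water's H2OContext and its implicit DataFrame-to-H2OFrame conversion being in scope. A hedged sketch (the exact imports vary between Sparkling Water versions, and `spark`, `trainsetH`, `testsetH` are assumed to already exist):

```scala
import org.apache.spark.h2o.H2OContext
import water.fvec.H2OFrame

// Start (or attach to) the embedded H2O cluster inside Spark.
// Older Sparkling Water releases create the context from a SparkContext instead.
val h2oContext = H2OContext.getOrCreate(spark)
import h2oContext.implicits._ // brings the implicit DataFrame -> H2OFrame conversion into scope

// The implicit conversion serializes the Spark partitions into H2O chunks;
// this is the step where the reported OutOfMemoryError is thrown.
val trainsetH2O: H2OFrame = trainsetH
```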
The error message is:
ERROR Executor: Exception in task 49.0 in stage 3.0 (TID 62)
java.lang.OutOfMemoryError: PermGen space
at sun.misc.Unsafe.defineClass(Native Method)
at sun.reflect.ClassDefiner.defineClass(ClassDefiner.java:63)
at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:399)
at sun.reflect.MethodAccessorGenerator$1.run(MethodAccessorGenerator.java:396)
at java.security.AccessController.doPrivileged(Native Method)
at sun.reflect.MethodAccessorGenerator.generate(MethodAccessorGenerator.java:395)
at sun.reflect.MethodAccessorGenerator.generateSerializationConstructor(MethodAccessorGenerator.java:113)
at sun.reflect.ReflectionFactory.newConstructorForSerialization(ReflectionFactory.java:331)
at java.io.ObjectStreamClass.getSerializableConstructor(ObjectStreamClass.java:1376)
at java.io.ObjectStreamClass.access$100(ObjectStreamClass.java:72)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:493)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:468)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:468)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:365)
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:602)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
What is going wrong? Both trainset and testset contain fewer than 10K rows, so they are actually very small.
The problem is that you are running out of PermGen memory, which is separate from the heap space you normally configure for the driver and executors with:
.set("spark.driver.memory", "4g")
.set("spark.executor.memory", "4g")
PermGen is the part of JVM memory that holds loaded classes. To increase it for the Spark driver and executors, pass the following options when invoking the spark-submit or spark-shell command:
--conf spark.driver.extraJavaOptions="-XX:MaxPermSize=384m" --conf spark.executor.extraJavaOptions="-XX:MaxPermSize=384m"
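When submitting through spark-submit the flags above are the way to go; the equivalent settings can also be sketched programmatically on the SparkConf (assumptions: this runs before the application is submitted, since in client mode the driver JVM is already running and its extraJavaOptions cannot take effect):

```scala
import org.apache.spark.SparkConf

// Same PermGen bump expressed as SparkConf settings (a sketch).
// Note: PermGen only exists on Java 7 and earlier; Java 8+ replaced it with
// Metaspace, and -XX:MaxPermSize is ignored there with a startup warning.
val conf = new SparkConf()
  .set("spark.driver.extraJavaOptions", "-XX:MaxPermSize=384m")
  .set("spark.executor.extraJavaOptions", "-XX:MaxPermSize=384m")
```

With `local[1]`, as in the question, the executors run inside the driver JVM, so it is the driver-side option that actually matters.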