Why does running mahout spark-itemsimilarity give an error?
I get the stack trace shown further down when I run
./mahout spark-itemsimilarity --input input-file \
  --output /output_dir \
  --master spark://url_to_master \
  --filter1 purchase \
  --filter2 view \
  --itemIDColumn 2 \
  --rowIDColumn 0 \
  --filterColumn 1
in a Linux terminal.
I cloned the project from the spark-1.2 branch of Mahout on GitHub and ran
mvn install
in the Mahout source directory. Then
cd mahout/bin/
java.lang.NoClassDefFoundError: com/google/common/collect/HashBiMap
at org.apache.mahout.sparkbindings.io.MahoutKryoRegistrator.registerClasses(MahoutKryoRegistrator.scala:39)
at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo.apply(KryoSerializer.scala:104)
at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo.apply(KryoSerializer.scala:104)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:104)
at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:159)
at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:121)
at org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:214)
at org.apache.spark.broadcast.TorrentBroadcast$$anonfun$readBroadcastBlock.apply(TorrentBroadcast.scala:177)
at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1090)
at org.apache.spark.broadcast.TorrentBroadcast.readBroadcastBlock(TorrentBroadcast.scala:164)
at org.apache.spark.broadcast.TorrentBroadcast._value$lzycompute(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast._value(TorrentBroadcast.scala:64)
at org.apache.spark.broadcast.TorrentBroadcast.getValue(TorrentBroadcast.scala:87)
at org.apache.spark.broadcast.Broadcast.value(Broadcast.scala:70)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:61)
at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
at org.apache.spark.scheduler.Task.run(Task.scala:56)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:200)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.HashBiMap
at java.net.URLClassLoader.run(URLClassLoader.java:366)
at java.net.URLClassLoader.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 22 more
Please help! Thanks.
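Mahout 0.10.0 supports Spark 1.1.1 or lower. If you build from source and change the Spark version number in the main pom at mahout/pom.xml, you can build for Spark 1.2, but you will have to use the workaround described below. The jar with "dependency-reduced" in its name will be in mahout/spark/target. A Spark 1.2 branch is being worked on that will not need this workaround. It is maybe a week away from being ready to try.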
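Before copying the jar anywhere, it may be worth confirming that it really contains the class named in the stack trace. The following is only a minimal sketch: it assumes the JDK's `jar` tool is on the PATH and that it is run from the directory containing `mahout/`. The helper name `lists_hashbimap` is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads a jar listing on stdin and succeeds only if it
# contains the class the executors failed to load.
lists_hashbimap() {
  grep -q '^com/google/common/collect/HashBiMap\.class$'
}

# Assumed location (mahout/spark/target); the exact file name depends on the build.
for jar in mahout/spark/target/*dependency-reduced*.jar; do
  [ -f "$jar" ] || continue   # the glob matched nothing, so skip
  if jar tf "$jar" | lists_hashbimap; then
    echo "$jar contains HashBiMap"
  else
    echo "$jar does NOT contain HashBiMap"
  fi
done
```

If no jar is reported as containing the class, the workaround described in this answer would be unlikely to help with that particular build.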
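There is a bug in Spark from 1.2 onward; I am not sure whether it is fixed in 1.3.
See here: https://issues.apache.org/jira/browse/SPARK-6069
What worked for me was to put the jar that contains Guava (it will be called mahout-spark_2.10-0.11.0-SNAPSHOT-dependency-reduced.jar or something similar) on all workers and then pass that location to the Mahout job using: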
spark-itemsimilarity -D:spark.executor.extraClassPath=/path/to/mahout/spark/target/mahout-spark_2.10-0.11-dependency-reduced.jar
The jar must exist at that path on every worker.
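Combined with the options from the question, the full invocation might look like the sketch below. It only prints the command so that it can be reviewed before being run by hand. `MAHOUT_SPARK_JAR` and `print_itemsimilarity_cmd` are hypothetical names introduced here, and the jar path is the same placeholder used above, not a real location.

```shell
#!/bin/sh
# Placeholder path; the same absolute path must exist on every worker.
MAHOUT_SPARK_JAR="${MAHOUT_SPARK_JAR:-/path/to/mahout/spark/target/mahout-spark_2.10-0.11-dependency-reduced.jar}"

# Hypothetical helper: prints the command line instead of executing it.
# $1 is the jar path to hand to the executors.
print_itemsimilarity_cmd() {
  printf '%s ' ./mahout spark-itemsimilarity \
    "-D:spark.executor.extraClassPath=$1" \
    --input input-file \
    --output /output_dir \
    --master spark://url_to_master \
    --filter1 purchase \
    --filter2 view \
    --itemIDColumn 2 \
    --rowIDColumn 0 \
    --filterColumn 1
  printf '\n'
}

print_itemsimilarity_cmd "$MAHOUT_SPARK_JAR"
```

Printing first makes it easier to spot a mistyped jar name, since a wrong path here would presumably lead back to the same NoClassDefFoundError on the executors.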
The code workaround will go into the spark-1.2 branch in the next week or so, which will make -D:spark.executor.extraClassPath=/path/to/mahout...
unnecessary.