Integrating a Spark MLlib algorithm with H2O.ai using Sparkling Water
I am trying to integrate the collaborative filtering algorithm from Spark MLlib with H2O.ai, using Sparkling Water, for product recommendations. I followed this link:
http://spark.apache.org/docs/latest/mllib-collaborative-filtering.html
and updated the code as follows:
System.setProperty("hadoop.home.dir", "D:\backup\lib\winutils")
val conf = new SparkConf()
.setAppName("Spark-InputFile processor")
.setMaster("local")
val sc = new SparkContext(conf)
val inputFile = "src/main/resources/test.data"
val data = sc.textFile(inputFile)
// Parse each "user,product,rating" line into a Rating
val ratings = data.map { x =>
  val mapper = x.split(",")
  Rating(mapper(0).toInt, mapper(1).toInt, mapper(2).toDouble)
}
// Build the recommendation model using ALS
val rank = 10
val numIterations = 10
val model = ALS.train(ratings, rank, numIterations, 0.01)
// Save and load model
model.save(sc, "target/tmp/myCollaborativeFilter")
val sameModel = MatrixFactorizationModel.load(sc, "target/tmp/myCollaborativeFilter")
val modelRdd = sameModel.recommendProductsForUsers(100)
implicit val sqlContext = SparkSession.builder().getOrCreate().sqlContext
import sqlContext.implicits._
val modelDf = modelRdd.toDF("Rdd","Rdd1")
@transient val hc = H2OContext.getOrCreate(sc)
val h2oframe:H2OFrame = hc.asH2OFrame(modelDf)
When I run the code in IntelliJ, I get the following error:
Exception in thread "main" java.util.NoSuchElementException: key not found: StructType(StructField(user,IntegerType,false), StructField(product,IntegerType,false), StructField(rating,DoubleType,false))
at scala.collection.MapLike$class.default(MapLike.scala:228)
at scala.collection.AbstractMap.default(Map.scala:59)
at scala.collection.MapLike$class.apply(MapLike.scala:141)
at scala.collection.AbstractMap.apply(Map.scala:59)
at org.apache.spark.h2o.utils.ReflectionUtils$.vecTypeFor(ReflectionUtils.scala:132)
at org.apache.spark.h2o.converters.SparkDataFrameConverter$$anonfun.apply(SparkDataFrameConverter.scala:68)
at org.apache.spark.h2o.converters.SparkDataFrameConverter$$anonfun.apply(SparkDataFrameConverter.scala:68)
at scala.collection.TraversableLike$$anonfun$map.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map.apply(TraversableLike.scala:234)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at org.apache.spark.h2o.converters.SparkDataFrameConverter$.toH2OFrame(SparkDataFrameConverter.scala:68)
at org.apache.spark.h2o.H2OContext.asH2OFrame(H2OContext.scala:132)
at org.apache.spark.h2o.H2OContext.asH2OFrame(H2OContext.scala:130)
at com.poc.sample.RecommendataionAlgo$.main(RecommendataionAlgo.scala:54)
at com.poc.sample.RecommendataionAlgo.main(RecommendataionAlgo.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
How can I resolve this error?
Thanks in advance.
modelRdd will be of type Tuple2<Object, Rating> (or the Scala equivalent), and Rating is not one of the types we (Sparkling Water) provide an automatic conversion for (it is not a String, Double, Float, etc., nor does it implement Product). We should definitely throw a more meaningful error message there.
To work around this, instead of building the DataFrame from (Object, Rating) with modelRdd.toDF("Rdd","Rdd1"), map it to a DataFrame with 4 columns (Object, user, product, rating) and then call hc.asH2OFrame() on that, as sketched below.
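For example, here is a minimal sketch of that flattening step, assuming modelRdd is the result of recommendProductsForUsers (an RDD[(Int, Array[Rating])]) and reusing the sqlContext and hc from the question; the column names are only illustrative:

// Reuse the implicits and H2OContext already created in the question.
import sqlContext.implicits._

// Flatten each (userId, Array[Rating]) entry into one row per recommendation so
// that every column is a primitive type Sparkling Water can convert.
// The first column simply repeats the user id carried by each Rating.
val flatDf = modelRdd
  .flatMap { case (userId, recs) =>
    recs.map(r => (userId, r.user, r.product, r.rating))
  }
  .toDF("key", "user", "product", "rating")

// All columns are now Int/Double, so the conversion no longer hits the Rating struct.
val h2oFrame: H2OFrame = hc.asH2OFrame(flatDf)

With the nested Rating struct flattened away, every column maps to a vector type the converter supports, which is exactly what asH2OFrame expects.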