Apache Spark: the ALS recommendation example in the documentation has an extra column and I don't know what it is for
In the ALS example from the documentation, I have the following code:
(http://spark.apache.org/docs/latest/ml-collaborative-filtering.html)
from pyspark.ml.evaluation import RegressionEvaluator
from pyspark.ml.recommendation import ALS
from pyspark.sql import Row
lines = spark.read.text("data/mllib/als/sample_movielens_ratings.txt").rdd
parts = lines.map(lambda row: row.value.split("::"))
# Python 3 has no `long` type, so the timestamp is parsed with int()
ratingsRDD = parts.map(lambda p: Row(userId=int(p[0]), movieId=int(p[1]),
                                     rating=float(p[2]), timestamp=int(p[3])))
ratings = spark.createDataFrame(ratingsRDD)
(training, test) = ratings.randomSplit([0.8, 0.2])
# Build the recommendation model using ALS on the training data
als = ALS(maxIter=5, regParam=0.01, userCol="userId", itemCol="movieId", ratingCol="rating")
model = als.fit(training)
# Evaluate the model by computing the RMSE on the test data
predictions = model.transform(test)
evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating", predictionCol="prediction")
rmse = evaluator.evaluate(predictions)
print("Root-mean-square error = " + str(rmse))
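As you can see, it creates a Row with a timestamp attribute, but that attribute is never used when the ALS instance is created.
What is the purpose of the timestamp attribute in the Row?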
None. It is just one of the fields that comes with the MovieLens data. ALS does not use it, so it can be ignored.
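To illustrate the point, here is a minimal sketch in plain Python (no Spark needed) that parses a line in the same `userId::movieId::rating::timestamp` format and simply discards the timestamp. The `Rating` type, the `parse_rating` helper, and the sample line are hypothetical names and values made up for this illustration; they are not part of the Spark example.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    """The three fields that the ALS instance in the question is configured to read."""
    userId: int
    movieId: int
    rating: float


def parse_rating(line: str) -> Rating:
    # Each line is expected to look like "userId::movieId::rating::timestamp".
    user_id, movie_id, rating, _timestamp = line.split("::")
    # The timestamp is split out of the line but deliberately not kept.
    return Rating(int(user_id), int(movie_id), float(rating))


# Hypothetical sample line, written in the same format as the data file.
sample = "0::2::3::1424380312"
parsed = parse_rating(sample)
print(parsed)  # Rating(userId=0, movieId=2, rating=3.0)
```

In the Spark snippet itself, the equivalent change would presumably be to leave `timestamp` out of the `Row(...)` call, or to call `ratings.drop("timestamp")` on the DataFrame. Either way the ALS configuration stays the same, since it only names `userId`, `movieId`, and `rating` as its columns.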