PySpark TypeError: object of type 'ParamGridBuilder' has no len()

I am trying to tune my model using PySpark on Databricks.

I get the following error: TypeError: object of type 'ParamGridBuilder' has no len()

My code is listed below.

from pyspark.ml.recommendation import ALS
from pyspark.ml.evaluation import RegressionEvaluator



als = ALS(userCol = "userId",itemCol="movieId", ratingCol="rating",  coldStartStrategy="drop", nonnegative = True, implicitPrefs = False)

# Imports ParamGridBuilder package
from pyspark.ml.tuning import ParamGridBuilder 

# Creates a ParamGridBuilder, and adds hyperparameters
param_grid = ParamGridBuilder().addGrid(als.rank, [5,10,20,40]).addGrid(als.maxIter, [5,10,15,20]).addGrid(als.regParam,[0.01,0.001,0.0001,0.02]) 

evaluator = RegressionEvaluator(metricName="rmse", labelCol="rating",predictionCol="prediction")

# Imports CrossValidator package
from pyspark.ml.tuning import CrossValidator 

# Creates cross validator and tells Spark what to use when training and evaluates
cv = CrossValidator(estimator = als,
                    estimatorParamMaps = param_grid,
                    evaluator = evaluator,
                    numFolds = 5) 

model = cv.fit(training) 

TypeError: object of type 'ParamGridBuilder' has no len()

Full error log:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<command-1952169986445972> in <module>()
----> 1 model = cv.fit(training)
      2 
      3 # Extract best combination of values from cross validation
      4 
      5 best_model = model.bestModel

/databricks/spark/python/pyspark/ml/base.py in fit(self, dataset, params)
    130                 return self.copy(params)._fit(dataset)
    131             else:
--> 132                 return self._fit(dataset)
    133         else:
    134             raise ValueError("Params must be either a param map or a list/tuple of param maps, "

/databricks/spark/python/pyspark/ml/tuning.py in _fit(self, dataset)
    279         est = self.getOrDefault(self.estimator)
    280         epm = self.getOrDefault(self.estimatorParamMaps)
--> 281         numModels = len(epm)

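The error simply means that your object has no length (unlike a list). As the traceback shows, CrossValidator calls len() on estimatorParamMaps (line 281 of tuning.py), and you passed it the ParamGridBuilder itself rather than a list of param maps. So in your line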

param_grid = (ParamGridBuilder()
    .addGrid(als.rank, [5,10,20,40])
    .addGrid(als.maxIter, [5,10,15,20])
    .addGrid(als.regParam, [0.01,0.001,0.0001,0.02]))

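you should add .build() at the end to actually build the grid. build() returns a list of param maps, which is what estimatorParamMaps expects.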
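To see why the builder has no length while the built grid does, here is a minimal pure-Python sketch. ToyGridBuilder is a hypothetical stand-in written only for illustration; it is not PySpark's ParamGridBuilder and uses plain strings instead of Param objects. It only assumes the same chained addGrid(...) calls followed by build().

```python
from itertools import product


class ToyGridBuilder:
    """Hypothetical stand-in for the addGrid/build pattern; not PySpark code."""

    def __init__(self):
        self._grid = {}

    def addGrid(self, name, values):
        # Record the candidate values and return self so calls can be chained.
        self._grid[name] = list(values)
        return self

    def build(self):
        # Expand the recorded values into one dict per combination.
        names = list(self._grid)
        return [dict(zip(names, combo)) for combo in product(*self._grid.values())]


builder = (
    ToyGridBuilder()
    .addGrid("rank", [5, 10, 20, 40])
    .addGrid("maxIter", [5, 10, 15, 20])
    .addGrid("regParam", [0.01, 0.001, 0.0001, 0.02])
)

try:
    len(builder)  # the builder defines no __len__, so this raises TypeError
except TypeError as exc:
    print("builder:", exc)

param_maps = builder.build()  # a plain list, so len() works
print("param maps:", len(param_maps))
```

With the values from the question, the built list holds 4 x 4 x 4 = 64 combinations, so with numFolds = 5 the cross-validation has a lot of model fits to do. It may be worth starting with a smaller grid.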