Pyspark trained Logistic Regression model doesn't have predict() and predictProbability() functions

I trained a logistic regression model using the built-in PySpark MLlib class LogisticRegression. However, after training it cannot be used to predict on other dataframes, because it raises AttributeError: 'LogisticRegression' object has no attribute 'predictProbability' or AttributeError: 'LogisticRegression' object has no attribute 'predict'.

from pyspark.ml.classification import LogisticRegression
model = LogisticRegression(regParam=0.5, elasticNetParam=1.0)

# define the input feature & output column
model.setFeaturesCol('features')
model.setLabelCol('WinA')

model.fit(df_train)

model.setPredictionCol('WinA')
model.predictProbability(df_val['features'])
model.predict(df_val['features'])
AttributeError: 'LogisticRegression' object has no attribute 'predictProbability'

Properties:

PySpark version:

>>import pyspark
>>pyspark.__version__
3.1.2

JDK version:

>>!java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.18.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.18.04, mixed mode, sharing)

Environment: Google Colab

In your code, this line

model.fit(df_train)

does not actually give you the trained model, because the variable model is still an instance of the pyspark.ml.classification.LogisticRegression class:

type(model)

# pyspark.ml.classification.LogisticRegression

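So you should capture the object that fit() returns, either by assigning it to a new variable or by overwriting your model variable. That gives you the trained logistic regression model, an instance of the pyspark.ml.classification.LogisticRegressionModel class: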

model = model.fit(df_train)
type(model)

# pyspark.ml.classification.LogisticRegressionModel

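Finally, the .predict and .predictProbability methods take a single pyspark.ml.linalg.DenseVector as their argument, not a DataFrame column. So I think you want to use .transform instead, because it adds the predicted labels and probabilities as columns to the input dataframe. It would look like this: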

predicted_df = model.transform(df_val)
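
Putting the two steps together, here is a minimal sketch of the fit-then-transform pattern. The helper name fit_and_score is hypothetical (it is not part of PySpark); it only relies on the estimator having fit() and the fitted model having transform(). The commented usage assumes df_train and df_val both have the 'features' and 'WinA' columns from the question, and it leaves the output columns at their default names ('prediction', 'probability'), since I believe reusing the label name 'WinA' for the prediction column would clash with the existing column.

```python
def fit_and_score(estimator, df_train, df_val):
    """Fit an estimator and score a validation DataFrame with the fitted model.

    Hypothetical helper for illustration; not part of PySpark.
    """
    # fit() does not change the estimator in place; it returns a new fitted model.
    fitted_model = estimator.fit(df_train)
    # transform() returns a new DataFrame with the prediction columns appended.
    scored_df = fitted_model.transform(df_val)
    return fitted_model, scored_df


# Usage with the names from the question (assumed, not run here):
#
# from pyspark.ml.classification import LogisticRegression
#
# lr = LogisticRegression(featuresCol='features', labelCol='WinA',
#                         regParam=0.5, elasticNetParam=1.0)
# lr_model, predicted_df = fit_and_score(lr, df_train, df_val)
# predicted_df.select('prediction', 'probability').show(5)
#
# For a single row, predict()/predictProbability() take the feature vector itself:
# first_features = df_val.first()['features']
# lr_model.predict(first_features)
# lr_model.predictProbability(first_features)
```

Keeping the estimator (lr) and the fitted model (lr_model) in separate variables makes it harder to call prediction methods on the untrained object by accident.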