PySpark trained Logistic Regression model doesn't have predict() and predictProbability() functions
I trained a logistic regression model using the built-in PySpark MLlib class LogisticRegression. However, after training, it cannot be used to predict on other DataFrames, because it raises AttributeError: 'LogisticRegression' object has no attribute 'predictProbability' or AttributeError: 'LogisticRegression' object has no attribute 'predict'.
from pyspark.ml.classification import LogisticRegression
model = LogisticRegression(regParam=0.5, elasticNetParam=1.0)
# define the input feature & output column
model.setFeaturesCol('features')
model.setLabelCol('WinA')
model.fit(df_train)
model.setPredictionCol('WinA')
model.predictProbability(df_val['features'])
model.predict(df_val['features'])
AttributeError: 'LogisticRegression' object has no attribute 'predictProbability'
Details:
PySpark version:
>>import pyspark
>>pyspark.__version__
3.1.2
JDK version:
>>!java -version
openjdk version "11.0.11" 2021-04-20
OpenJDK Runtime Environment (build 11.0.11+9-Ubuntu-0ubuntu2.18.04)
OpenJDK 64-Bit Server VM (build 11.0.11+9-Ubuntu-0ubuntu2.18.04, mixed mode, sharing)
Environment: Google Colab
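In your code, the call model.fit(df_train) does not actually give you a trained model. fit() returns the trained model as a new object and leaves the estimator unchanged, so the variable model is still an instance of the pyspark.ml.classification.LogisticRegression class: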
type(model)
# pyspark.ml.classification.LogisticRegression
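So you should capture the returned object, either by assigning it to a new variable or by overwriting your model variable. That gives you the trained logistic regression model, an instance of the pyspark.ml.classification.LogisticRegressionModel class: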
model = model.fit(df_train)
type(model)
# pyspark.ml.classification.LogisticRegressionModel
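The difference between the estimator and the object returned by fit() can be illustrated without Spark. The following is only a toy sketch: ToyEstimator and ToyModel are hypothetical classes written for this illustration, not part of PySpark, and the "training" rule (a mean threshold) is made up.

```python
class ToyModel:
    """Hypothetical stand-in for a fitted model such as LogisticRegressionModel."""

    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        # Predict 1.0 when the value is above the learned threshold.
        return 1.0 if x > self.threshold else 0.0


class ToyEstimator:
    """Hypothetical stand-in for an estimator such as LogisticRegression."""

    def fit(self, values):
        # Returns a NEW object; the estimator itself is not modified.
        return ToyModel(threshold=sum(values) / len(values))


estimator = ToyEstimator()

# Return value discarded, as in the question: the estimator gains no predict().
estimator.fit([1.0, 2.0, 3.0])
estimator_has_predict = hasattr(estimator, "predict")

# Return value captured: the fitted object does have predict().
fitted = estimator.fit([1.0, 2.0, 3.0])
fitted_has_predict = hasattr(fitted, "predict")

print(estimator_has_predict, fitted_has_predict)
```

Calling estimator.predict(...) here would raise the same kind of AttributeError as in the question, because only the object returned by fit() carries the prediction methods.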
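Finally, the .predict and .predictProbability methods take a single pyspark.ml.linalg vector (for example a DenseVector) as their argument, not a DataFrame column. So I think you want to use .transform instead, since it adds the predicted labels and probabilities as columns to the input DataFrame, like this: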
predicted_df = model.transform(df_val)