AttributeError: 'str' object has no attribute 'sc' Pyspark PMML
First time posting here! I am trying to save my logistic regression model with pyspark2pmml, but I keep getting the error in the title. My pipeline and model code are below.
from pyspark.ml.feature import Binarizer
binarizer = Binarizer(threshold=10000, inputCol="traffic_count", outputCol="label")
stages = []
stages = [binarizer]
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, OneHotEncoder, VectorAssembler
SI_roadname = StringIndexer(inputCol='road_name',outputCol='road_Index')
SI_suburb = StringIndexer(inputCol='suburb',outputCol='suburb_Index')
SI_cardinal = StringIndexer(inputCol='cardinal_direction_name',outputCol='cardinal_Index')
SI_period = StringIndexer(inputCol='period',outputCol='period_Index')
SI_label = StringIndexer(inputCol='label',outputCol='label_index')
stages = []
stages += [SI_roadname, SI_suburb, SI_cardinal, SI_period, SI_label]
OHE = OneHotEncoder(inputCols=['road_Index','suburb_Index','cardinal_Index','period_Index','label_index'],outputCols=['road_OHE','suburb_OHE','cardinal_OHE','period_OHE','label_OHE'])
stages += [OHE]
assembler = VectorAssembler(inputCols=['wgs84_latitude','wgs84_longitude'],outputCol='features')
stages += [assembler]
pipeline = Pipeline(stages=stages)
pipelineModel = pipeline.fit(df)
model = pipelineModel.transform(df)
from pyspark.ml.linalg import DenseVector
input_data = model.rdd.map(lambda x: (x["label"], DenseVector(x["features"])))
df_train = sqlContext.createDataFrame(input_data, ["label", "features"])
train, test = df_train.randomSplit([0.7,0.3])
from pyspark.ml.classification import LogisticRegression
lr = LogisticRegression(labelCol='label')
lr_model = lr.fit(train)
pred_labels = lr_model.evaluate(test)
pred_labels.predictions.show()
The error itself comes from these lines:
from pyspark2pmml import PMMLBuilder
PMMLBuilder(spark, df, pipelineModel)
PMMLBuilder.buildFile("lr_model.pmml","path")
I am still fairly new to PySpark in general, so I am hoping someone can help me out. I will also post some screenshots for context.
[Screenshots not preserved: The Dataframe; The error; model.take(1); Predictions]
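You are overwriting the object's self parameter with the string "lr_model.pmml". Because buildFile is called on the PMMLBuilder class rather than on an instance, the first positional argument is bound to self, and the method then looks up self.sc on a string. That is why you get AttributeError: 'str' object has no attribute 'sc'.
You have to call buildFile on the builder object you created, passing only the path as the argument. See the method's source: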
def buildFile(self, path):
    javaFile = self.sc._jvm.java.io.File(path)
    javaFile = self.javaPmmlBuilder.buildFile(javaFile)
    return javaFile.getAbsolutePath()
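The same binding behavior can be reproduced without Spark. The sketch below is only an illustration: Builder and build_file are hypothetical stand-ins, not the real pyspark2pmml class, and the only assumption carried over is that the method reads self.sc, as the source above does.

```python
class Builder:
    """Hypothetical stand-in for PMMLBuilder (not the real pyspark2pmml class)."""

    def __init__(self, sc):
        self.sc = sc

    def build_file(self, path):
        # Reads self.sc, mirroring what the real buildFile does
        return f"{self.sc}:{path}"


def call_on_class():
    # Called on the class: "lr_model.pmml" is bound to self, "path" to path
    try:
        Builder.build_file("lr_model.pmml", "path")
    except AttributeError as exc:
        return str(exc)


def call_on_instance():
    # Called on an instance: self is the Builder, the string is the path
    builder = Builder("ctx")
    return builder.build_file("lr_model.pmml")


print(call_on_class())     # 'str' object has no attribute 'sc'
print(call_on_instance())  # ctx:lr_model.pmml
```

Keeping the constructed object in a variable and calling the method on that variable is what makes self refer to the builder instead of the file name.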
From the library's README:
from pyspark2pmml import PMMLBuilder
pmmlBuilder = PMMLBuilder(sc, df, pipelineModel)
pmmlBuilder.buildFile("DecisionTreeIris.pmml")