加载 ML PySpark 模型失败
Fail loading a ML PySpark model
我有几个无法加载的回归模型。
这是 Spark 初始化:
from pyspark.sql import SparkSession, SQLContext
from pyspark.ml.regression import DecisionTreeRegressor
spark = SparkSession.builder \
.appName("Linear Regression Model") \
.config('spark.executor.cores','2') \
.config("spark.executor.memory", "5gb") \
.master("local[*]") \
.getOrCreate()
sc = spark.sparkContext
这里是模型拟合保存成功:
# Decision Tree Regression
decisionTree = DecisionTreeRegressor(featuresCol = "Features", labelCol = "SalePrice", maxDepth = 15, maxBins = 32)
decisionTreeModel = decisionTree.fit(train_vector)
import os
decisionTreeModel.save(os.path.join(".", 'decisionTreeModel'))
但是当我加载它时:
persistedModel = DecisionTreeRegressor.load("decisionTreeModel")
我得到这个错误:
Py4JJavaError: An error occurred while calling o1201.load.
: java.lang.NoSuchMethodException: org.apache.spark.ml.regression.DecisionTreeRegressionModel.<init>(java.lang.String)
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getConstructor(Class.java:1825)
at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:468)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
有人知道如何加载 PySpark 模型吗?
错误消息不是很有帮助,但我认为加载模型的正确方法是调用模型的 load
方法,而不是估计器的方法。该模型已经与数据拟合,这与估计器不同,后者仅包含 settings/parameters,但未拟合。
所以你可以试试这个:
from pyspark.ml.regression import DecisionTreeRegressionModel
persistedModel = DecisionTreeRegressionModel.load("decisionTreeModel")
供您参考,这是加载估算器与加载模型的比较:
from pyspark.ml.regression import DecisionTreeRegressor, DecisionTreeRegressionModel
decisionTree = DecisionTreeRegressor(featuresCol = "Features", labelCol = "SalePrice", maxDepth = 15, maxBins = 32)
decisionTree.save('tree')
persistedEstimator = DecisionTreeRegressor.load('tree')
decisionTreeModel = decisionTree.fit(df)
decisionTreeModel.save('model')
persistedModel = DecisionTreeRegressionModel.load('model')
我有几个无法加载的回归模型。 这是 Spark 初始化:
from pyspark.sql import SparkSession, SQLContext
from pyspark.ml.regression import DecisionTreeRegressor
spark = SparkSession.builder \
.appName("Linear Regression Model") \
.config('spark.executor.cores','2') \
.config("spark.executor.memory", "5gb") \
.master("local[*]") \
.getOrCreate()
sc = spark.sparkContext
这里是模型拟合保存成功:
# Decision Tree Regression
decisionTree = DecisionTreeRegressor(featuresCol = "Features", labelCol = "SalePrice", maxDepth = 15, maxBins = 32)
decisionTreeModel = decisionTree.fit(train_vector)
import os
decisionTreeModel.save(os.path.join(".", 'decisionTreeModel'))
但是当我加载它时:
persistedModel = DecisionTreeRegressor.load("decisionTreeModel")
我得到这个错误:
Py4JJavaError: An error occurred while calling o1201.load.
: java.lang.NoSuchMethodException: org.apache.spark.ml.regression.DecisionTreeRegressionModel.<init>(java.lang.String)
at java.lang.Class.getConstructor0(Class.java:3082)
at java.lang.Class.getConstructor(Class.java:1825)
at org.apache.spark.ml.util.DefaultParamsReader.load(ReadWrite.scala:468)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
有人知道如何加载 PySpark 模型吗?
错误消息不是很有帮助,但我认为加载模型的正确方法是调用模型的 load
方法,而不是估计器的方法。该模型已经与数据拟合,这与估计器不同,后者仅包含 settings/parameters,但未拟合。
所以你可以试试这个:
from pyspark.ml.regression import DecisionTreeRegressionModel
persistedModel = DecisionTreeRegressionModel.load("decisionTreeModel")
供您参考,这是加载估算器与加载模型的比较:
from pyspark.ml.regression import DecisionTreeRegressor, DecisionTreeRegressionModel
decisionTree = DecisionTreeRegressor(featuresCol = "Features", labelCol = "SalePrice", maxDepth = 15, maxBins = 32)
decisionTree.save('tree')
persistedEstimator = DecisionTreeRegressor.load('tree')
decisionTreeModel = decisionTree.fit(df)
decisionTreeModel.save('model')
persistedModel = DecisionTreeRegressionModel.load('model')