H2O GAM - weighted: prediction does not work anymore
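If I train a weighted H2O GAM regression model, I cannot predict with it. The weighted regression is done using the parameter weights_column.
I am running python=3.6.13, h2o=3.32.1.3, pandas=0.25.3, numpy=1.19.5, sklearn=0.24.2. Java version: openjdk version "14.0.2".
Prediction works for:
- unweighted H2O GAM
- weighted H2O GLM
- weighted H2O GAM when downgrading to h2o=3.32.0.5
I have registered this as a bug on http://jira.h2o.ai, but if anyone has a way to make it work without downgrading h2o, I would still be interested.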
import numpy as np
import pandas as pd
from sklearn.datasets import load_boston
import h2o
from h2o.estimators.gam import H2OGeneralizedAdditiveEstimator
h2o.no_progress()
h2o.init()
np.random.seed(42)
boston = load_boston()
y = pd.Series(boston["target"], name="y")
X = pd.DataFrame(boston["data"], columns=boston["feature_names"]) # shape: (506, 13)
myweight = pd.Series(np.random.random_sample((len(y),)), name="myweight2")
predictors = ['CRIM', 'AGE']
gam_columns = ['CRIM']
params = {
    "family": "gaussian",
    "gam_columns": gam_columns,
    'bs': len(gam_columns) * [0],
}
df0 = pd.concat([y, X, myweight], axis=1)
df = h2o.H2OFrame(python_obj=df0)
model = H2OGeneralizedAdditiveEstimator(**params)
model.train(
    x=predictors,
    y="y",
    weights_column="myweight2",
    training_frame=df,
)
print('df.shape', df.shape)
y_pred = model.predict(df)
print('y_pred:', y_pred.as_data_frame()["predict"].values[0:5])
I get this output. It complains about myweight2:
Checking whether there is an H2O instance running at http://localhost:54321 . connected.
-------------------------- ------------------------------------------
df.shape (506, 15)
Traceback (most recent call last):
File "/Users/g009655/tmp7/h2otest/test_gam_predict.py", line 37, in <module>
y_pred = model.predict(df)
File "/Users/g009655/Library/Caches/pypoetry/virtualenvs/h2otest-S7Xak4Mg-py3.6/lib/python3.6/site-packages/h2o/model/model_base.py", line 237, in predict
j.poll()
File "/Users/g009655/Library/Caches/pypoetry/virtualenvs/h2otest-S7Xak4Mg-py3.6/lib/python3.6/site-packages/h2o/job.py", line 80, in poll
"\n{}".format(self.job_key, self.exception, self.job["stacktrace"]))
OSError: Job with key 017f00000132d4ffffffff$_9242dd1b28497090cf9ccad52bd54b9f failed with an exception: java.lang.AssertionError: null vec: ff0f000000ffffffff$_b0f0839f8f1a041e8bf5254b552e4dd3;
name: myweight2
stacktrace:
java.lang.AssertionError: null vec: ff0f000000ffffffff$_b0f0839f8f1a041e8bf5254b552e4dd3;
name: myweight2
at water.fvec.Frame.<init>(Frame.java:161)
at hex.gam.GAMModel.cleanUpInputFrame(GAMModel.java:505)
at hex.gam.GAMModel.adaptTestForTrain(GAMModel.java:492)
at hex.Model.score(Model.java:1697)
at water.api.ModelMetricsHandler.compute2(ModelMetricsHandler.java:422)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1637)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Closing connection _sid_ad95 at exit
H2O session _sid_ad95 closed.
Process finished with exit code 1
Thanks for the bug report. Here is a link to the Jira ticket for reference.
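I had the same error, but I found a workaround. For me, reloading the training H2OFrame (in my case from a pandas.DataFrame) works. It seems that the frame somehow gets corrupted during training...
For your case, try: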
df = h2o.H2OFrame(python_obj=df0)
y_pred = model.predict(df)
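If you predict in several places, the same workaround could be wrapped in a small helper so the frame is always rebuilt right before scoring. This is only a sketch of the idea: `predict_on_fresh_frame`, `frame_factory` and the stand-in model are hypothetical names made up for illustration, not part of the h2o API, and the stand-in exists only so the snippet runs without an H2O cluster.

```python
def predict_on_fresh_frame(model, source_data, frame_factory):
    """Build a new frame from the source data, then predict on it.

    frame_factory is any callable that turns the source data into the
    frame type the model expects (hypothetical parameter name).
    """
    fresh_frame = frame_factory(source_data)  # never reuse the training frame
    return model.predict(fresh_frame)


class _StandInModel:
    """Hypothetical stand-in for a trained model; doubles the CRIM column."""

    def predict(self, frame):
        return [2 * value for value in frame["CRIM"]]


source = {"CRIM": [1, 2], "myweight2": [0.5, 0.7]}
built = []  # records every frame the factory creates


def copy_factory(data):
    # Copies the columns so the result is a new object, not the source itself.
    frame = {name: list(values) for name, values in data.items()}
    built.append(frame)
    return frame


result = predict_on_fresh_frame(_StandInModel(), source, copy_factory)
print(result)
```

With the objects from the question, the call would presumably look like `predict_on_fresh_frame(model, df0, lambda d: h2o.H2OFrame(python_obj=d))`, which is just the two lines above in one place.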