Misaligned predictions and response column after cross-validation in H2O (Deep Learning)
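I have been having trouble with a Deep Learning model. I have a model trained on the rrc data frame, and if I do:
rrc['preds'] = dp.cross_validation_holdout_predictions().as_data_frame().predict
the response column and the predictions always end up misaligned. They line up at the top of the data frame, but at some point they drift apart, and the correlation between them comes out very poor because of this misalignment. I have been trying to fix this for more than 3 days and I have no idea what to do.
I am using H2O 3.10.4.5.
The model itself: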
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

dp = H2ODeepLearningEstimator(
    activation="Tanh", hidden=[10, 10, 10], epochs=10000,
    keep_cross_validation_predictions=True,
    ignored_columns=['fn', 'pdb_id', 'pdb_id_chain', 'pdb_id_chain_source', 'source'])
# Pass every column except the response as x; folds come from the "cv" column.
dp.train(x=list(set(rrch.col_names) - set(['rmsd_all'])), y="rmsd_all",
         training_frame=rrch, fold_column="cv")
Edit: I think I found the problem (cell #58): https://github.com/mmagnus/mmagnus.github.io/blob/master/mq-test.ipynb
If I do rrc3 = rrc3[rrc3.rmsd_all < 10] to remove the rows whose rmsd_all (the response column) value is 10 or higher, and then do rrc3h = h2o.H2OFrame(rrc3), the problem appears. I am not sure why, though.
The dataset (40 MB): https://www.dropbox.com/s/1et38o3xx47jw1m/rasp_rnakb_cv2.csv?dl=0
Solved: rrc3.reset_index(inplace=True) does the job!
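A likely explanation for why resetting the index helps: filtering a pandas data frame keeps the original row labels, so the index has gaps, while a freshly built predictions frame is numbered 0..n-1. Column assignment in pandas aligns on index labels, not on row position. The sketch below reproduces that effect with pandas alone; the values are made up, and preds is a hypothetical stand-in for the output of cross_validation_holdout_predictions().as_data_frame().predict, assumed to carry a default integer index.

```python
import pandas as pd

# Toy stand-in for the original data frame (values are invented).
rrc3 = pd.DataFrame({"rmsd_all": [1.0, 12.0, 3.0, 15.0, 5.0]})

# Filtering keeps the original row labels, so the index is now 0, 2, 4.
rrc3 = rrc3[rrc3.rmsd_all < 10]

# Hypothetical stand-in for the hold-out predictions: one value per
# remaining row, with a fresh 0..n-1 index (an assumption about the
# frame returned by as_data_frame()).
preds = pd.Series([1.1, 3.1, 5.1], name="predict")

# Assignment aligns on index labels, not on row position:
# label 0 gets 1.1, label 2 gets 5.1, label 4 has no match and gets NaN.
misaligned = rrc3.copy()
misaligned["preds"] = preds

# After resetting the index, labels and positions agree again.
aligned = rrc3.reset_index(drop=True)
aligned["preds"] = preds

print(misaligned)
print(aligned)
```

Note that drop=True discards the old labels, whereas reset_index(inplace=True) as used above keeps them in a new column named index. Assigning by position with rrc3['preds'] = preds.to_numpy() would be another option, provided both have the same number of rows in the same order.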