Merging two H2O dataframes Error: DistributedException 'Operation not allowed on string vector.'

I want to merge two dataframes after performing an operation on each of them.

import h2o
import pandas as pd
import numpy as np
from h2o.automl import H2OAutoML

h2o.init()

support = "splvl.csv"
data = h2o.import_file(support)

# Split the rows by the value of column X
df1 = data[data['X'] == 0]
df2 = data[data['X'] == 1]

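# impute() modifies each frame in place: A gets the group mean, Q the group mode, grouped by B and C
df1.impute("A", method = "mean", by = ["B", "C"])
df1.impute("Q", method = "mode", by = ["B", "C"])

df2.impute("A", method = "mean", by = ["B", "C"])
df2.impute("Q", method = "mode", by = ["B", "C"])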

df1["X"].table()
df2["X"].table()

df3 = df2.merge(df1)

h2o.export_file(df3, path = "merged.csv", force=True, parts=1)

When I run the export-to-CSV command, I get the following error:

H2OServerError: HTTP 500 Server Error: Server error water.util.DistributedException: Error: DistributedException from /127.0.0.1:54321: 'Operation not allowed on string vector.' Request: None

df3["X"].table()

Server error water.exceptions.H2OKeyNotFoundArgumentException: Error: Object 'py_13_sid_95bb' not found for argument: key Request: GET /3/Frames/py_13_sid_95bb params: {'row_count': '10', 'row_offset': '0', 'column_count': '-1', 'full_column_count': '-1', 'column_offset': '0'}

This second error appears when I try to print the value counts of the merged dataframe.

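The first problem is most likely that your "X" column is of string type; you can check this by running df1["X"].types. You can convert it to a factor column, which will let you use table(), by doing df1["X"] = df1["X"].asfactor().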
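Roughly speaking, a factor (enum) column stores an integer code per row that indexes into a list of levels (its domain), and counting per level works on those codes. The plain-Python sketch below only illustrates that idea; `as_factor` and `table` are hypothetical helpers, not the H2O API, and sorting the levels is an assumption made here for a deterministic order.

```python
from collections import Counter


def as_factor(values):
    """Encode a list of strings as (domain, codes), like a categorical column.

    Hypothetical helper for illustration only; not part of H2O.
    """
    domain = sorted(set(values))                      # distinct levels, sorted here for determinism
    index = {level: i for i, level in enumerate(domain)}
    codes = [index[v] for v in values]                # one integer code per row
    return domain, codes


def table(domain, codes):
    """Count rows per level, similar in spirit to a per-level value count."""
    counts = Counter(codes)
    return {domain[c]: counts[c] for c in sorted(counts)}


domain, codes = as_factor(["1", "0", "1", "1"])
print(domain)                # ['0', '1']
print(codes)                 # [1, 0, 1, 1]
print(table(domain, codes))  # {'0': 1, '1': 3}
```

A raw string column has no such domain, which is consistent with the answer's point that it needs converting before it can be tabulated.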

The reason you are seeing the second error is probably that df3 failed to be created when you ran df3 = df2.merge(df1).

I would suggest verifying your column data types, converting the ones that need to be factors, and then trying the merge again.
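The check-and-convert step can be sketched as a small helper. This assumes `frame.types` returns a dict mapping column names to H2O type names (e.g. `{"X": "string", "A": "real"}`) and that columns support `.asfactor()` and item assignment, as `H2OFrame` does; `string_columns` and `strings_to_factors` are hypothetical names, and the `exclude` parameter is an illustrative addition for free-text columns you may not want as factors.

```python
def string_columns(types):
    """Return the names of columns whose type is 'string'.

    `types` is assumed to look like the dict from H2OFrame.types,
    e.g. {"X": "string", "A": "real"}.
    """
    return [name for name, t in types.items() if t == "string"]


def strings_to_factors(frame, exclude=()):
    """Convert every string column of `frame` to a factor, in place.

    Hypothetical helper: it only relies on `frame.types`,
    `frame[name]`, `frame[name] = ...` and `.asfactor()` on a column.
    Columns listed in `exclude` are left untouched.
    Returns the list of converted column names.
    """
    converted = []
    for name in string_columns(frame.types):  # computed once, before any mutation
        if name in exclude:
            continue
        frame[name] = frame[name].asfactor()
        converted.append(name)
    return converted
```

With the frames from the question, one would call `strings_to_factors(df1)` and `strings_to_factors(df2)`, inspect `df1.types` and `df2.types`, and only then retry `df2.merge(df1)`.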