Merging two H2O dataframes fails with DistributedException: 'Operation not allowed on string vector.'
I want to merge two dataframes after performing an operation on each of them.
import pandas as pd
import h2o
from h2o.automl import H2OAutoML
h2o.init()
import numpy as np
support = "splvl.csv"
data = h2o.import_file(support)
df1 = data[data['X'] == 0]
df2 = data[data['X'] == 1]
df1.impute("A", method = "mean", by = ["B", "C"])
df1.impute("Q", method = "mode", by = ["B", "C"])
df2.impute("A", method = "mean", by = ["B", "C"])
df2.impute("Q", method = "mode", by = ["B", "C"])
df1["X"].table()
df2["X"].table()
df3 = df2.merge(df1)
h2o.export_file(df3, path = "merged.csv", force=True, parts=1)
When I run the export-to-CSV command, I get the following error:
H2OServerError: HTTP 500 Server Error:
Server error water.util.DistributedException:
Error: DistributedException from /127.0.0.1:54321: 'Operation not allowed on string vector.'
Request: None
df3["X"].table()
Server error water.exceptions.H2OKeyNotFoundArgumentException:
Error: Object 'py_13_sid_95bb' not found for argument: key
Request: GET /3/Frames/py_13_sid_95bb
params: {'row_count': '10', 'row_offset': '0', 'column_count': '-1', 'full_column_count': '-1', 'column_offset': '0'}
I get the error above when I try to print the value counts of the merged dataframe.
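The first problem is most likely that your "X" column is of string type. You can check this by running df1["X"].types. You can convert it to a factor column by running df1["X"] = df1["X"].asfactor(), which then lets you use table() on it.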
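As a rough sketch of that check, the helpers below find the string columns of a frame and convert them to factors. The function names string_columns and strings_to_factors are hypothetical; the sketch assumes only that an H2OFrame exposes a `types` dict mapping column names to type names such as "string", "enum", "int" or "real", and that `asfactor()` is available on a single-column frame.

```python
def string_columns(types):
    """Return the names of columns whose reported H2O type is 'string'.

    `types` is assumed to be the dict returned by H2OFrame.types,
    for example {"X": "string", "A": "real"}.
    """
    return [name for name, kind in types.items() if kind == "string"]


def strings_to_factors(frame):
    """Convert every string column of `frame` to a factor (enum) column.

    `frame` is assumed to be an H2OFrame; the columns are reassigned
    on the same frame, which is also returned for convenience.
    """
    for name in string_columns(frame.types):
        frame[name] = frame[name].asfactor()
    return frame
```

With the frames from the question, this could be used as df1 = strings_to_factors(df1) and df2 = strings_to_factors(df2) before calling table() or merge(). Whether every string column should really become a factor depends on the data; free-text columns may be better dropped or handled separately.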
The second error most likely appears because df3 was never created successfully when you ran df3 = df2.merge(df1).
I would suggest verifying your column data types, converting the columns that need to be factors, and then trying the merge again.
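One way to do that verification before retrying the merge is to compare the `types` dicts of the two frames. The helper below is a hypothetical sketch, not part of the H2O API: it reports every shared column that is still a string or whose type differs between the two frames, on the assumption that those are the columns most likely to break the merge.

```python
def merge_blockers(types_x, types_y):
    """Return shared columns that are strings or have mismatched types.

    Both arguments are assumed to be dicts as returned by H2OFrame.types.
    The result maps each suspicious column name to a (type_x, type_y) tuple.
    """
    problems = {}
    for name, kind_x in types_x.items():
        if name not in types_y:
            continue  # only columns present in both frames matter here
        kind_y = types_y[name]
        if kind_x == "string" or kind_y == "string" or kind_x != kind_y:
            problems[name] = (kind_x, kind_y)
    return problems
```

For the frames in the question, calling merge_blockers(df1.types, df2.types) and getting an empty dict back would suggest the column types are at least consistent, after which df3 = df2.merge(df1) and the export can be tried again.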