How to fix ArrowInvalid: ("Could not convert (x, y) with type tuple")?
I am trying to build a Streamlit application. Below is a working example of what I am trying:
from pm4py.objects.conversion.log import converter as log_converter
import pandas as pd
import pm4py
import streamlit as st

df = pm4py.format_dataframe(
    pd.read_csv('https://raw.githubusercontent.com/pm4py/pm4py-core/release/notebooks/data/running_example.csv', sep=';'),
    case_id='case_id',
    activity_key='activity',
    timestamp_key='timestamp',
)
log = log_converter.apply(df)
precedence_dict = pm4py.discover_dfg(log)[0]
precedence_dict is a dictionary that maps (antecedent, consequent) pairs to counts:
precedence_dict = {('check ticket', 'decide'): 6,
('check ticket', 'examine casually'): 2,
('check ticket', 'examine thoroughly'): 1,
('decide', 'pay compensation'): 3,
('decide', 'reinitiate request'): 3,
('decide', 'reject request'): 3,
('examine casually', 'check ticket'): 4,
('examine casually', 'decide'): 2,
('examine thoroughly', 'check ticket'): 2,
('examine thoroughly', 'decide'): 1,
('register request', 'check ticket'): 2,
('register request', 'examine casually'): 3,
('register request', 'examine thoroughly'): 1,
('reinitiate request', 'check ticket'): 1,
('reinitiate request', 'examine casually'): 1,
('reinitiate request', 'examine thoroughly'): 1
}
Converting the dict above to a pandas DataFrame:
precedence_df = pd.DataFrame.from_dict(precedence_dict, orient='index').reset_index()
rename_map = {"index" : "Antecedent,Consequent", 0 : "Count"}
precedence_df = precedence_df.rename(columns=rename_map)
precedence_df['Antecedent'], precedence_df['Consequent'] = zip(*precedence_df["Antecedent,Consequent"])
# precedence_df.assign(**dict(zip(['Antecedent', 'Consequent'], precedence_df["Antecedent,Consequent"].str)))
# precedence_df['Antecedent'], precedence_df['Consequent'] = precedence_df["Antecedent,Consequent"].str
precedence_mat = precedence_df[['Antecedent', 'Consequent', 'Count']]
st.dataframe(precedence_df)
I get an ArrowInvalid error on the st.dataframe line when I run the application.
Full error traceback:
ArrowInvalid: ("Could not convert (x, y) with type tuple: did not recognize Python value type when inferring an Arrow data type", 'Conversion failed for column Antecedent, Consequent with type object')
Traceback:
File "C:\Users\zz\Documents\Streamlit\preced\app.py", line 1353, in <module>
st.dataframe(precedence_df)
File "c:\users\zz\anaconda3\lib\site-packages\streamlit\elements\dataframe_selector.py", line 85, in dataframe
return self.dg._arrow_dataframe(data, width, height)
File "c:\users\zz\anaconda3\lib\site-packages\streamlit\elements\arrow.py", line 82, in _arrow_dataframe
marshall(proto, data, default_uuid)
File "c:\users\zz\anaconda3\lib\site-packages\streamlit\elements\arrow.py", line 160, in marshall
proto.data = type_util.data_frame_to_bytes(df)
File "c:\users\zz\anaconda3\lib\site-packages\streamlit\type_util.py", line 371, in data_frame_to_bytes
table = pa.Table.from_pandas(df)
File "pyarrow\table.pxi", line 1561, in pyarrow.lib.Table.from_pandas
File "c:\users\zz\anaconda3\lib\site-packages\pyarrow\pandas_compat.py", line 595, in dataframe_to_arrays
for c, f in zip(columns_to_convert, convert_fields)]
File "c:\users\zz\anaconda3\lib\site-packages\pyarrow\pandas_compat.py", line 595, in <listcomp>
for c, f in zip(columns_to_convert, convert_fields)]
File "c:\users\zz\anaconda3\lib\site-packages\pyarrow\pandas_compat.py", line 581, in convert_column
raise e
File "c:\users\zz\anaconda3\lib\site-packages\pyarrow\pandas_compat.py", line 575, in convert_column
result = pa.array(col, type=type_, from_pandas=True, safe=safe)
File "pyarrow\array.pxi", line 302, in pyarrow.lib.array
File "pyarrow\array.pxi", line 83, in pyarrow.lib._ndarray_to_array
File "pyarrow\error.pxi", line 99, in pyarrow.lib.check_status
Current pyarrow version: 5.0.0.
I do not get any issues or errors when I run the same code in Colab (except for st.dataframe). What does the ArrowInvalid error mean, and how can I resolve it?
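I'm not very familiar with Streamlit and st.dataframe, but it looks like it is trying to convert precedence_df to a pyarrow.Table.
In pandas a column can hold arbitrary Python objects, but that is not possible in pyarrow, where every column needs a well-defined Arrow type. So the column Antecedent,Consequent is causing the problem, because its values are tuples.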
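To see which columns are likely to trip the Arrow conversion, a small pandas-only check can help. This is a minimal sketch: `find_tuple_columns` is a hypothetical helper (not part of pandas, pyarrow, or Streamlit), and the two-row frame is a shortened stand-in for the real `precedence_df`.

```python
import pandas as pd


def find_tuple_columns(df: pd.DataFrame) -> list:
    """Return the names of columns that contain at least one tuple value.

    Hypothetical helper for illustration only.
    """
    return [
        name
        for name in df.columns
        if df[name].map(lambda value: isinstance(value, tuple)).any()
    ]


# Shortened stand-in for the precedence_df from the question.
precedence_df = pd.DataFrame(
    {
        "Antecedent,Consequent": [
            ("check ticket", "decide"),
            ("decide", "pay compensation"),
        ],
        "Count": [6, 3],
    }
)

tuple_columns = find_tuple_columns(precedence_df)
print(tuple_columns)
```

Any column reported here holds tuples and is a candidate for the conversion failure described above.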
| | Antecedent,Consequent | Count |
|---:|:---------------------------------------------|--------:|
| 0 | ('check ticket', 'decide') | 6 |
| 1 | ('check ticket', 'examine casually') | 2 |
| 2 | ('check ticket', 'examine thoroughly') | 1 |
| 3 | ('decide', 'pay compensation') | 3 |
| 4 | ('decide', 'reinitiate request') | 3 |
| 5 | ('decide', 'reject request') | 3 |
It is easier and more idiomatic to work with a data frame like precedence_mat, because it uses flat string columns (instead of tuples).
| | Antecedent | Consequent | Count |
|---:|:-------------------|:-------------------|--------:|
| 0 | check ticket | decide | 6 |
| 1 | check ticket | examine casually | 2 |
| 2 | check ticket | examine thoroughly | 1 |
| 3 | decide | pay compensation | 3 |
| 4 | decide | reinitiate request | 3 |
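One way to get such a flat frame is to unpack the dictionary keys while building the rows, so the tuple column never exists. The sketch below uses a shortened copy of `precedence_dict`; the variable `rows` is introduced here only for illustration.

```python
import pandas as pd

# Shortened copy of the precedence_dict from the question.
precedence_dict = {
    ("check ticket", "decide"): 6,
    ("check ticket", "examine casually"): 2,
    ("decide", "pay compensation"): 3,
}

# Unpack each (antecedent, consequent) key into two separate values.
rows = [
    (antecedent, consequent, count)
    for (antecedent, consequent), count in precedence_dict.items()
]

# Every column now holds plain strings or integers, no tuples.
precedence_mat = pd.DataFrame(rows, columns=["Antecedent", "Consequent", "Count"])
print(precedence_mat)
```

Note that the code in the question already builds precedence_mat but then passes precedence_df to st.dataframe; passing precedence_mat instead should avoid the tuple column altogether.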
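That said, if you really need to pass tuples to pyarrow/Streamlit, you have two options:
- Create a schema for your tuples and use it to convert the data frame to pyarrow before passing it to Streamlit.
This is a bit tricky: you need to provide a schema that explains what the tuples contain: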
import pyarrow as pa
schema = pa.schema(
[
pa.field(
"Antecedent,Consequent",
pa.struct(
[
pa.field("antecedent", pa.string()),
pa.field("consequent", pa.string()),
])
),
pa.field("Count", pa.int32())
]
)
table = pa.Table.from_pandas(precedence_df, schema=schema)
st.dataframe(table)
- Convert the tuples to lists, which makes it easier for pyarrow to infer the type:
copy_df = precedence_df.copy()
copy_df["Antecedent,Consequent"] = precedence_df["Antecedent,Consequent"].apply(list)
table = pa.Table.from_pandas(copy_df).to_pandas()
st.dataframe(table)
Note that in this case the "Antecedent,Consequent" data is converted from tuples of strings to lists of strings.