fastparquet error when saving pandas df to parquet: AttributeError: module 'fastparquet.parquet_thrift' has no attribute 'SchemaElement'
import pandas as pd
from flatten_json import flatten

# Columns expected in the output, including ones that may be missing from the data
actual_column_list = ["_id", "external_id", "email", "created_at", "updated_at", "dob.timestamp", "dob_1.timestamp", "column_10"]
data = [{'_id': '60efe3333333445', 'external_id': 'ID2', 'dob': {'timestamp': 412214400}, 'email': 'woofwoof@gmail.com', 'created_at': 1626334203, 'updated_at': 1629338900},
        {'external_id': 'ID3', '_id': '60efe3333333487', 'email': 'meowmeow@gmail.com', 'created_at': 1626334203, 'updated_at': 1629338900, 'dob_1': {'timestamp': 'oops'}}]

# Flatten nested keys with "." as the separator and force every column to str
df = pd.DataFrame(data=[flatten(row, ".") for row in data], dtype='str', columns=actual_column_list)
with pd.option_context('display.max_rows', None, 'display.max_columns', None):
    print(df)
df.to_parquet("test.parquet", engine='fastparquet', compression="snappy", index=False)
The error shows:
root = parquet_thrift.SchemaElement(name=b'schema',
AttributeError: module 'fastparquet.parquet_thrift' has no attribute 'SchemaElement'
Versions where it fails:
Python version: 3.6.9
pyarrow=5.0.0
fastparquet=0.8.0
numpy=1.19.5
pandas=1.1.5

I tried the exact same code snippet with:
Python version: 3.7.13
pyarrow=7.0.0
fastparquet=0.8.0
numpy=1.21.5
pandas=1.3.5

and it worked, but I need it to work with Python 3.6.9. I tried to pin these same versions under Python 3.6.9, but the dependencies failed to install.
What I want is to make the code snippet above work with Python 3.6.9.
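Use fastparquet 0.7.2.
Although fastparquet 0.8.0 is compatible with Python 3.6, it looks like it needs a pyarrow version newer than 5.0.0 to work properly. So fastparquet has to be downgraded to 0.7.2 to be compatible with pyarrow 5.0.0.
Note: this code snippet can be used to write a parquet file in which every column is a string, including columns with no data. Empty columns are not converted to float, which is the default behaviour when pandas saves a DataFrame to parquet using pyarrow.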
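
After a downgrade (for example with pip install fastparquet==0.7.2), it can help to confirm which versions the Python 3.6.9 environment actually picked up. The following is only a minimal sketch: the helper name installed_versions is made up for illustration, and it assumes setuptools (pkg_resources) is available on Python 3.6/3.7, where importlib.metadata does not exist.

```python
def installed_versions(packages):
    """Return {package_name: version string, or None if not installed}.

    Hypothetical helper for illustration; not part of pandas or fastparquet.
    """
    try:
        # Available in the standard library from Python 3.8 onward.
        from importlib.metadata import version, PackageNotFoundError
    except ImportError:
        # Fallback for Python 3.6/3.7; assumes setuptools is installed.
        import pkg_resources

        PackageNotFoundError = pkg_resources.DistributionNotFound

        def version(name):
            return pkg_resources.get_distribution(name).version

    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


if __name__ == "__main__":
    # The packages named in the question.
    names = ["fastparquet", "pyarrow", "pandas", "numpy"]
    for name, ver in installed_versions(names).items():
        print(name, ver)
```

If this reports fastparquet 0.7.2 next to pyarrow 5.0.0, the environment matches the combination described in the answer.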