Converting a parquet file to pandas and then querying gives an error

Question

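I'm trying to query a DataFrame to get the mean of a column, and I converted the parquet file to pandas to do this. I get the error TypeError('Could not convert %s to numeric' % str(x)), which seems to refer to the word "Age" in the column.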

The DataFrame looks like this:

         _c0     _c1  _c2    
    0  RecId   Class  Age   
    1      1    1st    29   
    2      2    1st     2   
    3      3    1st    30

My code is:

    import pyarrow 
    import pandas
    import pyarrow.parquet as pq

    df = pq.read_table("file.parquet").to_pandas()
    average_age = df["_c2"].mean()
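
A minimal sketch that reproduces the error without the parquet file, assuming the column holds the string values shown in the table above (the in-memory DataFrame is a hypothetical stand-in for the file). Because the header row was stored as ordinary data, `_c2` is a text column rather than a numeric one, and pandas cannot average a column that contains "Age". The `_c0`, `_c1`, `_c2` names resemble the defaults Spark assigns when a CSV is read without a header option, which would explain how the header ended up as data, but that is an inference the question does not confirm.

```python
import pandas as pd

# Hypothetical stand-in for the parquet data: the header row
# ("RecId", "Class", "Age") is stored as an ordinary data row,
# so every column holds strings rather than numbers.
df = pd.DataFrame({
    "_c0": ["RecId", "1", "2", "3"],
    "_c1": ["Class", "1st", "1st", "1st"],
    "_c2": ["Age", "29", "2", "30"],
})

print(df["_c2"].dtype)                           # a text dtype, not int or float
print(pd.api.types.is_numeric_dtype(df["_c2"]))  # False
print(df["_c2"].iloc[0])                         # the stray header value "Age"

try:
    df["_c2"].mean()
except TypeError as exc:
    # Averaging fails because the column contains text such as "Age"
    print("TypeError:", exc)
```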

I have tried using

    df = df(skiprows=1)

but that gives the error "TypeError: 'DataFrame' object is not callable".
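For context on this error: `skiprows` is a keyword argument of `pandas.read_csv` (and similar text readers) that is applied while the file is parsed. It is not something you pass to an existing DataFrame, which is why calling `df(...)` raises "'DataFrame' object is not callable", and `pyarrow.parquet.read_table` has no `skiprows` parameter at all. The sketch below uses a hypothetical CSV string, not the questioner's file, to show where the option does exist.

```python
import io

import pandas as pd

# Hypothetical CSV text mirroring the table in the question.
csv_text = "RecId,Class,Age\n1,1st,29\n2,1st,2\n3,1st,30\n"

# header=None keeps the first line as a data row, reproducing the problem.
raw = pd.read_csv(io.StringIO(csv_text), header=None)

# skiprows=1 drops that first line while parsing; with header=None the
# columns are numbered 0, 1, 2 and the remaining values parse as numbers.
skipped = pd.read_csv(io.StringIO(csv_text), header=None, skiprows=1)

# The default header=0 instead turns the first line into column names.
named = pd.read_csv(io.StringIO(csv_text))

print(raw[2].dtype, skipped[2].dtype, named["Age"].dtype)
print(named["Age"].mean())
```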

How can I skip or remove the row that contains "Age"? Is this related to reading it from a parquet file, or is it a straightforward pandas problem?

Answer 1

You can simply drop the first row using pandas indexing:

    df = df.iloc[1:, :]
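A slightly fuller sketch, again using a hypothetical in-memory DataFrame in place of the parquet file. After the row is dropped, the values are still strings, because the whole column was stored as text, so pandas may still fail to average them; converting with `pd.to_numeric` before calling `mean()` avoids relying on pandas to coerce text. Promoting the first row to column names is optional and is assumed here only for readability.

```python
import pandas as pd

# Hypothetical stand-in for the DataFrame read from the parquet file.
df = pd.DataFrame({
    "_c0": ["RecId", "1", "2", "3"],
    "_c1": ["Class", "1st", "1st", "1st"],
    "_c2": ["Age", "29", "2", "30"],
})

# Optional: use the first row as column names before dropping it.
df.columns = df.iloc[0].tolist()

# Drop the first row (same slice as the answer) and renumber the index.
df = df.iloc[1:, :].reset_index(drop=True)

# The remaining values are still strings, so convert before averaging.
# errors="coerce" turns anything non-numeric into NaN, which mean() skips.
df["Age"] = pd.to_numeric(df["Age"], errors="coerce")
average_age = df["Age"].mean()
print(average_age)
```

Using `errors="coerce"` is a design choice: it tolerates stray non-numeric values instead of raising, at the cost of silently excluding them from the average.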
