Error when converting an astropy table to a pandas DataFrame and writing it to an HDF file

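I am trying to fetch some data from the Gaia catalogue, convert the resulting astropy table to a pandas DataFrame, and then store it in an HDF5 file. I cannot write the astropy table (the query result) straight to an HDF5 file because I need to do some processing on it first.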

The problem is that I get this error when I try to store the DataFrame in the HDF file:

Traceback (most recent call last):
  File "C:/Users/Administrateur.UTILISA-D5U7HV7/Documents/MEGA/ipsa/cours/aero4/stage/working_directory/python/tests/Whosebug_issue/1_panda_to_hdf/tohdf.py", line 8, in <module>
    pd_table.to_hdf("test.h5", key="test", format='table', data_columns=True, mode="w", encoding="utf-8")
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\core\generic.py", line 2505, in to_hdf
    encoding=encoding,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 282, in to_hdf
    f(store)
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 274, in <lambda>
    encoding=encoding,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 1042, in put
    errors=errors,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 1709, in _write_to_group
    data_columns=data_columns,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 4143, in write
    data_columns=data_columns,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 3813, in _create_axes
    errors=self.errors,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 4800, in _maybe_convert_for_string_atom
    for i in range(len(block.shape[0])):
TypeError: object of type 'int' has no len()


At first I thought my calculations were causing the problem, but I get the error even without them.

Here is my code:

from astroquery.gaia import Gaia

job3 = Gaia.launch_job_async("SELECT * \
FROM gaiadr1.gaia_source \
WHERE CONTAINS(POINT('ICRS',gaiadr1.gaia_source.ra,gaiadr1.gaia_source.dec),CIRCLE('ICRS',56.75,24.1167,2))=1 \
AND abs(pmra_error/pmra)<0.10 \
AND abs(pmdec_error/pmdec)<0.10 \
AND pmra IS NOT NULL AND abs(pmra)>0 \
AND pmdec IS NOT NULL AND abs(pmdec)>0 \
AND pmra BETWEEN 15 AND 25 \
AND pmdec BETWEEN -55 AND -40;", dump_to_file=True)
print(job3)
p = job3.get_results()

from astropy.table import Table
import pandas as pd

table = Table.read("async_20200611171019.vot", format='votable')

pd_table = table.to_pandas()
print(pd_table)
pd_table.to_hdf("test.h5", key="test", format='table', data_columns=True, mode="w", encoding="utf-8")

hdf_table = pd.DataFrame(pd.read_hdf("test.h5"))
print(hdf_table)

Does anyone know where the problem might be? Thanks!

It looks like the phot_variable_flag column has object dtype, i.e. it is a numpy object array. It is also masked:

In [30]: table['phot_variable_flag'].dtype
Out[30]: dtype('O')

In [31]: type(table['phot_variable_flag'])
Out[31]: astropy.table.column.MaskedColumn

When I removed that column, pandas wrote the table to HDF5 successfully.
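If you need to keep the column rather than drop it, one option that may work is to turn every object-dtype column into plain Python strings before calling to_hdf. The traceback ends inside pandas' _maybe_convert_for_string_atom, which suggests the failure happens while pandas is trying to treat an object column as strings. The sketch below is only an illustration under that assumption: prepare_for_hdf is a hypothetical helper name, and the small demo DataFrame with made-up values stands in for the result of table.to_pandas().

```python
import pandas as pd


def prepare_for_hdf(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which object-dtype columns hold plain str values.

    Hypothetical helper: bytes are decoded as UTF-8, missing values become
    an empty string, and anything else is passed through str().
    """

    def to_text(value):
        if isinstance(value, bytes):
            return value.decode("utf-8")
        if pd.isna(value):
            return ""
        return str(value)

    out = df.copy()
    for name in out.columns:
        if out[name].dtype == object:
            out[name] = out[name].map(to_text)
    return out


# Stand-in for `table.to_pandas()`; these values are invented for the example.
demo = pd.DataFrame(
    {
        "source_id": [1, 2, 3],
        "phot_variable_flag": [b"NOT_AVAILABLE", b"VARIABLE", None],
    }
)

clean = prepare_for_hdf(demo)
print(clean)

# With PyTables installed, the write from the question would then become:
# clean.to_hdf("test.h5", key="test", format="table", data_columns=True, mode="w")
```

The simpler alternative, matching what was tested above, is pd_table.drop(columns=["phot_variable_flag"]) before writing. Converting instead of dropping keeps the flag values, at the cost of replacing masked entries with an empty string.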