Error when converting an astropy Table to a pandas DataFrame and writing it to an HDF5 file
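I am trying to fetch some data from the Gaia catalogue, convert the resulting astropy Table into a pandas DataFrame, and then store that DataFrame in an HDF5 file. I can't write the astropy Table (the query result) to HDF5 directly, because I need to do some processing on it first.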
The problem is that I get this error when I try to write the DataFrame to an HDF5 file:
Traceback (most recent call last):
  File "C:/Users/Administrateur.UTILISA-D5U7HV7/Documents/MEGA/ipsa/cours/aero4/stage/working_directory/python/tests/Whosebug_issue/1_panda_to_hdf/tohdf.py", line 8, in <module>
    pd_table.to_hdf("test.h5", key="test", format='table', data_columns=True, mode="w", encoding="utf-8")
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\core\generic.py", line 2505, in to_hdf
    encoding=encoding,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 282, in to_hdf
    f(store)
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 274, in <lambda>
    encoding=encoding,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 1042, in put
    errors=errors,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 1709, in _write_to_group
    data_columns=data_columns,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 4143, in write
    data_columns=data_columns,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 3813, in _create_axes
    errors=self.errors,
  File "C:\Users\Administrateur.UTILISA-D5U7HV7\Documents\MEGA\ipsa\cours\aero4\stage\working_directory\python\venv\lib\site-packages\pandas\io\pytables.py", line 4800, in _maybe_convert_for_string_atom
    for i in range(len(block.shape[0])):
TypeError: object of type 'int' has no len()
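I initially thought my own calculations were causing the problem, but I get the error even without them.
Here is my code:
- First, get the file containing the query results by running the following (the query may take a few minutes):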
from astroquery.gaia import Gaia
job3 = Gaia.launch_job_async("SELECT * \
FROM gaiadr1.gaia_source \
WHERE CONTAINS(POINT('ICRS',gaiadr1.gaia_source.ra,gaiadr1.gaia_source.dec),CIRCLE('ICRS',56.75,24.1167,2))=1 \
AND abs(pmra_error/pmra)<0.10 \
AND abs(pmdec_error/pmdec)<0.10 \
AND pmra IS NOT NULL AND abs(pmra)>0 \
AND pmdec IS NOT NULL AND abs(pmdec)>0 \
AND pmra BETWEEN 15 AND 25 \
AND pmdec BETWEEN -55 AND -40;", dump_to_file=True)
print(job3)
p = job3.get_results()
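- Then run the code below, which reproduces the error above. Adjust the file name passed to Table.read(), because your query will not produce the same file name as the one in this example.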
from astropy.table import Table
import pandas as pd
table = Table.read("async_20200611171019.vot", format='votable')
pd_table = table.to_pandas()
print(pd_table)
pd_table.to_hdf("test.h5", key="test", format='table', data_columns=True, mode="w", encoding="utf-8")
hdf_table = pd.DataFrame(pd.read_hdf("test.h5"))
print(hdf_table)
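Since the query writes a file with a new name each time, one way to avoid editing the file name by hand is to pick the most recently modified matching file. This is a minimal sketch that assumes the dump lands in the working directory and follows the async_*.vot naming seen in the example above; the helper name latest_votable is hypothetical, not part of astroquery or astropy.

```python
import glob
import os


def latest_votable(directory: str = ".", pattern: str = "async_*.vot") -> str:
    """Return the path of the most recently modified file matching `pattern`.

    The default pattern is an assumption based on the example file name
    "async_20200611171019.vot"; raises FileNotFoundError if nothing matches.
    """
    candidates = glob.glob(os.path.join(directory, pattern))
    if not candidates:
        raise FileNotFoundError(f"no file matching {pattern!r} in {directory!r}")
    # Newest file by modification time, i.e. the latest query dump.
    return max(candidates, key=os.path.getmtime)
```

It would then replace the hard-coded name: table = Table.read(latest_votable(), format='votable').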
Does anyone know where the problem might be? Thanks!
It looks like the phot_variable_flag column has object dtype, i.e. it is a numpy object array. It is also masked:
In [30]: table['phot_variable_flag'].dtype
Out[30]: dtype('O')
In [31]: type(table['phot_variable_flag'])
Out[31]: astropy.table.column.MaskedColumn
When I remove that column, pandas writes the DataFrame to HDF5 successfully.
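The drop-the-column workaround can be generalised to any object-dtype column. The traceback fails inside _maybe_convert_for_string_atom, which suggests pandas trips over an object column whose values are not all plain strings. This is a minimal sketch under that assumption (for example, strings mixed with NaN for masked entries, or with bytes). It finds such columns and either drops them or converts every value to str before calling to_hdf. The helper names object_columns and stringify_object_columns, and the empty-string fill value, are hypothetical choices, not pandas or astropy API.

```python
import math

import pandas as pd


def object_columns(df: pd.DataFrame) -> list:
    """Names of the columns whose dtype is numpy object ('O')."""
    return [name for name, dtype in df.dtypes.items() if dtype == object]


def _to_str(value, fill: str) -> str:
    """Turn one cell into str: missing -> fill, bytes -> UTF-8 text."""
    if value is None or value is pd.NA:
        return fill
    if isinstance(value, float) and math.isnan(value):
        return fill  # e.g. a masked entry that became NaN
    if isinstance(value, bytes):
        return value.decode("utf-8")
    return str(value)


def stringify_object_columns(df: pd.DataFrame, fill: str = "") -> pd.DataFrame:
    """Return a copy in which every object column holds only str values."""
    out = df.copy()
    for name in object_columns(out):
        out[name] = [_to_str(v, fill) for v in out[name]]
    return out
```

In the question's code this would mean pd_table = stringify_object_columns(table.to_pandas()) before the to_hdf(...) call, or pd_table = pd_table.drop(columns=object_columns(pd_table)) to mirror the answer exactly. Converting keeps the flag information, while dropping is simpler if the column is not needed. This has not been checked against the actual Gaia query result.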