How to iterate over column names with PyTables?
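I have a large matrix (15000 rows x 2500 columns) stored using PyTables and am trying to work out how to iterate over the columns of a row. In the documentation, I only see how to access each column manually by name.
I have columns like this: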
- ID
- X20160730_Day10_123a_2
- X20160730_Day10_123b_1
- X20160730_Day10_123b_2
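The ID column values are strings like "10692.RFX7", but all other cell values are floats. This selection works and I can iterate over the resulting rows, but I can't see how to iterate over the columns and check their values: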
from tables import *
import numpy

def main():
    h5file = open_file('carlo_seth.h5', mode='r', title='Three-file test')
    table = h5file.root.expression.readout
    condition = '(ID == b"10692.RFX7")'
    for row in table.where(condition):
        print(row['ID'].decode())
        for col in row.fetch_all_fields():
            print("{0}\t{1}".format(col, row[col]))
    h5file.close()

if __name__ == '__main__':
    main()
If I just iterate with "for col in row", nothing happens. With the code above, I get a stack trace:
10692.RFX7
Traceback (most recent call last):
  File "tables/tableextension.pyx", line 1497, in tables.tableextension.Row.__getitem__ (tables/tableextension.c:17226)
KeyError: b'10692.RFX7'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "tables/tableextension.pyx", line 126, in tables.tableextension.get_nested_field_cache (tables/tableextension.c:2532)
KeyError: b'10692.RFX7'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "./read_carlo_pytable.py", line 31, in <module>
    main()
  File "./read_carlo_pytable.py", line 25, in main
    print("{0}\t{1}".format(col, row[col]))
  File "tables/tableextension.pyx", line 1501, in tables.tableextension.Row.__getitem__ (tables/tableextension.c:17286)
  File "tables/tableextension.pyx", line 133, in tables.tableextension.get_nested_field_cache (tables/tableextension.c:2651)
  File "tables/utilsextension.pyx", line 927, in tables.utilsextension.get_nested_field (tables/utilsextension.c:8707)
AttributeError: 'numpy.bytes_' object has no attribute 'encode'
Closing remaining open files:carlo_seth.h5...done
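You can access a column's value by name in each row:
for row in table:
    print(row["ID"])  # index by column name; "10692.RFX7" is a cell value, not a column
To iterate over all columns:
names = table.coldescrs.keys()  # column names; for a flat table, table.colnames lists the same names
for row in table:
    for name in names:
        print(name, row[name])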
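The error in the question seems to come from row.fetch_all_fields() returning the row's values as a NumPy record rather than its column names, so the first "col" is the ID value b'10692.RFX7', which row[col] then tries to use as a column name. Below is a minimal sketch of the same where() query, iterating over table.colnames instead. The file example.h5, the two sample rows, and the helper dump_matching_rows are hypothetical stand-ins for the real data, not part of the original script.

```python
import os
import tempfile

import numpy as np
import tables


def dump_matching_rows(table, condition):
    """Return one {column name: value} dict per row matching `condition`."""
    names = table.colnames  # column names, in table order
    results = []
    for row in table.where(condition):
        # The Row object is reused between iterations, so copy the values out.
        results.append({name: row[name] for name in names})
    return results


# Hypothetical two-row stand-in for the real 15000 x 2500 table.
data = np.array(
    [(b"10692.RFX7", 1.5, 2.5), (b"99999.ABC1", 3.5, 4.5)],
    dtype=[
        ("ID", "S16"),
        ("X20160730_Day10_123a_2", "f8"),
        ("X20160730_Day10_123b_1", "f8"),
    ],
)

path = os.path.join(tempfile.mkdtemp(), "example.h5")
with tables.open_file(path, mode="w") as h5file:
    table = h5file.create_table(h5file.root, "readout", obj=data)
    table.flush()  # make sure the sample rows are written before querying
    matches = dump_matching_rows(table, '(ID == b"10692.RFX7")')

for record in matches:
    for name, value in record.items():
        print("{0}\t{1}".format(name, value))
```

Copying the values into a dict inside the loop keeps them usable after the file is closed. With 2500 columns it may be simpler to print directly inside the where() loop instead of building a list.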