Recommended way for child-parent relationship in pytables
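I am using pytables and I am trying to implement a parent-child relationship. For example, I want to store several teams, each with several players. I can do it like this: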
import tables as tb
class Team(tb.IsDescription):
    id = tb.Int32Col()       # Id of team
    name = tb.StringCol(20)  # Name of team
class Player(tb.IsDescription):
    team = tb.Int32Col()     # Link to Team.id
    name = tb.StringCol(20)  # Name of player
f = tb.open_file('test.h5', mode='w', title='test')
table_team = f.create_table(f.root, 'teams', Team)
table_player = f.create_table(f.root, 'players', Player)
# parent row
team = table_team.row
team['id'] = 0
team['name'] = 'Barcelona'
team.append()
# child rows, linked to the parent through the 'team' column
player0 = table_player.row
player0['team'] = 0
player0['name'] = 'De Jong'
player0.append()
player1 = table_player.row
player1['team'] = 0
player1['name'] = 'Fati'
player1.append()
f.close()  # closing the file also flushes the buffered rows to disk
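Even without joins, the link column in this flat layout can be followed by hand in two single-table steps. First find the team's id, then select the matching rows from the players table with `Table.read_where`.

The following is a minimal sketch that assumes the same `Team`/`Player` descriptions as above. The helper names `write_example` and `players_of_team`, the file name, and the second team ('Ajax', 'Tadic') are hypothetical and used only for illustration.

```python
import tables as tb


class Team(tb.IsDescription):
    id = tb.Int32Col()       # id of team
    name = tb.StringCol(20)  # name of team


class Player(tb.IsDescription):
    team = tb.Int32Col()     # link to Team.id
    name = tb.StringCol(20)  # name of player


def write_example(filename):
    """Write two teams and their players as flat, linked tables (hypothetical data)."""
    with tb.open_file(filename, mode='w') as f:
        teams = f.create_table(f.root, 'teams', Team)
        players = f.create_table(f.root, 'players', Player)
        row = teams.row
        for team_id, team_name in [(0, 'Barcelona'), (1, 'Ajax')]:
            row['id'] = team_id
            row['name'] = team_name
            row.append()
        row = players.row
        for team_id, player_name in [(0, 'De Jong'), (0, 'Fati'), (1, 'Tadic')]:
            row['team'] = team_id
            row['name'] = player_name
            row.append()
        # buffered rows of both tables are flushed when the file is closed


def players_of_team(filename, team_name):
    """Emulate a join by hand: look up the parent id, then filter the child table."""
    with tb.open_file(filename, mode='r') as f:
        # step 1: the parent table is small, so read it into a NumPy structured array
        teams = f.root.teams.read()
        ids = teams['id'][teams['name'] == team_name.encode()]
        if len(ids) == 0:
            return []
        # step 2: in-kernel query on the child table; 'team' resolves to the column
        names = f.root.players.read_where('team == tid',
                                          condvars={'tid': int(ids[0])},
                                          field='name')
        return [n.decode() for n in names]  # StringCol values come back as bytes


write_example('teams_flat.h5')
print(players_of_team('teams_flat.h5', 'Barcelona'))
```

Each query still touches only one table, which matches what PyTables supports. The "join" is simply the caller passing the id from the first query into the second.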
However, the pytables documentation says the following about this (https://www.pytables.org/cookbook/hints_for_sql_users.html):
"You may have noticed that queries in PyTables only cover one table. In fact, there is no way of directly performing a join between two tables in PyTables (remember that it’s not a relational database)."
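It then goes on to offer some workarounds for join queries. But as they say, pytables is not a relational database. So rather than using a relational approach plus workarounds, my question is:
What is the recommended/standard way to implement a parent-child structure in pytables?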
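Does your use case actually require a parent-child relationship? I think the hierarchical HDF5 data structure can organize your experimental data. Create a separate Table for each experiment, with one row per data point, and store the experiment metadata as attributes on each table.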
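Applied to the teams and players in the question, the same idea moves the parent-child link into the HDF5 hierarchy itself. Each team gets its own group, the team's metadata is stored as attributes of that group, and a players table is nested inside it.

This is a minimal sketch under those assumptions. The group naming scheme `Team_000`, the `team_name` attribute, and the helpers `write_teams`/`read_teams` are hypothetical and not part of the original answer.

```python
import tables as tb


class Player(tb.IsDescription):
    name = tb.StringCol(20)  # name of player


def _as_str(value):
    # attribute strings may be read back as bytes or numpy.str_ depending on storage
    return value.decode() if isinstance(value, bytes) else str(value)


def write_teams(filename, teams):
    """Write {team name: [player names]} as one group per team (hypothetical helper)."""
    with tb.open_file(filename, mode='w') as f:
        for team_id, (team_name, players) in enumerate(teams.items()):
            # parent: one group per team, team metadata stored as group attributes
            grp = f.create_group(f.root, f'Team_{team_id:03}')
            grp._v_attrs['team_name'] = team_name
            # children: the team's players live in a table nested inside its group
            tbl = f.create_table(grp, 'players', Player)
            row = tbl.row
            for player_name in players:
                row['name'] = player_name
                row.append()
            tbl.flush()


def read_teams(filename):
    """Rebuild {team name: [player names]} by walking the groups (hypothetical helper)."""
    result = {}
    with tb.open_file(filename, mode='r') as f:
        for grp in f.iter_nodes(f.root, classname='Group'):
            names = grp.players.col('name')  # one bytes value per row
            result[_as_str(grp._v_attrs['team_name'])] = [n.decode() for n in names]
    return result


write_teams('teams_tree.h5', {'Barcelona': ['De Jong', 'Fati']})
print(read_teams('teams_tree.h5'))
```

With this layout, "all players of a team" is a single path lookup (`/Team_000/players`) rather than a query across two tables. The trade-off is that a query over all players now has to visit every group.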
I created a simple example with "dummy data" to demonstrate this schema.
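Note: for simplicity, I prefer to use NumPy to create the Tables. First I create a dtype (exp_dt), then use it to create baseline "experimental data" as a NumPy structured array (exp_arr). The data for each Table is a second array (data), created by offsetting the temperature and pressure values in exp_arr. I load data into each table with the obj=data parameter. The example could be modified to define a class Experiment(tb.IsDescription) and load the data row by row.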
Here is the code:
import numpy as np
import tables as tb
# define table structure with NumPy dtype
exp_dt = np.dtype([('Time', float), ('Temp', float), ('Pres', float)])
# create baseline dummy data (used later)
exp_arr = np.empty(shape=(11,), dtype=exp_dt)
for i in range(11):
    exp_arr[i]['Time'] = i/10.
    exp_arr[i]['Temp'] = i**2/10.
    exp_arr[i]['Pres'] = 2.*i
# create empty structured array; used to load experimental data
data = np.empty(shape=(11,), dtype=exp_dt)
# create some metadata for experiment date, time and device
date_list = ['11/17/2021', '11/19/2021', '11/23/2021']
time_list = ['10:49:23', '08:14:25', '14:40:23']
device_list = ['Hex 6500', 'Hex 4414', 'CMM 6950']
with tb.open_file('SO_70082470.h5', 'w') as h5f:
    for i in range(1, 4):
        # create dummy data for THIS experiment (baseline offset by i)
        data['Time'] = exp_arr['Time']
        data['Temp'] = exp_arr['Temp'] + i
        data['Pres'] = exp_arr['Pres'] + 2.*i
        # create table and load data; obj= writes a copy of data into the table
        tbl = h5f.create_table('/', f'Experiment_{i:03}', obj=data)
        # add 3 attributes: Date, Time and Device
        tbl.attrs['Date'] = date_list[i-1]
        tbl.attrs['Time'] = time_list[i-1]
        tbl.attrs['Device'] = device_list[i-1]
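The note above mentions that the table could instead be described with `class Experiment(tb.IsDescription)` and filled row by row. A sketch of that variant follows, assuming the same dummy values and metadata as the NumPy version. The helpers `write_experiments` and `read_metadata` and the file name are hypothetical.

The explicit `pos` arguments matter because PyTables orders columns without `pos` alphanumerically by name. Without them, the column order would be Pres, Temp, Time instead of matching `exp_dt`.

```python
import tables as tb


class Experiment(tb.IsDescription):
    # pos keeps the column order Time, Temp, Pres (otherwise alphanumeric order)
    Time = tb.Float64Col(pos=0)
    Temp = tb.Float64Col(pos=1)
    Pres = tb.Float64Col(pos=2)


def _as_str(value):
    # attribute strings may be read back as bytes or numpy.str_ depending on storage
    return value.decode() if isinstance(value, bytes) else str(value)


def write_experiments(filename, metadata):
    """Write one table per experiment, filled row by row (hypothetical helper).

    metadata: list of (date, time, device) tuples, one per experiment.
    """
    with tb.open_file(filename, mode='w') as h5f:
        for i, (exp_date, exp_time, device) in enumerate(metadata, start=1):
            tbl = h5f.create_table('/', f'Experiment_{i:03}', Experiment)
            row = tbl.row
            for k in range(11):
                # same dummy values as the NumPy version: baseline offset by i
                row['Time'] = k / 10.
                row['Temp'] = k**2 / 10. + i
                row['Pres'] = 2. * k + 2. * i
                row.append()
            tbl.flush()
            # experiment metadata stored as table attributes, as in the NumPy version
            tbl.attrs['Date'] = exp_date
            tbl.attrs['Time'] = exp_time
            tbl.attrs['Device'] = device


def read_metadata(filename):
    """Return {table name: (date, time, device, number of rows)}."""
    with tb.open_file(filename, mode='r') as h5f:
        return {tbl.name: (_as_str(tbl.attrs['Date']), _as_str(tbl.attrs['Time']),
                           _as_str(tbl.attrs['Device']), tbl.nrows)
                for tbl in h5f.iter_nodes('/', classname='Table')}


meta = list(zip(['11/17/2021', '11/19/2021', '11/23/2021'],
                ['10:49:23', '08:14:25', '14:40:23'],
                ['Hex 6500', 'Hex 4414', 'CMM 6950']))
write_experiments('experiments_rows.h5', meta)
print(read_metadata('experiments_rows.h5'))
```

Appending row by row is simpler to read but slower for large data than passing a whole array with `obj=`. The NumPy approach above is probably the better fit when each experiment's data is already in memory.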