Ingesting HDF5 data: keys within keys, getting data into arrays

Question

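I need some help.

I have an HDF5 file containing spectrum data (time, frequency, and the power level at a given frequency at a given time). Here is the structure of the file (viewed in HDFView):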

[Image: HDFView of Data File]

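The main groups (keys) are HOURS, and inside each hour are the minutes; each minute is its own group (key). Data are collected every 0.02 seconds for 60 seconds, so there are 3000 rows, and there are 256 frequency bins (i.e. starting at 1 MHz and ending at 26 MHz, in 256 steps). For example: 23 --> 23:10 --> 2D array of power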

0           0           1           2       .....           255
1       -53.672386  -53.82235   -53.773468  .....        -50.566887
2       -53.85694   -53.945183  -53.63385   .....        -51.306465   
3       -53.709038  -53.55101   -53.55305   .....        -52.7324906
.
.
.
2999    -53.23989   -51.501495  -50.681602              -52.227474

I am able to access the data for individual minutes, pull them into arrays, and then plot the data, like this:

import h5py
import numpy as np
import matplotlib.pyplot as plt

# Read in the HDF5 file
file = h5py.File("/home/tom/Desktop/2021-10-28_ch0.hdf5", 'r')

# Search for the main groups in the file. The main groups are hours: 20, 22, etc...

# Select one of the hours (i.e. 23)
hour = file['23']

# Search for the subgroups (keys) within the chosen hour. These are "hour:minutes", i.e. 23:10
#for key in hour.keys():
    #print( key )

# Select key with data for minutes 10, 11, 12, 13 and save into individual arrays:
minute_data_10=hour['23:10'][()]
minute_data_11=hour['23:11'][()]
minute_data_12=hour['23:12'][()]
minute_data_13=hour['23:13'][()]

# Generate a 1D array of TIME spanning 4 minutes (because we ingested
# 4x 1-minute slices of data):

time = np.linspace(0, 60*4, 3000*4)

# Generate a 1D array of FREQUENCY
frequency = np.linspace(1.575E0, 26.82402336E0, 256)

# Combine minute_data_10  minute_data_11 minute_data_12 and minute_data_13 along the time axis (axis=0)
comb_min = np.concatenate( (minute_data_10, minute_data_11, minute_data_12, minute_data_13), axis=0 )

print( comb_min.shape )

# Plot the data
im = plt.pcolormesh(frequency, time, comb_min, cmap='jet')
plt.colorbar(im).ax.tick_params(labelsize=10)
plt.title('Spectrum')
plt.ylabel('Seconds ago...')
#plt.xlabel('frequency in Hz')
im.axes.xaxis.set_ticklabels([])
plt.show()

[Image: Spectrum Plotted]

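I am manually defining each minute (minutes 10, 11, 12, 13), combining them, and then plotting them.

But what I would like to do is automatically ingest all of the minutes for whichever hours I choose and plot the result as a single figure. For example, how would I ingest all of the minutes of hour 15 and then plot the spectrum? Or, how would I plot the first 5 hours of data?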

Answer 1

It's just a generalization of what you already have. To get all of hour 15:

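# `file` is the h5py.File opened in the question's code
hh = '15'
hour = file[hh]

# One dataset per minute; raises KeyError if any of the 60 minutes is missing
collect = [hour[f'{hh}:{mm:02d}'] for mm in range(60)]
comb_min = np.concatenate( collect, axis=0 )
print( comb_min.shape )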

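To get the first 5 hours:

# Assumes the hour groups are named '00' ... '04' (zero-padded, like the 'hh:mm' dataset names)
collect = [file[f'{hh:02d}'][f'{hh:02d}:{mm:02d}'] for hh in range(5) for mm in range(60)]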
comb_min = np.concatenate( collect, axis=0 )
print( comb_min.shape )

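It looks like the for clauses are backwards, but they're not: the clauses run left to right, so hh is the outer loop and mm is the inner loop.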
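To make the clause order concrete, here is a small sketch that builds the same kind of 'hh:mm' names with a comprehension and with explicit nested loops. The ranges (2 hours, 3 minutes) are shortened purely for illustration.

```python
# Nested `for` clauses in a comprehension run left to right,
# exactly like nested for-loops written top to bottom.
names_comp = [f'{hh:02d}:{mm:02d}' for hh in range(2) for mm in range(3)]

names_loop = []
for hh in range(2):        # first clause  -> outer loop
    for mm in range(3):    # second clause -> inner loop
        names_loop.append(f'{hh:02d}:{mm:02d}')

print(names_comp)
print(names_comp == names_loop)
```

Both versions walk through every minute of one hour before moving on to the next hour, which is the order needed for concatenating along the time axis.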

Answer 2

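HDF5 files are self-describing. (In other words, you can get the group or dataset names from the file; you don't have to know them in advance.) As mentioned above, you can do this with the .keys() method. (Note: h5py objects are not dictionaries; h5py just uses Python's dictionary syntax to access names.)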

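Using the keys/names has the added benefit of only reading datasets that exist. Looking at your image, there are datasets for times 15:00 and 15:02, but none for 15:01. (This gap has additional implications when you create your plot, but that is a different issue.)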
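As a minimal sketch of that idea, the snippet below builds a tiny in-memory file with the same hour/minute layout and then lists what is actually there. The file name demo_layout.h5, the small array shape, and the missing 15:01 dataset are illustrative assumptions, not the real data; driver='core' with backing_store=False is used only so that nothing is written to disk.

```python
import h5py
import numpy as np

# Stand-in for the real file: one hour group with two minute datasets.
# The (3, 4) shape is a placeholder for the real (3000, 256) arrays.
with h5py.File('demo_layout.h5', 'w', driver='core', backing_store=False) as h5f:
    grp = h5f.create_group('15')
    for name in ('15:00', '15:02'):
        grp.create_dataset(name, data=np.zeros((3, 4), dtype='f4'))

    hour_names = list(h5f.keys())          # names of the hour groups
    minute_names = list(h5f['15'].keys())  # names of the datasets in hour 15
    has_1501 = '15:01' in h5f['15']        # membership test, no KeyError raised

print(hour_names, minute_names, has_1501)
```

Iterating over the names that exist avoids the KeyError that a fixed range(60) loop would hit at the first missing minute.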

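The code below shows how to do this. It uses the same method: create a list of h5py objects, then use np.concatenate to combine them into one array. It also collects the hh:mm times (from the dataset names) in a list that can be used to create the time array.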

I used Python's file context manager. This is preferred over the open/close method (it avoids leaving the file open, and it improves readability).
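One practical consequence of the context manager is that data must be read while the file is still open. The sketch below uses a throwaway in-memory file (demo_ctx.h5 and the 3x2 array are placeholders) to show that an array read with [()] inside the block is still usable after the block ends, while the file itself is closed.

```python
import h5py
import numpy as np

with h5py.File('demo_ctx.h5', 'w', driver='core', backing_store=False) as h5f:
    dset = h5f.create_dataset('23:10', data=np.arange(6.0).reshape(3, 2))
    in_memory = dset[()]   # copy the values into a NumPy array while the file is open

# Leaving the block closed the file. `in_memory` is an ordinary array and
# still usable; `dset` no longer refers to an open file.
file_is_open = bool(h5f)
print(file_is_open, in_memory.shape)
```

This is why the examples in this answer call np.concatenate inside the with block rather than after it.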

Simple example (hard-coded for the ['15'] hour group):

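import h5py
import numpy as np

# Same file as in the question
with h5py.File('/home/tom/Desktop/2021-10-28_ch0.hdf5', 'r') as h5f:
    times = []    # dataset names ('hh:mm'), in the order they are read
    collect = []  # h5py dataset objects, one per minute
    hh = '15'
    for hhmm in h5f[hh].keys():
        times.append(hhmm)
        collect.append(h5f[hh][hhmm])

    # Concatenate while the file is still open; this reads the data into memory
    comb_min = np.concatenate(collect, axis=0)
    print(times)
    print(len(collect), comb_min.shape)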

A more general example (reads all groups [hours] and datasets ['hh:mm']):

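import h5py
import numpy as np

# Same file as in the question
with h5py.File('/home/tom/Desktop/2021-10-28_ch0.hdf5', 'r') as h5f:
    times = []    # dataset names ('hh:mm') across all hours
    collect = []  # h5py dataset objects, one per minute
    for hh in h5f.keys():                # every hour group in the file
        for hhmm in h5f[hh].keys():      # every minute dataset in that hour
            times.append(hhmm)
            collect.append(h5f[hh][hhmm])

    # Concatenate while the file is still open; this reads the data into memory
    comb_min = np.concatenate(collect, axis=0)
    print(times)
    print(len(collect), comb_min.shape)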
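The times list collected above can be turned into a time axis that respects gaps such as the missing 15:01. The following is only a sketch under stated assumptions: every dataset has 3000 rows sampled every 0.02 s (as described in the question), the labels are sorted and fall within one day, and the helper names minute_offsets and build_time_axis are made up for this illustration.

```python
import numpy as np

def minute_offsets(times):
    """Seconds from the first 'hh:mm' label to the start of each label."""
    minutes = []
    for label in times:
        hh, mm = label.split(':')
        minutes.append(int(hh) * 60 + int(mm))
    return [(m - minutes[0]) * 60 for m in minutes]

def build_time_axis(times, rows_per_minute=3000, dt=0.02):
    """One time value per row, starting each minute at its true offset."""
    within = np.arange(rows_per_minute) * dt   # 0.00, 0.02, ... inside one minute
    return np.concatenate([start + within for start in minute_offsets(times)])

# Example with a gap: 15:01 is missing, so the second block starts at 120 s
time = build_time_axis(['15:00', '15:02'])
print(time.shape, time[0], time[3000])
```

The result has one entry per row of comb_min, so it can replace the np.linspace time array from the question. How the plot should show the missing minute (for example, leaving it blank) is a separate decision.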

python

hdf5

h5py