rhdf5 package and arrays in R - storage mode list vs double

Question

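I am using the rhdf5 package to build a large h5 file containing climate data for a specific geographic region.

The domain has spatial dimensions of 48x47 (lon x lat) points. The climate variables (precipitation, temperature, ...) are organized as matrices with 2256 rows (48*47 = 2256) and 248 columns (8 observations per day for a 31-day month).

To meet the requirements of the target model, I need to build the h5 datasets in the form (time, lon, lat), i.e. (248,48,47). To do this, I convert the observation matrix into an array with dimensions c(48,47,248) (longitude, latitude, time) and then use the command 'aperm' to switch the order of the dimensions.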
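The reshaping described above can be checked on a toy grid. The following is a minimal sketch, assuming the rows of the observation matrix loop over longitude first and then latitude; the sizes (3 x 2 grid, 4 time steps) and the variable names are illustrative and not taken from the original data.

```r
# Toy sizes standing in for 48 x 47 x 248 (illustrative assumption)
n_lon <- 3; n_lat <- 2; n_time <- 4

# Observation matrix: one row per grid point (lon varies fastest), one column per time step
obs <- matrix(seq_len(n_lon * n_lat * n_time),
              nrow = n_lon * n_lat, ncol = n_time)

# R fills arrays column-major, so the row index splits into (lon, lat)
a <- array(obs, dim = c(n_lon, n_lat, n_time))

# Reorder the dimensions to (time, lon, lat)
b <- aperm(a, perm = c(3, 1, 2))

# Every cell keeps its value under the new index order: b[t, i, j] == a[i, j, t]
same <- TRUE
for (i in seq_len(n_lon)) for (j in seq_len(n_lat)) for (t in seq_len(n_time)) {
  same <- same && b[t, i, j] == a[i, j, t] && a[i, j, t] == obs[i + n_lon * (j - 1), t]
}
print(dim(b))
print(same)
```

This works because the input here is a plain numeric matrix; the same calls behave differently when the input is a data frame.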

However, when I write the dataset to the h5 file, I get the following message: "Writing of this type of data not supported."

Here is the code I use:

# load package from bioconductor
require(rhdf5)

setwd("path/to/file")

lon <-read.csv("lon_h5.csv", header=FALSE)
lon <-as.matrix(lon) #matrix 48x47
lat <-read.csv("lat_h5.csv", header=FALSE)
lat<-as.matrix(lat) #matrix 48x47

h5createFile("file.h5")
h5createDataset("file.h5", "lon",c(48,47), storage.mode = "double")
h5createDataset("file.h5", "lat",c(48,47), storage.mode = "double")
h5write(lon, file="file.h5", name="lon")
h5write(lat, file="file.h5", name="lat")

tmp <-read.csv(file="temperature.csv", header=TRUE)
tmp = array(tmp,dim=c(48,47,248)) # it loops the 48 longitude points first, then the 47 latitude points, then 248 time steps
tmp = aperm(a=tmp,perm=c(3,1,2)) # switch the order of the dimensions, putting time first, then longitude, then latitude
h5createDataset("file.h5", "tmp",c(248,48,47), storage.mode = "double")
h5write(tmp, file="file.h5", name="tmp")

'Writing of this type of data not supported.'

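The array has 559488 elements (48*47*248), so the size should not be the problem.

Writing matrices works fine, for example the lon and lat matrices. Does anyone know whether the rhdf5 package has a problem with arrays?

Thanks

Update: apparently the problem is that the array has storage mode 'list', which is not implemented for writing in the rhdf5 package.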
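A likely source of the 'list' storage mode is that read.csv returns a data frame, and array() applied to a data frame yields a list with a dim attribute rather than a numeric array. The sketch below reproduces this on a small hypothetical data frame (the column names t1 and t2 are made up for illustration).

```r
# A small data frame standing in for the result of read.csv (illustrative values)
df <- data.frame(t1 = c(1.5, 2.5, 3.5, 4.5),
                 t2 = c(5.5, 6.5, 7.5, 8.5))

# array() on a data frame recycles its columns as list elements
bad <- array(df, dim = c(2, 2, 2))
print(typeof(bad))        # storage type of the reshaped object
print(storage.mode(bad))  # same information via storage.mode()

# Converting to a matrix first gives an atomic numeric object
m <- as.matrix(df)
good <- array(m, dim = c(2, 2, 2))
print(typeof(good))
```

Checking typeof() or storage.mode() before calling h5write is a quick way to confirm whether an object is atomic.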

I was advised to change the storage mode of tmp with

storage.mode(tmp)="double"

but that does not work (error: storage.mode(tmp) = "double" : (list) object cannot be coerced to type 'double'). I also tried

tmp<-as.numeric(unlist(tmp))

but this changes the size of my array from 559488 elements to 1262204908 (!!!) elements. Any other suggestions? Thanks
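The blow-up is what one would expect if every cell of the list-mode array holds a whole data frame column instead of a single number: unlist() then concatenates all of those columns. With 559488 cells of 2256 values each, that gives 559488 x 2256 values, roughly 1.26 billion, which is consistent with the size reported above. A toy reproduction, with made-up sizes:

```r
# 6 rows (grid points) and 4 columns (time steps), standing in for 2256 x 248
df <- as.data.frame(matrix(as.numeric(1:24), nrow = 6, ncol = 4))

# List-mode array with 3 * 2 * 4 = 24 cells
bad <- array(df, dim = c(3, 2, 4))

n_cells    <- length(bad)          # number of list cells
cell_size  <- length(bad[[1]])     # each cell is a full column of the data frame
n_unlisted <- length(unlist(bad))  # all columns concatenated

print(c(n_cells, cell_size, n_unlisted))
```

Here n_unlisted equals n_cells * cell_size, so unlist() cannot recover the intended array; the conversion has to happen before array() is called.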

Answer 1

For anyone interested in the solution to this problem, it can be found here:

https://support.bioconductor.org/p/64651/#64682
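The linked thread is not reproduced here. As a minimal sketch of one way to avoid the 'list' storage mode mentioned in the question's update, the data frame can be converted to a numeric matrix before reshaping. This assumes every column of the CSV is numeric; the toy data frame, its sizes, and the temporary output file are illustrative, and the approach is not confirmed to be the one given in the thread.

```r
# Toy data frame standing in for read.csv("temperature.csv", header = TRUE)
n_lon <- 3; n_lat <- 2; n_time <- 4
tmp_df <- as.data.frame(matrix(runif(n_lon * n_lat * n_time),
                               nrow = n_lon * n_lat, ncol = n_time))

# Convert to an atomic matrix first, and make the storage mode explicit
tmp <- as.matrix(tmp_df)
storage.mode(tmp) <- "double"

# Reshape to (lon, lat, time), then reorder to (time, lon, lat)
tmp <- array(tmp, dim = c(n_lon, n_lat, n_time))
tmp <- aperm(tmp, perm = c(3, 1, 2))
print(storage.mode(tmp))
print(dim(tmp))

# Write with rhdf5 only if the package is installed; the file is a temporary one
if (requireNamespace("rhdf5", quietly = TRUE)) {
  h5file <- tempfile(fileext = ".h5")
  rhdf5::h5createFile(h5file)
  rhdf5::h5createDataset(h5file, "tmp", dims = dim(tmp), storage.mode = "double")
  rhdf5::h5write(tmp, file = h5file, name = "tmp")
}
```

For the data in the question, the sizes would be 48, 47 and 248, and the matrix would come from as.matrix() applied to the result of read.csv.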
