How to retrieve all variable names within a netCDF file using GDAL

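I am struggling to find a way to retrieve metadata information from a file using GDAL. Specifically, I would like to retrieve the band names and the order in which they are stored in a given file (either a GeoTIFF or a netCDF).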

For example, if we follow the GDAL documentation, we end up with the "GetMetadata" method of gdal.Dataset (see here and here). Although this method returns a whole set of information about the dataset, it does not provide the band names or the order in which they are stored within the given file. In fact, this appears to be an old problem (from 2015) that has not been solved yet (more info here). The R language seems to have solved it already (see here), although Python has not.

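For the sake of completeness, I know that other Python packages can help with this (e.g. xarray, rasterio); nevertheless, it is important to keep the set of packages used in a single script concise. Therefore, I would like to know a definite way, using gdal, of finding the band (a.k.a. variable) names and the order in which they are stored within a single file.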

Please let me know your thoughts on this.

Below, I present a starting point for solving this problem, in which a file is opened with GDAL (creating a Dataset object).

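from osgeo import gdal

# Placeholder values: the directory, the file stem and one variable name.
input_dir = "."
file_name = "netcdfFile"
var = "temperature"

# Open a single variable of the netCDF file as a GDAL subdataset.
OpeneddatasetFile = gdal.Open(f'NETCDF:{input_dir}/{file_name}.nc:{var}')

if isinstance(OpeneddatasetFile, gdal.Dataset):
    print("File opened successfully")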


# Here is where one should be able to fetch the variable (a.k.a. band) names
# of the OpeneddatasetFile.
# Ideally, some method would return a dictionary with this information,
# something like:

# VariablesWithinFile = OpeneddatasetFile.getVariablesWithinFileAsDictionary()



I finally found a way to retrieve the variable names from a netCDF file using GDAL, thanks to the comment given by Robert Davy above.

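I have organized the code into a set of functions to make it easier to follow. Note that there is also a function that reads the metadata from the netCDF file and returns it as a dictionary (see the "readInfo" function).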
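
To make the parsing in the code that follows easier to picture, here is a rough sketch of the part of that dictionary that matters. The file name, variable names and array shapes are invented for illustration; only the general layout of the "SUBDATASETS" entries is meant to resemble what gdal.Info reports for a netCDF file with several variables.

```python
# Hypothetical excerpt of the dictionary returned by gdal.Info(..., format="json").
# All values below are made up for illustration.
sample_info = {
    "driverShortName": "netCDF",
    "metadata": {
        "SUBDATASETS": {
            "SUBDATASET_1_NAME": 'NETCDF:"netcdfFile.nc":temperature',
            "SUBDATASET_1_DESC": "[12x180x360] temperature (32-bit floating-point)",
            "SUBDATASET_2_NAME": 'NETCDF:"netcdfFile.nc":precipitation',
            "SUBDATASET_2_DESC": "[12x180x360] precipitation (32-bit floating-point)",
        }
    },
}

subdatasets = sample_info["metadata"]["SUBDATASETS"]

# The variable name is the text after the last colon of each *_NAME entry.
for key, value in subdatasets.items():
    if key.endswith("_NAME"):
        print(key, "->", value.split(":")[-1])
```

Each variable contributes a "_NAME" and a "_DESC" entry, and the number in the key gives the position of the variable in the file.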

from osgeo import gdal
from osgeo.gdal import Dataset, InfoOptions


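def read_data(filename):
    """Open `filename` with GDAL and return the resulting Dataset."""

    dataset = gdal.Open(filename)

    # gdal.Open returns None on failure unless gdal.UseExceptions() is enabled.
    if not isinstance(dataset, Dataset):
        raise FileNotFoundError("Impossible to open the netcdf file")

    return dataset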


def readInfo(ds, infoFormat="json"):
    """Return the gdal.Info output for `ds`; with "json" it is a dictionary.

    How to: https://gdal.org/python/
    """

    info = gdal.Info(ds, options=InfoOptions(format=infoFormat))

    return info


def listAllSubDataSets(infoDict: dict):
    """Return the variable names listed in the SUBDATASETS metadata domain."""

    # A file holding a single variable may expose no SUBDATASETS domain at all.
    subDatasets = infoDict.get("metadata", {}).get("SUBDATASETS", {})

    # Entries look like SUBDATASET_1_NAME = NETCDF:"file.nc":variable;
    # the variable name is the part after the last colon.
    return [value.replace('"', '').split(":")[-1]
            for key, value in subDatasets.items()
            if key.endswith("_NAME")]


if __name__ == "__main__":

    filename = "netcdfFile.nc"
    ds = read_data(filename)

    infoDict = readInfo(ds)

    infoDict["VariableNames"] = listAllSubDataSets(infoDict)
    print(infoDict["VariableNames"])
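
Since the question also asked for the storage order and for a dictionary, here is a minimal sketch of a variant, not a definitive implementation. It assumes the file exposes its variables as subdatasets and uses Dataset.GetSubDatasets(), which returns a list of (name, description) tuples, instead of going through gdal.Info. The function names variables_by_position, list_variables and band_descriptions are my own, and the last one is only a guess at what might help for a GeoTIFF, where band descriptions can be empty.

```python
def variables_by_position(subdatasets):
    """Map each subdataset's 1-based position to its variable name.

    `subdatasets` is a list of (name, description) tuples, the shape
    returned by gdal.Dataset.GetSubDatasets().
    """
    return {
        position: name.replace('"', "").split(":")[-1]
        for position, (name, _description) in enumerate(subdatasets, start=1)
    }


def list_variables(path):
    """Open `path` with GDAL and return {position: variable name}."""
    # Imported lazily so that variables_by_position works without GDAL installed.
    from osgeo import gdal

    dataset = gdal.Open(path)
    if dataset is None:
        raise FileNotFoundError(f"GDAL could not open {path!r}")
    return variables_by_position(dataset.GetSubDatasets())


def band_descriptions(path):
    """Return {band number: description} for a raster such as a GeoTIFF."""
    from osgeo import gdal

    dataset = gdal.Open(path)
    if dataset is None:
        raise FileNotFoundError(f"GDAL could not open {path!r}")
    # Band numbers start at 1; the description is an empty string when unset.
    return {
        index: dataset.GetRasterBand(index).GetDescription()
        for index in range(1, dataset.RasterCount + 1)
    }
```

With GDAL installed, list_variables("netcdfFile.nc") should give a dictionary keyed by position. If a file contains a single variable, the subdataset list is likely to be empty, so an empty dictionary is a case worth handling.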