读取 GRIB2 文件时元数据丢失
Metadata is lost when reading a GRIB2 file
我想将 GRIB2 文件读入 R,但无法安装 wgrib2
(经过几个小时的努力),这意味着 rNOMADS
不是一个选项。没关系,因为 raster
和 rgdal
包都可以读取 GRIB2 文件。我 运行 遇到的问题是在读取文件时层的名称被删除。
这是一个例子。
# Load libraries
library(raster)
library(rgdal)
# Name of file
file_name <- "https://dd.weather.gc.ca/model_gem_regional/coupled/gulf_st-lawrence/grib2/00/001/CMC_coupled-rdps-stlawrence-ocean_latlon0.02x0.03_2020120100_P001.grib2"
# Load as raster brick
b <- brick(file_name)
# Get layer names
names(b)
# [1] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.1"
# [2] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.2"
# [3] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.3"
# [4] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.4"
# [5] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.5"
# [6] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.6"
# [7] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.7"
# [8] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.8"
# [9] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.9"
#[10] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.10"
如您所见,名称只是通用的默认值。接下来,我尝试了 rgdal.
# Load using rgdal
r <- readGDAL(file_name)
# Get names
names(r)
# [1] "band1" "band2" "band3" "band4" "band5" "band6" "band7" "band8"
# [9] "band9" "band10"
再次使用默认名称。但是,如果我使用命令行实用程序 ncl_convert2nc
将 GRIB2 文件转换为 NetCDF,然后使用 ncdf4
读取 NetCDF 文件——这是我不想在工作流程中包含的额外转换步骤如果可以避免的话——肯定存在变量名。
# [1] "UOGRD_P0_L160_GLL0" "VOGRD_P0_L160_GLL0" "ICEC_P0_L1_GLL0"
# [4] "ICETK_P0_L1_GLL0" "UICE_P0_L1_GLL0" "VICE_P0_L1_GLL0"
# [7] "ICETMP_P0_L1_GLL0" "ICEPRS_P0_L1_GLL0" "CICES_P0_L1_GLL0"
#[10] "WTMP_P0_L1_GLL0"
问题: 在使用 rgdal
或 raster
读取 GRIB2 文件时,有没有办法提取或保留 variable/layer 名称?
PS 我需要从文件中获取变量名称的原因是层 不 与指定的层顺序匹配 on the website 当加载(例如)raster
时。从变量值中可以明显看出这一点。虽然我可以使用从上面显示的 NetCDF 文件中收集的变量名称,但如果层的顺序发生变化,这将破坏我的包。
您可以使用 terra
包代替 raster
。
file_name <- "https://dd.weather.gc.ca/model_gem_regional/coupled/gulf_st-lawrence/grib2/00/001/CMC_coupled-rdps-stlawrence-ocean_latlon0.02x0.03_2020120100_P001.grib2"
b <- basename(file_name)
if (!file.exists(b)) download.file(file_name, b, mode="wb")
library(terra)
r <- rast(b)
r
#class : SpatRaster
#dimensions : 325, 500, 10 (nrow, ncol, nlyr)
#resolution : 0.03, 0.02 (x, y)
#extent : -71.015, -56.015, 45.49, 51.99 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +R=6371229 +no_defs
#source : CMC_coupled-rdps-stlawrence-ocean_latlon0.02x0.03_2020120100_P001.grib2
#names : 0[-] SFC="Ground or water surface", 0[-] SFC="Ground or water surface", 0[-] SFC="Ground or water surface", 0[-] SFC="Ground or water surface", 0[m] DBSL="Depth below sea level", 0[m] DBSL="Depth below sea level", ...
但是变量名与你的不匹配
names(r)
# [1] "0[-] SFC=\"Ground or water surface\"" "0[-] SFC=\"Ground or water surface\"" "0[-] SFC=\"Ground or water surface\""
# [4] "0[-] SFC=\"Ground or water surface\"" "0[m] DBSL=\"Depth below sea level\"" "0[m] DBSL=\"Depth below sea level\""
# [7] "0[-] SFC=\"Ground or water surface\"" "0[-] SFC=\"Ground or water surface\"" "0[-] SFC=\"Ground or water surface\""
#[10] "0[-] SFC=\"Ground or water surface\""
您可以将名称设置为元数据的其他部分
nms <- trimws(grep("GRIB_ELEMENT=", desc(b), value=TRUE))
names(r) <- gsub("GRIB_ELEMENT=", "", nms)
r
#class : SpatRaster
#dimensions : 325, 500, 10 (nrow, ncol, nlyr)
#resolution : 0.03, 0.02 (x, y)
#extent : -71.015, -56.015, 45.49, 51.99 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +R=6371229 +no_defs
#source : CMC_coupled-rdps-stlawrence-ocean_latlon0.02x0.03_2020120100_P001.grib2
#names : ICEC, ICETK, UICE, VICE, UOGRD, VOGRD, ..
names(r)
#[1] "ICEC" "ICETK" "UICE" "VICE" "UOGRD" "VOGRD" "WTMP" "ICET" "ICEPRS" "CICES"
我可以更改 terra
的行为,使其使用“GRIB_ELEMENT”(如果有意义请告诉我)。但是我不清楚如何获得您显示的名称。例如,下面是第一层的 GDAL 元数据。你显示 ICEC_P0_L1_GLL0
。所有的名字都有 P0
和 GLL0
所以至少对于这个文件来说,这些似乎是多余的。但是L1
指的是什么?
d <-desc(b)
d[35:46]
# [1] " GRIB_COMMENT=Ice cover [Proportion]"
# [2] " GRIB_DISCIPLINE=10(Oceanographic_Products)"
# [3] " GRIB_ELEMENT=ICEC"
# [4] " GRIB_FORECAST_SECONDS=3600 sec"
# [5] " GRIB_IDS=CENTER=54(Montreal) SUBCENTER=0 MASTER_TABLE=4 LOCAL_TABLE=0 SIGNF_REF_TIME=1(Start_of_Forecast) REF_TIME=2020-12-01T00:00:00Z PROD_STATUS=0(Operational) TYPE=1(Forecast)"
# [6] " GRIB_PDS_PDTN=0"
# [7] " GRIB_PDS_TEMPLATE_ASSEMBLED_VALUES=2 0 2 56 56 0 0 1 1 1 0 0 255 -127 -2147483647"
# [8] " GRIB_PDS_TEMPLATE_NUMBERS=2 0 2 56 56 0 0 0 1 0 0 0 1 1 0 0 0 0 0 255 255 255 255 255 255"
# [9] " GRIB_REF_TIME= 1606780800 sec UTC"
#[10] " GRIB_SHORT_NAME=0-SFC"
#[11] " GRIB_UNIT=[Proportion]"
#[12] " GRIB_VALID_TIME= 1606784400 sec UTC"
我看到“UOGRD”和“VOGRD”有 L160
并且在“GRIB_PDS_TEMPLATE_ASSEMBLED_VALUE”和“GRIB_PDS_TEMPLATE_NUMBERS”中有数字 160,而其他有 1.
描述了元数据结构here,但我可以使用一些指导来了解从元数据中提取的内容。
我想将 GRIB2 文件读入 R,但无法安装 wgrib2
(经过几个小时的努力),这意味着 rNOMADS
不是一个选项。没关系,因为 raster
和 rgdal
包都可以读取 GRIB2 文件。我 运行 遇到的问题是在读取文件时层的名称被删除。
这是一个例子。
# Load libraries
library(raster)
library(rgdal)
# Name of file
file_name <- "https://dd.weather.gc.ca/model_gem_regional/coupled/gulf_st-lawrence/grib2/00/001/CMC_coupled-rdps-stlawrence-ocean_latlon0.02x0.03_2020120100_P001.grib2"
# Load as raster brick
b <- brick(file_name)
# Get layer names
names(b)
# [1] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.1"
# [2] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.2"
# [3] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.3"
# [4] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.4"
# [5] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.5"
# [6] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.6"
# [7] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.7"
# [8] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.8"
# [9] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.9"
#[10] "CMC_coupled.rdps.stlawrence.ocean_latlon0.02x0.03_2020120100_P001.10"
如您所见,名称只是通用的默认值。接下来,我尝试了 rgdal.
# Load using rgdal
r <- readGDAL(file_name)
# Get names
names(r)
# [1] "band1" "band2" "band3" "band4" "band5" "band6" "band7" "band8"
# [9] "band9" "band10"
再次使用默认名称。但是,如果我使用命令行实用程序 ncl_convert2nc
将 GRIB2 文件转换为 NetCDF,然后使用 ncdf4
读取 NetCDF 文件——这是我不想在工作流程中包含的额外转换步骤如果可以避免的话——肯定存在变量名。
# [1] "UOGRD_P0_L160_GLL0" "VOGRD_P0_L160_GLL0" "ICEC_P0_L1_GLL0"
# [4] "ICETK_P0_L1_GLL0" "UICE_P0_L1_GLL0" "VICE_P0_L1_GLL0"
# [7] "ICETMP_P0_L1_GLL0" "ICEPRS_P0_L1_GLL0" "CICES_P0_L1_GLL0"
#[10] "WTMP_P0_L1_GLL0"
问题: 在使用 rgdal
或 raster
读取 GRIB2 文件时,有没有办法提取或保留 variable/layer 名称?
PS 我需要从文件中获取变量名称的原因是层 不 与指定的层顺序匹配 on the website 当加载(例如)raster
时。从变量值中可以明显看出这一点。虽然我可以使用从上面显示的 NetCDF 文件中收集的变量名称,但如果层的顺序发生变化,这将破坏我的包。
您可以使用 terra
包代替 raster
。
file_name <- "https://dd.weather.gc.ca/model_gem_regional/coupled/gulf_st-lawrence/grib2/00/001/CMC_coupled-rdps-stlawrence-ocean_latlon0.02x0.03_2020120100_P001.grib2"
b <- basename(file_name)
if (!file.exists(b)) download.file(file_name, b, mode="wb")
library(terra)
r <- rast(b)
r
#class : SpatRaster
#dimensions : 325, 500, 10 (nrow, ncol, nlyr)
#resolution : 0.03, 0.02 (x, y)
#extent : -71.015, -56.015, 45.49, 51.99 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +R=6371229 +no_defs
#source : CMC_coupled-rdps-stlawrence-ocean_latlon0.02x0.03_2020120100_P001.grib2
#names : 0[-] SFC="Ground or water surface", 0[-] SFC="Ground or water surface", 0[-] SFC="Ground or water surface", 0[-] SFC="Ground or water surface", 0[m] DBSL="Depth below sea level", 0[m] DBSL="Depth below sea level", ...
但是变量名与你的不匹配
names(r)
# [1] "0[-] SFC=\"Ground or water surface\"" "0[-] SFC=\"Ground or water surface\"" "0[-] SFC=\"Ground or water surface\""
# [4] "0[-] SFC=\"Ground or water surface\"" "0[m] DBSL=\"Depth below sea level\"" "0[m] DBSL=\"Depth below sea level\""
# [7] "0[-] SFC=\"Ground or water surface\"" "0[-] SFC=\"Ground or water surface\"" "0[-] SFC=\"Ground or water surface\""
#[10] "0[-] SFC=\"Ground or water surface\""
您可以将名称设置为元数据的其他部分
nms <- trimws(grep("GRIB_ELEMENT=", desc(b), value=TRUE))
names(r) <- gsub("GRIB_ELEMENT=", "", nms)
r
#class : SpatRaster
#dimensions : 325, 500, 10 (nrow, ncol, nlyr)
#resolution : 0.03, 0.02 (x, y)
#extent : -71.015, -56.015, 45.49, 51.99 (xmin, xmax, ymin, ymax)
#coord. ref. : +proj=longlat +R=6371229 +no_defs
#source : CMC_coupled-rdps-stlawrence-ocean_latlon0.02x0.03_2020120100_P001.grib2
#names : ICEC, ICETK, UICE, VICE, UOGRD, VOGRD, ..
names(r)
#[1] "ICEC" "ICETK" "UICE" "VICE" "UOGRD" "VOGRD" "WTMP" "ICET" "ICEPRS" "CICES"
我可以更改 terra
的行为,使其使用“GRIB_ELEMENT”(如果有意义请告诉我)。但是我不清楚如何获得您显示的名称。例如,下面是第一层的 GDAL 元数据。你显示 ICEC_P0_L1_GLL0
。所有的名字都有 P0
和 GLL0
所以至少对于这个文件来说,这些似乎是多余的。但是L1
指的是什么?
d <-desc(b)
d[35:46]
# [1] " GRIB_COMMENT=Ice cover [Proportion]"
# [2] " GRIB_DISCIPLINE=10(Oceanographic_Products)"
# [3] " GRIB_ELEMENT=ICEC"
# [4] " GRIB_FORECAST_SECONDS=3600 sec"
# [5] " GRIB_IDS=CENTER=54(Montreal) SUBCENTER=0 MASTER_TABLE=4 LOCAL_TABLE=0 SIGNF_REF_TIME=1(Start_of_Forecast) REF_TIME=2020-12-01T00:00:00Z PROD_STATUS=0(Operational) TYPE=1(Forecast)"
# [6] " GRIB_PDS_PDTN=0"
# [7] " GRIB_PDS_TEMPLATE_ASSEMBLED_VALUES=2 0 2 56 56 0 0 1 1 1 0 0 255 -127 -2147483647"
# [8] " GRIB_PDS_TEMPLATE_NUMBERS=2 0 2 56 56 0 0 0 1 0 0 0 1 1 0 0 0 0 0 255 255 255 255 255 255"
# [9] " GRIB_REF_TIME= 1606780800 sec UTC"
#[10] " GRIB_SHORT_NAME=0-SFC"
#[11] " GRIB_UNIT=[Proportion]"
#[12] " GRIB_VALID_TIME= 1606784400 sec UTC"
我看到“UOGRD”和“VOGRD”有 L160
并且在“GRIB_PDS_TEMPLATE_ASSEMBLED_VALUE”和“GRIB_PDS_TEMPLATE_NUMBERS”中有数字 160,而其他有 1.
描述了元数据结构here,但我可以使用一些指导来了解从元数据中提取的内容。