Best way to combine a list of netcdf files into one data frame in R: nested for loops or mapply?

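I am trying to combine multiple netcdf files with multiple variables:

- 6 types of parameters
- 36 years
- 12 months
- 31 days
- 6 Y coordinates
- 5 X coordinates

Each netcdf file contains one month of one year of data for one parameter, so there are 432 * 6 = 2592 files (36 years * 12 months = 432 files per parameter).

How can I best combine all of these into one data frame? It ultimately has to produce something like this: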

rowID   Date        year  month day coord.X coord.Y par1 par2  par3  par4 par5  par6       
1       1979-01-01  1979  01    01  176     428     3.2  0.005 233.5 0.1  12.2  4.4
..................... 402568 rows in between.................
402570  2014-12-31  2014  12    31  180     433     1.7  0.006 235.7 0.2  0.0   2.7

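What is the best way to combine this? I have been struggling with it for a while...

Please forgive me for not knowing how to make this question reproducible, but there are too many factors involved. This is where my files come from: ftp://rfdata:forceDATA@ftp.iiasa.ac.at/WFDEI/

This is what I have so far. I think this is what they call a nested loop? I usually just keep trying until I eventually succeed, but I am finding this one hard work. Any suggestions for a first step are welcome.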

require(ncdf4)
directory  <- "C:/folder/"                                  # general folder
parameter  <- c("par1","par2","par3","par4","par5","par6")  # names of 6 parameters
directory2 <- "_folder2/"                                   # parameter specific folder
directory3 <- "name"                                        # last part of folder name
years  <- c("1979","otheryears","2014")                     # years which are also part of netcdf file name
months <- c("01","othermonths","12")                        # months which are also part of netcdf file name
x <- 176:180                                                # X-coordinates
y <- 428:433                                                # Y-coordinates

parlist <- list()                                           # collects one data frame per parameter/month/coordinate
for (p in parameter) {
  for (i in years) {
    for (j in months) {
      fileloc <- paste0(directory, p, directory2, p, directory3, i, j, ".nc")  # location to open
      ncin   <- nc_open(fileloc)
      values <- ncvar_get(ncin, p)                          # extract the desired parameter from the netcdf file
      day    <- ncvar_get(ncin, "day")                      # extract the day of month from the netcdf file
      nc_close(ncin)                                        # open each file once, close it when done
      for (k in x) {
        for (l in y) {
          # values[l, k, ] keeps the [Y, X, day] index order of the original attempt
          temp <- data.frame(date = as.Date(paste(i, j, day, sep = "-"), "%Y-%m-%d"),
                             year = i, month = j, day = day, X = k, Y = l,
                             parameter = p, value = values[l, k, ])
          parlist[[length(parlist) + 1]] <- temp            # store the data frames in a list
        }
      }
    }
  }
}
alldata <- do.call(rbind, parlist)                          # bind all data frames by row

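There are many ways to skin a cat like this. If you are new to R, nested loops may be easier to debug. I think one question you want to ask yourself is whether the files have primacy or your conceptual structure has primacy. That is, if your conceptual structure specifies a location for which there is no file, what do you want your code to do? If you just want to parse the files that exist, I find it useful to use list.files(, full.names = TRUE, recursive = TRUE) to find the files I want to parse, and then to write a function that parses a single file (and its name) into the data structure I want. From there, it is an lapply or purrr::map.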
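The approach in the comment above could be sketched roughly like this. It is only an illustration in base R: `parse_one_file` is a hypothetical helper, the file names (`par1_name_197901.nc` and so on) are made up, and the demo files are empty placeholders, so only the names are parsed. In real use the helper would also open the file with ncdf4 and read the values.

```r
# Hypothetical helper: parse ONE file path into a one-row data frame.
# Assumes names of the form <parameter>_<anything>YYYYMM.nc (an assumption, not the real layout).
parse_one_file <- function(path) {
  stem <- sub("\\.nc$", "", basename(path))        # file name without folder and extension
  n    <- nchar(stem)
  # In real use, nc_open() / ncvar_get() / nc_close() would go here to read the values.
  data.frame(
    file      = path,
    parameter = sub("_.*$", "", stem),             # text before the first underscore
    year      = substr(stem, n - 5, n - 2),        # four digits before the month
    month     = substr(stem, n - 1, n),            # last two digits
    stringsAsFactors = FALSE
  )
}

# Demo on empty placeholder files in a temporary folder
demo_dir <- file.path(tempdir(), "nc_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("par1_name_197901.nc", "par2_name_201412.nc"))))

# Find the files that exist, parse each one, and stack the results
files  <- list.files(demo_dir, pattern = "\\.nc$", full.names = TRUE, recursive = TRUE)
parsed <- do.call(rbind, lapply(files, parse_one_file))
```

Because the loop runs over the files that actually exist, a missing month simply produces no rows instead of an error.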

To extract these netcdf files and group them all into one data frame:

-6 parameters
-36 years
-12 months
-31 days
-6 Y coordinates
-5 X coordinates

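First, I made sure all *.nc files are in one folder. Second, I reduced the multiple for loops to a single one, because the year, month, and parameter are available from the file name.

The variables day, X coordinate, and Y coordinate can be extracted together as an array: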
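To show what turning such an array into rows looks like, here is a small base R sketch on a made-up array (6 x 5 x 3, filled with dummy numbers). It uses as.data.frame(as.table()) rather than arrayhelpers, and the dimension names lon, lat, day as well as the 1979-01 date are assumptions for the example.

```r
# Toy array standing in for one month of one parameter: 6 lon x 5 lat x 3 days
lon_idx <- 428:433
lat_idx <- 176:180
days    <- 1:3
vals <- array(seq_len(6 * 5 * 3), dim = c(6, 5, 3),
              dimnames = list(lon = lon_idx, lat = lat_idx, day = days))

# One row per array cell; the first dimension (lon) varies fastest, day slowest
long <- as.data.frame(as.table(vals), responseName = "value",
                      stringsAsFactors = FALSE)

# The dimension labels come back as character, so convert them to numbers
long$lon <- as.integer(long$lon)
long$lat <- as.integer(long$lat)
long$day <- as.integer(long$day)

# Build a date column from an assumed year and month plus the day index
long$date <- as.Date(sprintf("1979-01-%02d", long$day))
```

The ordering matters when you attach dates yourself: each day has to be repeated once for every lon/lat cell before moving on to the next day.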

require(arrayhelpers);require(stringr);require(plyr);require(ncdf4)
# store all files from ftp://rfdata:forceDATA@ftp.iiasa.ac.at/WFDEI/ in the following folder:
setwd("C:/folder")
temp <- list.files(pattern="\\.nc$")          # list all the file names
param <- gsub("_\\S+","",temp,perl=TRUE)      # extract parameter (text before the first underscore) from file name
temp_year  <- str_sub(temp,-9,-6)             # 9th to 6th character from the end of the file name: the year
temp_month <- str_sub(temp,-5,-4)             # 5th to 4th character from the end of the file name: the month

xcoord <- seq(176,180,by=1)                   # the X-coordinates you are interested in
ycoord <- seq(428,433,by=1)                   # the Y-coordinates you are interested in

list_var <- vector("list", length(temp))      # make an empty list, one slot per file
for (t in seq_along(temp)){
  temp_netcdf <- nc_open(temp[t])
  n_days <- length(ncvar_get(temp_netcdf,"day"))                                      # number of days in this file
  dim.order <- sapply(temp_netcdf[["var"]][[param[t]]][["dim"]],function(x) x$name)   # name of each dimension of the array, in file order
  start <- c(lon = 428, lat = 176, tstep = 1)                                         # starting index along each dimension
  count <- c(lon = 6, lat = 5, tstep = n_days)                                        # how many values to read along each dimension, counted from start
  tempstore <- ncvar_get(temp_netcdf, param[t], start = start[dim.order], count = count[dim.order])  # array with parameter values
  nc_close(temp_netcdf)                                                               # close the file; matters when working with a lot of files

  df_temp <- array2df(tempstore, levels = list(lon=ycoord, lat = xcoord, day = NA), label.x = "value")  # convert array to data frame
  temp_day <- rep(seq_len(n_days), each = length(xcoord)*length(ycoord))              # day varies slowest, so repeat each day once per grid cell
  Add_date <- as.Date(paste(temp_year[t],temp_month[t],temp_day,sep="-"),"%Y-%m-%d")  # vector with the dates
  list_var[[t]] <- data.frame(Add_date,df_temp,parameter=param[t])                    # add dates and parameter name, store in the list
}
All_NetCDF_var_in1df <- do.call(rbind,list_var)

#### If you want to take a look at the netcdf files first use:
list2env(
  lapply(setNames(temp, make.names(gsub("\\.nc$", "", temp))),
         nc_open), envir = .GlobalEnv) # import all netcdf file handles into the global environment
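
The combined data frame above is in long format (one row per value, with a parameter column), while the table in the question has one column per parameter. A possible way to get there with base R's reshape() is sketched below on made-up data. The column names Add_date, lon, lat, parameter and value are assumptions about what the loop produces, so adjust them to your actual data frame.

```r
# Made-up long data: 2 dates x 2 grid cells x 2 parameters
long <- expand.grid(Add_date  = c("1979-01-01", "1979-01-02"),
                    lon       = 428:429,
                    lat       = 176,
                    parameter = c("par1", "par2"),
                    stringsAsFactors = FALSE)
long$Add_date <- as.Date(long$Add_date)
long$value    <- seq_len(nrow(long)) / 10        # dummy values 0.1, 0.2, ...

# One row per date and grid cell, one column per parameter
wide <- reshape(long,
                idvar     = c("Add_date", "lon", "lat"),
                timevar   = "parameter",
                direction = "wide")
names(wide) <- sub("^value\\.", "", names(wide)) # value.par1 -> par1

# Split the date into the year / month / day columns asked for in the question
wide$year  <- format(wide$Add_date, "%Y")
wide$month <- format(wide$Add_date, "%m")
wide$day   <- format(wide$Add_date, "%d")
```

With the full data set this reshape can be slow, so it may be worth trying it on one year first.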