Importing and merging files in a loop
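I have to merge datasets. They are .sav files, and I have 6-7 datasets for each month of each year, for a total of 13 years. That is a lot of datasets to import and merge, and I would like to automate this with a loop.
Because I am a beginner, I wrote a first loop that simply merges the datasets for one year (so it loops over months only). Here is my code, which does exactly what I want. It is not the fastest, and certainly not the prettiest or most efficient, but it works. Note: for brevity I shortened the "C..." path in the posted code; in my real code it is the full path.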
for (m in months) {
  setwd(paste("C:.... survey\\DANE 2005\\", m, sep = ""))
  files_2005 <- list.files(path = paste("C:\\....survey\\DANE 2005\\", m, sep = ""), pattern = "Area.*.sav")
  #for (i in (paste("files_",m,sep=""))){
  df_2005 <- lapply(files_2005, read_sav)
  assign(paste("DANE2005_", m, sep = ""), df_2005 %>% reduce(rbind.fill))
  #}
  df_2005 <- mget(ls(pattern = "DANE2005_"))
  dane_2005 <- df_2005 %>% reduce(rbind.fill)
}
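Here is my current code, which loops over both years and months (thanks to @Onyambu's comments). However, it still does not work. If I do not use setwd, R says "current file does not exist in the directory" (and points back to my home directory instead of the specified path). If I do use setwd, I get a "cannot change working directory" error.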
for (y in years) {
  for (m in months) {
    # Go to a folder per year/month
    path <- paste("C:.... survey\\DANE ", y, "\\", m, sep = "")
    # Create a list of all the files in that folder by month, based on a pattern
    list_data <- list.files(path = path, pattern = "Area.*.sav")
    if (!is_empty(list_data)) {
      # Read in all the files in the folder by month, based on the list
      df_2005 <- lapply(list_data, read_sav)
      # Bind the files for one month together based on the list
      assign(paste("DANE2005_", m, sep = ""), df_2005 %>% reduce(rbind.fill))
    }
  }
  # Bind together all the files for one year
  df_2005 <- mget(ls(pattern = "DANE2005_"))
  dane_2005 <- df_2005 %>% reduce(left_join)
}
Any help is greatly appreciated.
Edit: cleaned up the code and rephrased the question for clarity after the initial comments.
Here is what you need to try:
library(haven)  # provides read_sav()
# Give the path only to the top-level folder that contains the year folders
path <- "C:.... survey"
# List every matching file under that folder. recursive = TRUE descends into all
# years and months; full.names = TRUE returns paths that can be read from any working directory
all_files <- list.files(path, pattern = "Area.*.sav", full.names = TRUE, recursive = TRUE)
# Read one file and record which year folder, month folder and file it came from
my_read <- function(x) {
  # Drop everything up to and including "survey/", leaving year folder, month folder and file name
  nm <- unlist(strsplit(sub(".*survey/", "", x), "/"))
  cbind(year = nm[1], month = nm[2], file = nm[3], read_sav(x))
}
# Now use the my_read function to read all the files into a list
dat_list <- lapply(all_files, my_read)
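The "does not exist" error in the question most likely comes from `list.files()` returning bare file names when `full.names` is left at its default of `FALSE`, so the reader looks for them in the working directory rather than in the folder that was listed. Below is a minimal sketch of the whole approach that runs without your data. Everything in it is a hypothetical stand-in: the `survey_demo` folder, the month names, the `Area_demo.csv` files, and `read.csv()` used in place of `read_sav()` so that the example needs neither haven nor real .sav files. It also takes the year and month from the folder names with `dirname()`/`basename()` rather than `sub()`, which is an alternative I am assuming suits your layout, not a requirement.

```r
# Build a throwaway tree shaped like <root>/<year folder>/<month folder>/<file>.
# All names below are made up for the demonstration.
root <- file.path(tempdir(), "survey_demo")
unlink(root, recursive = TRUE)
for (y in c("DANE 2005", "DANE 2006")) {
  for (m in c("Enero", "Febrero")) {
    dir.create(file.path(root, y, m), recursive = TRUE)
    # CSV stands in for .sav here
    write.csv(data.frame(id = 1:2, value = c(10, 20)),
              file.path(root, y, m, "Area_demo.csv"), row.names = FALSE)
  }
}

# Default full.names = FALSE: only the file name comes back, so any reader
# would resolve it relative to getwd(), not relative to the listed folder.
bare <- list.files(file.path(root, "DANE 2005", "Enero"),
                   pattern = "^Area.*\\.csv$")

# full.names = TRUE plus recursive = TRUE: one call, usable paths for every file.
all_files <- list.files(root, pattern = "^Area.*\\.csv$",
                        full.names = TRUE, recursive = TRUE)

# Read one file and tag its rows with the folders it came from.
read_one <- function(x) {
  data.frame(year  = basename(dirname(dirname(x))),  # e.g. "DANE 2005"
             month = basename(dirname(x)),
             file  = basename(x),
             read.csv(x),                            # swap in read_sav(x) for real data
             stringsAsFactors = FALSE)
}

dat_list <- lapply(all_files, read_one)

# Stack everything into one data frame (works here because columns are identical).
combined <- do.call(rbind, dat_list)
```

With the real files, where columns may differ from month to month, `do.call(rbind, ...)` will fail; `dplyr::bind_rows(dat_list)` fills missing columns with `NA`, much like the `rbind.fill` you were already using.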