Reshaping data from a grid and populating a column with the data frame name

Question

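Main question: is there a way to populate a df with the data frame name, rather than typing and pasting it in manually?

I have 20 csv files in a folder, each holding a grid of data that looks something like this. File 1: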

	X1	X2	X3
Y1	1	2	3
Y2	4	5	6
Y3	7	8	9

File 2:

	X1	X2	X3
Y1	1	4	7
Y2	2	5	8
Y3	3	6	9

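Note: X1, X2, X3 and Y1, Y2, Y3 are coordinates; the values in the cells are only examples and do not follow any pattern.

Each file has a unique ID, e.g. US_plot_1.csv, US_plot_2.csv, UK_plot_1.csv, UK_plot_2.csv.

I want to populate a df that sorts these files into columns R can analyse, grouped by filename, i.e.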

filename	X	Y	Values
US_plot_1	X1	Y1	1
US_plot_1	X1	Y2	4
US_plot_1	X1	Y3	7
US_plot_1	X2	Y1	2
US_plot_1	X2	Y2	5
US_plot_1	X2	Y3	8
US_plot_1	X3	Y1	3
US_plot_1	X3	Y2	6
US_plot_1	X3	Y3	9
US_plot_2	X1	Y1	1
US_plot_2	X1	Y2	2
US_plot_2	X1	Y3	3

I understand that I can load the data with a loop:

df <- lapply(Sys.glob("*.csv"), read.csv)  # load all the csvs, one data frame per file

filenames <- list.files(path = getwd(), pattern = "\\.csv$")  # get the csv filenames
filenames2 <- substr(filenames, 1, 9)                          # first 9 characters, e.g. "US_plot_1"

for (i in seq_along(df)) {
  assign(filenames2[i], data.frame(df[[i]]))  # one data frame per file, named after the file
}

Then, when I need to turn each one into a df that R can analyse, I can use gather() from the tidyr package:

US_plot_1 <- DF %>% gather(X_coord, Value, X1:X3)
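
As a minimal sketch of what that gather() call does, here is file 1 rebuilt in memory. The object name grid and the label column name Y are assumptions made for illustration; they do not come from the actual files.

```r
library(tidyr)

# Hypothetical in-memory copy of file 1; `grid` and the column name Y are assumptions
grid <- data.frame(
  Y  = c("Y1", "Y2", "Y3"),
  X1 = c(1, 4, 7),
  X2 = c(2, 5, 8),
  X3 = c(3, 6, 9)
)

# Stack the X1..X3 columns into key/value pairs, one row per grid cell
long <- gather(grid, X_coord, Value, X1:X3)
```

Each grid cell ends up on its own row, with the Y label kept alongside the X label and the value.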

I know I can always add the filename manually:

US_plot_1$filename <- paste("US_plot_1")

But I would like to know whether there is a more efficient way?

Answer 1

You can do this inside the lapply call:

library(dplyr)
library(tidyr)

filenames <- Sys.glob("*.csv")

df <- lapply(filenames, function(x) {
  # Read the csv
  read.csv(x) %>%
    # Reshape to long format; gather() is retired, so use pivot_longer()
    pivot_longer(cols = starts_with('X')) %>%
    # Add the file name (without its extension) as a new column
    mutate(filename = tools::file_path_sans_ext(x))
})
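
To see the whole approach end to end, the following sketch writes two small demo grids to a temporary folder and combines them into one data frame with the filename, X, Y, Values layout asked for in the question. The demo files, the label column name Y, and the object names demo_dir and all_plots are assumptions for illustration. The sketch selects columns with -Y rather than starts_with('X'), because read.csv names a blank header X, and that column would then be matched as well.

```r
library(dplyr)
library(tidyr)

# Write two small demo grids to a temporary folder (stand-ins for the real files)
demo_dir <- tempfile("grids")
dir.create(demo_dir)
write.csv(data.frame(Y = c("Y1", "Y2", "Y3"), X1 = c(1, 4, 7), X2 = c(2, 5, 8), X3 = c(3, 6, 9)),
          file.path(demo_dir, "US_plot_1.csv"), row.names = FALSE)
write.csv(data.frame(Y = c("Y1", "Y2", "Y3"), X1 = 1:3, X2 = 4:6, X3 = 7:9),
          file.path(demo_dir, "US_plot_2.csv"), row.names = FALSE)

filenames <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)

all_plots <- lapply(filenames, function(x) {
  read.csv(x) %>%
    # Every column except the label column Y becomes an (X, Values) pair
    pivot_longer(cols = -Y, names_to = "X", values_to = "Values") %>%
    # basename() drops the folder, file_path_sans_ext() drops ".csv"
    mutate(filename = tools::file_path_sans_ext(basename(x)))
}) %>%
  bind_rows() %>%                         # stack the per-file data frames
  select(filename, X, Y, Values) %>%      # column order from the question
  arrange(filename, X, Y)
```

bind_rows() is what turns the list returned by lapply into the single data frame the question describes.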

If you want them as separate data frames:

names(df) <- tools::file_path_sans_ext(filenames)
list2env(df, .GlobalEnv)
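
Once the list is named, another option is to keep it as a list and let bind_rows() create the filename column from the names. This is a sketch on a small hypothetical list (plots and its contents are made up for illustration), not part of the answer above.

```r
library(dplyr)

# Hypothetical named list standing in for the `df` list built above
plots <- list(
  US_plot_1 = data.frame(X = c("X1", "X2"), Values = c(1, 2)),
  UK_plot_1 = data.frame(X = c("X1", "X2"), Values = c(3, 4))
)

# .id turns the list names into a column, so no per-file mutate() is needed
combined <- bind_rows(plots, .id = "filename")
```

This avoids creating twenty separate objects in the global environment while still keeping track of which file each row came from.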

Tags: csv, loops, r, dataframe, tidyr