R: Add a new variable to dataframes whose value is equal to the name of the dataframes

I want to add a variable to all the data frames in my global environment, with the value of the new column equal to the data frame's name.

Product=c("A","A","A","A","A","A","A","A","A","A","A","A","B","B","B","C","C","C")
Day=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Saturday","Sunday" ,"Monday")

data1=data.frame(Product, Day)

Product2=c("Z","Z","Z","Z","Z","Z","Z","Z","Z","Z","Z","Z","Y","Y","Y","X","X","X")
Day2=c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Monday","Tuesday","Wednesday","Saturday","Sunday" ,"Monday")

data2=data.frame(Product2, Day2)

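I want to add a column to both data frames whose value equals the data frame's name, i.e. newvar="data1" for data1 and newvar="data2" for data2. My actual list of data frames is much longer than this.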

Any help would be much appreciated.

Thanks!

If the data.frame object names are 'data' followed by a number, and we already know which objects exist, we can build the object names as strings with paste0:

  nm1 <- paste0('data', 1:2)

Alternatively, if there are, say, 100 objects in the global environment and we don't know how many there are, we can use ls with the pattern argument:

  nm1 <- ls(pattern='^data\\d+')
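A minimal sketch of how that pattern behaves, run against a scratch environment `e` instead of the global workspace. The objects `data10`, `mydata` and `data_old` are made up purely to show what the pattern does and does not match. Note that inside an R string literal the regex `\d` must be written as `"\\d"`.

```r
# Scratch environment (an assumption for illustration) so .GlobalEnv is untouched
e <- new.env()
assign("data1", data.frame(x = 1:2), envir = e)
assign("data2", data.frame(x = 3:4), envir = e)
assign("data10", data.frame(x = 5:6), envir = e)
assign("mydata", data.frame(x = 7:8), envir = e)    # does not start with "data"
assign("data_old", data.frame(x = 9:10), envir = e) # no digit right after "data"

# "^data\\d+" = starts with "data" followed by one or more digits
nm1 <- ls(envir = e, pattern = "^data\\d+")
print(nm1)
```

Adding an end anchor (`"^data\\d+$"`) would additionally exclude names such as `data1_backup`.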

Use mget to get the objects as a list, then use Map with cbind to add a new column ('newvar'). Map ensures that each dataset in the list gets a new column holding its own object name.

  lst <- Map(cbind, mget(nm1), newvar= nm1)
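A small self-contained version of the mget/Map step, using two-row stand-ins for data1 and data2 (shortened versions assumed here, not the question's full data):

```r
# Shortened stand-ins for the question's data frames
data1 <- data.frame(Product = c("A", "B"), Day = c("Monday", "Tuesday"))
data2 <- data.frame(Product2 = c("Z", "Y"), Day2 = c("Monday", "Sunday"))

nm1 <- c("data1", "data2")
# mget() returns a list named by nm1; Map() walks that list and nm1 in parallel,
# calling cbind(df, newvar = name) for each pair and keeping the list names
lst <- Map(cbind, mget(nm1), newvar = nm1)
str(lst)
```

Because Map() takes its names from the list returned by mget(), the results can be reached as `lst$data1` and `lst$data2`.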

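It is better to keep the data frames in a list, since all further operations can be done on the list. However, if the original objects need to be updated in the global environment, list2env is an option (though not recommended):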

  list2env(lst, envir=.GlobalEnv)
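A sketch of the list2env step that writes into a scratch environment `e` (an assumption made so the example does not overwrite anything in .GlobalEnv); passing `envir = .GlobalEnv` does the same thing to the global workspace.

```r
# Scratch environment standing in for .GlobalEnv
e <- new.env()
assign("data1", data.frame(x = 1:2), envir = e)
assign("data2", data.frame(x = 3:4), envir = e)

nm1 <- c("data1", "data2")
lst <- Map(cbind, mget(nm1, envir = e), newvar = nm1)

# list2env() assigns each list element to an object with the same name,
# overwriting the originals in that environment
invisible(list2env(lst, envir = e))
str(e$data1)
```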

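We could also read all the files (.csv/.txt) directly into a list instead of creating individual objects. For example, to read all the files in the working directory: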
   files <- list.files()
   lst <- lapply(files, read.csv, stringsAsFactors=FALSE)

The read.csv arguments (e.g. sep) may need adjusting depending on the files' delimiter.
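The file-reading approach can be tied back to the question by naming each list element after its file and then adding that name as a column. The sketch below first writes two hypothetical files, data1.csv and data2.csv, to a temporary folder so that it runs anywhere; with real data you would point list.files() at your own folder instead.

```r
# Hypothetical demo files in a temporary folder
demo_dir <- file.path(tempdir(), "csv_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Product = c("A", "B")), file.path(demo_dir, "data1.csv"), row.names = FALSE)
write.csv(data.frame(Product2 = c("Z", "Y")), file.path(demo_dir, "data2.csv"), row.names = FALSE)

# full.names = TRUE lets read.csv() find the files without changing the working directory
files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
lst <- lapply(files, read.csv, stringsAsFactors = FALSE)

# Name each element after its file (extension dropped), then add that name as a column
names(lst) <- tools::file_path_sans_ext(basename(files))
lst <- Map(cbind, lst, newvar = names(lst))
str(lst)
```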

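Here is a function to which you can pass any number of data.frames by name; it returns a named list of the data.frames, each with an added column holding its name. With the list2env function (as in @akrun's answer), you can then put them in whatever environment you like. (You could also modify the function to produce that side effect automatically.)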

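f <- function(...) {
    # Deparse the unevaluated arguments: c(data1, data2) -> "data1", "data2"
    objnames <- as.character(substitute(c(...)))[-1]
    obj <- list(...)
    # Add a column to each data frame; note the column is named after the
    # data frame itself (use x[, "newvar"] <- col for a fixed column name)
    out <- mapply(function(x, col) {
        x[, col] <- col
        x
    }, obj, objnames, SIMPLIFY = FALSE)
    # Return a list named after the original objects
    setNames(out, objnames)
}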

Use it like this:

list2env(f(data1,data2), .GlobalEnv)
# <environment: R_GlobalEnv>
str(data1)
# 'data.frame':   18 obs. of  3 variables:
#  $ Product: Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 1 ...
#  $ Day    : Factor w/ 7 levels "Friday","Monday",..: 2 6 7 5 1 3 2 6 7 5 ...
#  $ data1  : chr  "data1" "data1" "data1" "data1" ...
str(data2)
# 'data.frame':   18 obs. of  3 variables:
#  $ Product2: Factor w/ 3 levels "X","Y","Z": 3 3 3 3 3 3 3 3 3 3 ...
#  $ Day2    : Factor w/ 7 levels "Friday","Monday",..: 2 6 7 5 1 3 2 6 7 5 ...
#  $ data2   : chr  "data2" "data2" "data2" "data2" ...
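The column added by f() is named after each data frame (data1 and data2 in the str() output above), whereas the question asks for a column called newvar. A hypothetical variant, f_newvar(), that keeps the same name-capturing trick but exposes the column name as an argument (defaulting to "newvar"):

```r
# Hypothetical variant of f(): fixed, configurable column name
f_newvar <- function(..., colname = "newvar") {
  # Same trick as f(): deparse the unevaluated arguments into strings
  objnames <- as.character(substitute(c(...)))[-1]
  out <- Map(function(x, nm) {
    x[[colname]] <- nm   # recycled to every row
    x
  }, list(...), objnames)
  setNames(out, objnames)
}

# Shortened stand-ins for the question's data frames
data1 <- data.frame(Product = c("A", "B"))
data2 <- data.frame(Product2 = c("Z", "Y"))
invisible(list2env(f_newvar(data1, data2), environment()))
str(data1)
```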

If you want to pass a large number of named objects without listing them explicitly in f(), you can do this:

list2env(do.call(f, sapply(ls(pattern = "data"), as.name)), .GlobalEnv)

The result is the same.
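Why the do.call() line works: sapply(..., as.name) returns a named list of symbols, and do.call() turns that list into the call f(data1 = data1, data2 = data2), so substitute() inside f() still recovers the object names. A self-contained sketch (f() is repeated from above; the two-row data frames are stand-ins):

```r
f <- function(...) {
  objnames <- as.character(substitute(c(...)))[-1]
  obj <- list(...)
  out <- mapply(function(x, col) {
    x[, col] <- col
    x
  }, obj, objnames, SIMPLIFY = FALSE)
  setNames(out, objnames)
}

data1 <- data.frame(Product = c("A", "B"))
data2 <- data.frame(Product2 = c("Z", "Y"))

# A named list of symbols, i.e. list(data1 = quote(data1), data2 = quote(data2))
syms <- sapply(c("data1", "data2"), as.name)
str(syms)

# do.call() builds and evaluates f(data1 = data1, data2 = data2)
res <- do.call(f, syms)
str(res)
```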