Trouble merging new data to each data frame element of a list
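I am having trouble using a for loop to append new data to each data frame element of a list.
If I have a list containing two data frames (filelist), and I want to "dplyr::left_join" or "merge" each data frame in the list with additional data from a single data frame, the new data does not seem to appear in the list afterwards. However, if I run the same command step by step on each data frame element of the list individually, I get the same warning (due to missing factor levels) but the expected result. For example:
Some data frames: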
df1 <- data.frame(x = 1:3, y=letters[1:3])
df2 <- data.frame(x = 1:5, y=letters[1:5])
# make list of dataframes
filelist <- list(df1,df2)
# new data frame to add to the data frames in the list by indexing "y"
df3 <- data.frame(animal = c(rep("snake", 7)), y=letters[1:7], geno = c("aa", "ab", "ac", "aa", "ac", "ab", "ae"))
# merge df3 into both data frames in the filelist
for (i in 1:length(filelist)) {dplyr::left_join(filelist[[i]], df3, by = "y")}
## Gives the following warning because some factor levels are missing between datasets
Warning message:
Column `y` joining factors with different levels, coercing to character vector
The list that comes back is identical to the original filelist:
> filelist
[[1]]
x y
1 1 a
2 2 b
3 3 c
[[2]]
x y
1 1 a
2 2 b
3 3 c
4 4 d
5 5 e
Expected result (obtained by merging each element of the list separately and then creating a new list):
new1 <- dplyr::left_join(filelist[[1]], df3, by = "y")
new2 <- dplyr::left_join(filelist[[2]], df3, by = "y")
newlist <- list(new1, new2)
> newlist
[[1]]
x y animal geno
1 1 a snake aa
2 2 b snake ab
3 3 c snake ac
[[2]]
x y animal geno
1 1 a snake aa
2 2 b snake ab
3 3 c snake ac
4 4 d snake aa
5 5 e snake ac
What is the best way to do this without taking each data frame out of the original list, adding the new data, and then creating a new list?
I would use the map function from the purrr package, which, like dplyr, is part of the tidyverse:
library(tidyverse)
library(purrr) # loaded when you call tidyverse, but doing it explicitly here
map(filelist, left_join, df3)
[[1]]
x y animal geno
1 1 a snake aa
2 2 b snake ab
3 3 c snake ac
[[2]]
x y animal geno
1 1 a snake aa
2 2 b snake ab
3 3 c snake ac
4 4 d snake aa
5 5 e snake ac
Warning messages:
1: Column `y` joining factors with different levels, coercing to character vector
2: Column `y` joining factors with different levels, coercing to character vector
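The loop in the question leaves filelist unchanged because left_join returns a new data frame and the loop never stores it; map likewise returns a new list that has to be assigned to a name. The following is a minimal sketch of both fixes, assuming dplyr and purrr are installed. The names looped and mapped are only illustrative, and stringsAsFactors = FALSE is set here so that y is a character column and the factor warning does not come up.

```r
library(dplyr)
library(purrr)

# Same example data as in the question; y is kept as character here
df1 <- data.frame(x = 1:3, y = letters[1:3], stringsAsFactors = FALSE)
df2 <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = FALSE)
df3 <- data.frame(animal = rep("snake", 7), y = letters[1:7],
                  geno = c("aa", "ab", "ac", "aa", "ac", "ab", "ae"),
                  stringsAsFactors = FALSE)
filelist <- list(df1, df2)

# Option 1: keep the loop, but store each joined data frame back in the list
looped <- filelist
for (i in seq_along(looped)) {
  looped[[i]] <- left_join(looped[[i]], df3, by = "y")
}

# Option 2: let map() build the new list, and assign the result
mapped <- map(filelist, left_join, df3, by = "y")
```

Writing filelist <- map(...) instead of mapped <- map(...) would overwrite the original list in place, which is what the question asks for.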
As the warning message says, the factors have different levels.
You can convert the factors in each data frame to character with dplyr, like this:
df %>% mutate_if(is.factor, as.character) -> df
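To apply that conversion to every data frame in the list as well as to df3 before joining, something like the sketch below should work. It assumes dplyr 1.0 or later, where across() is the recommended replacement for the superseded mutate_if. The helper name to_chr is hypothetical, and the data is built with stringsAsFactors = TRUE on purpose so that factor columns exist.

```r
library(dplyr)

# Build the example data with factor columns to reproduce the situation
df1 <- data.frame(x = 1:3, y = letters[1:3], stringsAsFactors = TRUE)
df2 <- data.frame(x = 1:5, y = letters[1:5], stringsAsFactors = TRUE)
df3 <- data.frame(animal = rep("snake", 7), y = letters[1:7],
                  geno = c("aa", "ab", "ac", "aa", "ac", "ab", "ae"),
                  stringsAsFactors = TRUE)
filelist <- list(df1, df2)

# Hypothetical helper: turn every factor column of a data frame into character
to_chr <- function(d) mutate(d, across(where(is.factor), as.character))

# Convert all inputs first, then join; the result is assigned to a new list
filelist <- lapply(filelist, to_chr)
df3 <- to_chr(df3)
joined <- lapply(filelist, left_join, df3, by = "y")
```

Because both sides of the join are character columns after the conversion, the join no longer has to reconcile differing factor levels.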
Or harmonize the factor levels of the variable y:
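for (i in seq_along(filelist)) {
  # union of the levels of y on both sides (assumes y is a factor in both data frames)
  lv <- union(levels(filelist[[i]]$y), levels(df3$y))
  # re-encode with the shared levels; factor(x, levels = ) keeps the values,
  # whereas levels(x) <- renames levels by position and can relabel data
  filelist[[i]]$y <- factor(filelist[[i]]$y, levels = lv)
  df3$y <- factor(df3$y, levels = lv)
  # store the joined data frame back in the list
  filelist[[i]] <- dplyr::left_join(filelist[[i]], df3, by = "y")
}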