How to use forcats::fct_collapse in a function across different data frames with different factor levels
library(tidyverse)
library(forcats)
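I have two simple data frames (code at the bottom), and I want to create a new recoded variable by collapsing the "Animal" column. I normally use forcats::fct_collapse for this. However, I want to write a function that applies fct_collapse to many different data frames that share the same variable, except that some of them may be missing one or two factor levels. For example, in this case Df2 is missing "Rhino".
Is there a way to change the code (using the tidyverse) so that missing factor categories are returned as NA? In this example I know the missing level is "Rhino", but in my real data there may be other missing levels. I am open to alternatives to forcats::fct_collapse, but I would like to stay within the tidyverse.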
REC <- function(Df, Data){
Df %>%
mutate(NEW = fct_collapse(Data, One = c("Cat","Dog","Snake"),
Two = c("Elephant","Bird","Rhino")))
}
REC(Df1, Animal)  # this works
REC(Df2, Animal)  # this doesn't; it throws an error because of "Rhino"
Sample data:
Animal <- c("Cat","Dog","Snake","Elephant","Bird","Rhino")
Code <- c(101,222,434,545,444,665)
Animal2 <- c("Cat","Dog","Snake","Elephant","Bird")
Code2 <- c(101,222,434,545,444)
# tibble() replaces the deprecated data_frame()
Df1 <- tibble(Code, Animal)
Df2 <- tibble(Code2, Animal2) %>% rename(Animal = Animal2)
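Here is one idea for you. I initially tried to give my function two arguments: one for the data frame and one for the column containing the animal names. That attempt failed with the error message "Error in mutate_impl(.data, dots) : Column `new` must be length 5 (the number of rows) or one, not 6." So I decided not to pass the column name to the function; instead, I refer to Animal explicitly inside it. After that, things worked. The idea is to create a factor variable whose levels include the missing animal name(s). This is done with factor() and setdiff(). Once the factor has all the animal names as levels, I use fct_collapse().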
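myfun <- function(mydf){
  # Full set of animal names expected across all data frames
  animals <- c("Cat", "Dog", "Snake", "Elephant", "Bird", "Rhino")
  mydf %>%
    # Add any names absent from this data frame as extra (unused) levels,
    # so that fct_collapse() finds every level it is asked to collapse
    mutate(new = factor(Animal, levels = c(unique(Animal), setdiff(animals, Animal))),
           new = fct_collapse(new, One = c("Cat", "Dog", "Snake"),
                              Two = c("Elephant", "Bird", "Rhino")))
}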
> myfun(Df2)
# A tibble: 5 x 3
Code2 Animal new
<dbl> <chr> <fct>
1 101 Cat One
2 222 Dog One
3 434 Snake One
4 545 Elephant Two
5 444 Bird Two
> myfun(Df1)
# A tibble: 6 x 3
Code Animal new
<dbl> <chr> <fct>
1 101 Cat One
2 222 Dog One
3 434 Snake One
4 545 Elephant Two
5 444 Bird Two
6 665 Rhino Two
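To see why myfun() copes with the missing level, the factor() and setdiff() step can be run on its own. This is only an illustrative sketch: the names `present`, `missing_levels`, `f`, and `collapsed` are made up for the example and do not appear in the answer.

```r
library(forcats)

# Full set of expected names, and the names actually present (as in Df2)
animals <- c("Cat", "Dog", "Snake", "Elephant", "Bird", "Rhino")
present <- c("Cat", "Dog", "Snake", "Elephant", "Bird")

# Names that are expected but absent from the data
missing_levels <- setdiff(animals, present)   # "Rhino"

# Build a factor whose levels cover present and absent names alike
f <- factor(present, levels = c(unique(present), missing_levels))
levels(f)   # "Cat" "Dog" "Snake" "Elephant" "Bird" "Rhino"

# Every level named below now exists, so the collapse has nothing unknown to report
collapsed <- fct_collapse(f,
                          One = c("Cat", "Dog", "Snake"),
                          Two = c("Elephant", "Bird", "Rhino"))
as.character(collapsed)   # "One" "One" "One" "Two" "Two"
```

"Rhino" exists only as an unused level here; no row is added, so the data frame keeps its five rows.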
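Note:
The following function is the same as myfun() except that it takes two arguments. It does not work. If it can be fixed, please let me know.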
myfun2 <- function(mydf, mycol){
animals <- c("Cat", "Dog", "Snake", "Elephant", "Bird", "Rhino")
mydf %>%
mutate(new = factor(mycol, levels = c(unique(mycol), setdiff(animals, mycol))),
new = fct_collapse(new, One = c("Cat", "Dog", "Snake"),
Two = c("Elephant", "Bird", "Rhino"))) -> x
x}
> myfun2(Df2, Animal)
Error in mutate_impl(.data, dots) :
Column `new` must be length 5 (the number of rows) or one, not 6
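A likely explanation for this error: `mycol` is an ordinary function argument, so `Animal` is looked up in the calling environment, where it is the six-element character vector from the sample data, rather than in the five-row Df2. That would match the "length 5 ... not 6" message. One possible fix is sketched below, assuming a dplyr/rlang version that supports the `{{ }}` (embrace) operator (rlang 0.4.0 or later). The function name `collapse_animals` is made up for this sketch.

```r
library(dplyr)
library(forcats)

# Hypothetical two-argument variant of myfun(); {{ mycol }} forwards the bare
# column name so that it is evaluated inside the data frame, not in the caller
collapse_animals <- function(mydf, mycol) {
  animals <- c("Cat", "Dog", "Snake", "Elephant", "Bird", "Rhino")
  mydf %>%
    mutate(
      new = factor({{ mycol }},
                   levels = c(unique({{ mycol }}), setdiff(animals, {{ mycol }}))),
      new = fct_collapse(new,
                         One = c("Cat", "Dog", "Snake"),
                         Two = c("Elephant", "Bird", "Rhino"))
    )
}

# Df2 rebuilt here so the sketch is self-contained; note that no global
# vector called Animal is defined, so the column must be the one used
Df2 <- tibble(Code2 = c(101, 222, 434, 545, 444),
              Animal = c("Cat", "Dog", "Snake", "Elephant", "Bird"))

res <- collapse_animals(Df2, Animal)
```

The same call with Df1 should work unchanged, since every listed level is already present there.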