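How would one keep the variable labels of factors after modifications?
Suppose you have modified some factors; what is the best way to keep their variable labels? I noticed that variable labels are dropped when you make even minor modifications to a group of variables. Here is an example in which the variable labels are dropped: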
library(tidyverse) # forcats and dplyr
set.seed(2021) # Reproducibility
mydata <- tibble(
  a1 = factor(round(runif(20, 1, 3)),
              labels = c("Yes", "No", "N/A")),
  a2 = factor(round(runif(20, 1, 3)),
              labels = c("Received", "Not Received", "N/A")),
  a3 = round(rnorm(20, 2, 1)))
attr(mydata$a1, "label") <- "Exposed"
attr(mydata$a2, "label") <- "Receipt of treatment"
attr(mydata$a3, "label") <- "Dosage"
str(mydata) # The variable labels are present as assigned
mydata <- mydata %>%
  mutate(across(where(is.factor), ~fct_collapse(., NULL = "N/A")))
str(mydata) # Variable labels for the factors are dropped
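Advanced R notes that attributes should generally be thought of as ephemeral. This means that most operations do not preserve the attributes you set, including the variable labels in your example.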
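To see this behaviour in isolation, here is a minimal base R sketch. The vector `x` and its label are made up for illustration; they are not part of the question's data.

```r
# A plain numeric vector carrying a "label" attribute
x <- structure(c(2, 1, 3), label = "Dosage")

attr(x, "label")       # the label is present on the original vector

# Subsetting builds a new vector and keeps only names (and dims),
# so the label is gone
attr(x[1:2], "label")  # NULL

# Summaries return a fresh value without the label as well
attr(sum(x), "label")  # NULL
```

Not every operation behaves this way, so it is safer to check the result after each step than to assume a label survived.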
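If you really need to keep the labels as attributes, you can do so with S3 classes. However, this solution is quite cumbersome, because you need to write a generic (with its methods) for every function you apply to the labelled objects.
For the example provided, this would look something like the following. First, we define a class constructor and apply the class to the columns in the dataset.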
library(dplyr)
library(forcats)
new_labelled <- function(x, label) {
  stopifnot(is.character(label))
  structure(x, class = c("labelled", attr(x, "class", TRUE)), label = label)
}
set.seed(2021) # Reproducibility
mydata <- tibble(
  a1 = factor(round(runif(20, 1, 3)),
              labels = c("Yes", "No", "N/A")),
  a2 = factor(round(runif(20, 1, 3)),
              labels = c("Received", "Not Received", "N/A")),
  a3 = round(rnorm(20, 2, 1))) %>%
  mutate(
    a1 = new_labelled(a1, "Exposed"),
    a2 = new_labelled(a2, "Receipt of treatment"),
    a3 = new_labelled(a3, "Dosage"))
str(mydata) # Variable labels are applied
Next, we need to implement a generic for fct_collapse:
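# Generic: dispatches on the class of .f
fct_collapse2 <- function(.f, ..., other_level = NULL) {
  UseMethod("fct_collapse2")
}
# Method for labelled factors: collapse, then re-attach the label
fct_collapse2.labelled <- function(.f, ..., other_level = NULL) {
  stopifnot(is.factor(.f))
  label <- attr(.f, "label", TRUE)
  new_labelled(NextMethod(), label)
}
# Method for plain factors: delegate to forcats
fct_collapse2.factor <- function(.f, ..., other_level = NULL) {
  # Pass other_level by name so it is not absorbed into `...`
  fct_collapse(.f, ..., other_level = other_level)
}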
This allows us to keep the labels:
mydata <- mydata %>%
  mutate(across(where(is.factor), ~fct_collapse2(., NULL = "N/A")))
str(mydata) # The labels are preserved
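In most cases, it is probably easier to store the labels somewhere and add them back once all data manipulation is done than to implement a generic for every function applied to the labelled objects.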
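A minimal sketch of that store-and-restore approach follows. The helpers `save_labels()` and `restore_labels()` are hypothetical names written for this illustration and do not come from any package. The example uses a small base R data frame and a base R recoding step in place of the question's `fct_collapse()` call, so it runs without extra packages.

```r
# Hypothetical helper: collect the "label" attribute of every column
# that has one, as a named list
save_labels <- function(df) {
  labs <- lapply(df, attr, which = "label", exact = TRUE)
  Filter(Negate(is.null), labs)
}

# Hypothetical helper: put saved labels back on columns that still exist
restore_labels <- function(df, labels) {
  for (nm in intersect(names(labels), names(df))) {
    attr(df[[nm]], "label") <- labels[[nm]]
  }
  df
}

# Small made-up dataset in the spirit of the question
mydata <- data.frame(
  a1 = factor(c("Yes", "No", "N/A", "Yes")),
  a3 = c(2, 1, 3, 2)
)
attr(mydata$a1, "label") <- "Exposed"
attr(mydata$a3, "label") <- "Dosage"

labels <- save_labels(mydata)

# A label-dropping step: rebuild the factor so that "N/A" becomes NA
mydata$a1 <- factor(as.character(mydata$a1), levels = c("Yes", "No"))

# Re-attach the labels once the manipulation is finished
mydata <- restore_labels(mydata, labels)
attr(mydata$a1, "label") # "Exposed"
```

The same two helpers could presumably wrap the `mutate(across(...))` step from the question: call `save_labels()` before the pipeline and `restore_labels()` after it. Columns that are renamed along the way would need their labels remapped by hand.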