Recent developments in recoding repeated variables in R?

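So I have a very long sequence dataset. Every column (from t1 to t...n) has the same levels or categories. In total there are more than 200 categories (levels) and 144 columns (variables).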

 id    t1        t2        t3             t...n
"1"   "eating"  "tv"      "conversation" "..."
"2"   "sleep"   "driving" "relaxing"     "..."
"3"   "drawing" "kissing" "knitting"     "..."
"..." "..."     "..."     "..."          "..."

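Variable t1 has the same levels as t2, and so on. What I need is to recode every column in the same way, loop-style (but without writing an explicit loop).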

I want to avoid the usual

seq$t1[seq$t1== "drawing"] <- 'leisure'
seq$t1[seq$t1== "eating"] <- 'meal'
seq$t1[seq$t1== "sleep"] <- 'personal care' 
seq$t1[seq$t1== "..."] <- ... 

The most convenient recoding style would be something like

c('leisure') = c('drawing', 'tv', ...) 

This would help me group the categories into broader classes.
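For the `c('leisure') = c('drawing', 'tv', ...)` style, one base-R sketch is to store the rules as a named list (group name = old labels), invert it into a named lookup vector, and index every column with it. The rules and the `seq_df` toy data below are hypothetical, taken from the examples above. Since the lookup is vectorised, the only iteration is `lapply()` over the columns.

```r
# Hypothetical grouping rules written in the "new = c(old, ...)" style
groups <- list(
  leisure         = c("drawing", "tv"),
  meal            = "eating",
  "personal care" = "sleep"
)

# Invert the list into a named lookup vector: names are old labels, values are new ones
lookup <- setNames(rep(names(groups), lengths(groups)),
                   unlist(groups, use.names = FALSE))

# Toy data shaped like the sequence table above (id column omitted)
seq_df <- data.frame(
  t1 = c("eating", "sleep", "drawing"),
  t2 = c("tv", "driving", "kissing"),
  t3 = c("conversation", "relaxing", "knitting"),
  stringsAsFactors = FALSE
)

# Recode every column in one pass; labels not found in the lookup are kept as-is
recoded <- seq_df
recoded[] <- lapply(seq_df, function(col) {
  new <- unname(lookup[col])
  ifelse(is.na(new), col, new)
})
```

Assigning into `recoded[]` keeps the data frame structure and column names, so the same call works unchanged for 144 columns.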

Have any new, simpler recoding methods appeared in R recently? What would you suggest I use?

Here is a sample of my real dataset: 10 respondents (rows) with 6 repeated observations (columns).

dtaSeq = structure(c("Wash and dress", "Eating", "Various arrangements", "Cleaning dwelling", "Ironing", "Activities related to sports", 
"Eating", "Eating", "Other specified construction and repairs", 
"Other specified physical care & supervision of a child", "Wash and dress", 
"Filling in the time use diary", "Food preparation", "Wash and dress", 
"Ironing", "Travel related to physical exercise", "Eating", "Eating", 
"Other specified construction and repairs", "Other specified physical care & supervision of a child", 
"Wash and dress", "Filling in the time use diary", "Food preparation", 
"Wash and dress", "Food preparation", "Wash and dress", "Eating", 
"Eating", "Other specified construction and repairs", "Other specified physical care & supervision of a child", 
"Wash and dress", "Filling in the time use diary", "Baking", 
"Teaching the child", "Food preparation", "Wash and dress", "Eating", 
"Eating", "Other specified construction and repairs", "Other specified physical care & supervision of a child", 
"Dish washing", "Unspecified TV watching", "Reading periodicals", 
"Teaching the child", "Food preparation", "Reading periodicals", 
"Eating", "Eating", "Other specified construction and repairs", 
"Feeding the child", "Laundry", "Unspecified TV watching", "Cleaning dwelling", 
"Teaching the child", "Eating", "Eating", "Eating", "Eating", 
"Other specified construction and repairs", "Feeding the child"), 
.Dim = c(10L, 6L), .Dimnames = list(c("1", "2", "3", "4", 
"5", "6", "7", "8", "9", "10"), c("act1.050", "act1.051", "act1.052", 
"act1.053", "act1.054", "act1.055")))

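As far as I know, the car package's recode function can handle strings (character values), but I'm not sure. Another option is the sjmisc package, which takes a detour: convert the strings to numeric values, recode, and then set the value labels back: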

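library(sjmisc)
# Written against an older sjmisc release; newer versions may have moved
# to_value(), set_val_labels() and to_label() into the sjlabelled package.
# R >= 4.0 no longer turns strings into factors by default, so ask for it explicitly
dtaSeq <- as.data.frame(dtaSeq, stringsAsFactors = TRUE)
# convert to values (each column's factor levels become integer codes)
dtaSeq.values <- to_value(dtaSeq)
# random recode example, use your own values for clustering here
dtaSeq.values <- rec(dtaSeq.values, "1:3=1; 4:6=2; else=3")
# set value labels, these will be added as attributes
dtaSeq.values <- set_val_labels(dtaSeq.values, c("meal", "leisure", "personal care"))
# replace numeric values with the associated label attributes
dtaSeq.values <- to_label(dtaSeq.values)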

Result:

> head(dtaSeq.values)
       act1.050      act1.051 act1.052      act1.053      act1.054      act1.055
1 personal care personal care  leisure personal care          meal       leisure
2          meal          meal     meal          meal personal care personal care
3 personal care          meal     meal          meal       leisure          meal
4          meal personal care  leisure personal care personal care       leisure
5       leisure       leisure     meal       leisure       leisure          meal
6          meal personal care  leisure personal care       leisure          meal

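One advantage of the sjmisc recode function is that if you have a data frame whose variables share the same structure, you can recode the whole data frame with a single call to rec.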
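The same strings → numbers → labels detour can also be sketched in base R. One thing worth checking on the factor route: if each column is converted to a factor separately, each column may get its own level set, so the same integer code can stand for different activities in different columns. This sketch builds one shared level set first; the toy matrix `m`, the `group_of_code` mapping and the labels are all hypothetical.

```r
# Toy 3 x 2 character matrix standing in for dtaSeq (hypothetical values)
m <- matrix(c("Eating", "Ironing", "Baking",
              "Ironing", "Eating", "Laundry"), nrow = 3)

# Step 1: strings -> integer codes, using ONE shared level set for all columns
lev   <- sort(unique(as.vector(m)))   # Baking, Eating, Ironing, Laundry
codes <- match(m, lev)                # column-major vector of codes

# Step 2: recode codes into groups (hypothetical: Eating -> 2, everything else -> 1)
group_of_code <- c(1L, 2L, 1L, 1L)
grp <- group_of_code[codes]

# Step 3: replace group numbers with labels and restore the matrix shape
labels <- c("house care", "meal")
out <- matrix(labels[grp], nrow = nrow(m), dimnames = dimnames(m))
```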

Does this help?

You don't seem to have specified complete recoding rules for your real data, so I made some up:

recodes <- list("meals"=c("Eating"),
                "leisure"=c("Reading periodicals",
                             "Unspecified TV watching"),
                "child care"=c("Feeding the child","Teaching the child"),
                "house care"=c("Food preparation","Dish washing",
                                "Cleaning dwelling","Ironing"))

Here is a general-purpose recoding function. car::recode does work, but I find it a bit clunky. There is also plyr::revalue, but it works one-to-one rather than many-to-one.

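# Replace every value listed in a group with that group's name;
# values not listed in any group are left unchanged.
# The loop runs over the (few) groups, not over the columns.
recodeFun <- function(x, rules) {
    for (i in seq_along(rules)) {
        x[x %in% rules[[i]]] <- names(rules)[i]
    }
    x
}
# dtaSeq is the original character matrix; the result keeps its dimensions
d2 <- recodeFun(dtaSeq, recodes)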
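With 200+ categories it can help to check the rules before recoding. A minimal sketch, assuming a self-contained copy of the function above that takes the rules as an argument; the `rules` subset and the toy matrix `m` are hypothetical. It lists labels that appear in two groups (with this loop the first matching group wins) and labels in the data that no rule covers yet.

```r
# Self-contained copy of the loop above: iterate over groups, not columns
recodeFun <- function(x, rules) {
  for (i in seq_along(rules)) {
    x[x %in% rules[[i]]] <- names(rules)[i]
  }
  x
}

# Hypothetical subset of rules and a toy matrix shaped like dtaSeq
rules <- list(meals = "Eating",
              "house care" = c("Ironing", "Dish washing"))
m <- matrix(c("Eating", "Ironing", "Baking", "Dish washing"), nrow = 2,
            dimnames = list(c("1", "2"), c("act1.050", "act1.051")))

# Labels listed in more than one group
all_old <- unlist(rules, use.names = FALSE)
in_two  <- unique(all_old[duplicated(all_old)])

# Labels present in the data that no rule mentions (they would stay unchanged)
uncovered <- setdiff(unique(as.vector(m)), all_old)

d2 <- recodeFun(m, rules)
```

Here `uncovered` is just `"Baking"`, which is a quick to-do list for extending the rules.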