R: How to count occurrences of factor levels

I have data in the following format:

ID    Task1   Task2   Task3   Task4
abc   Hard    Hard    Mix     Hard              
xyz   Easy    Mix     Easy    Hard               
als   Mix     Hard    Easy    Hard               
bld   Hard    Mix     Easy    Easy               
cqr   Hard    Easy    Hard    Hard               
alx   Hard    Hard    Hard    Hard               

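For each ID, I want to count the occurrences of each factor level separately, in this case Hard, Mix, and Easy. I want the total number of occurrences of each level per ID, and I also want the maximum number of consecutive occurrences for that ID. For the Hard level, for example: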

ID    Task1   Task2   Task3   Task4   Hard_Total   Max_Consecutive_Hard
abc   Hard    Hard    Mix     Hard    3            2
xyz   Easy    Mix     Easy    Hard    1            1
als   Mix     Hard    Easy    Hard    2            1
bld   Hard    Mix     Easy    Easy    1            1
cqr   Hard    Easy    Hard    Hard    3            2
alx   Hard    Hard    Hard    Hard    4            4

Can anyone suggest a solution?

The dput() of the sample data is below.

structure(list(ID = structure(c(1L, 6L, 2L, 4L, 5L, 3L), .Label = c("abc","als", "alx", "bld", "cqr", "xyz"), class = "factor"), Task1 = structure(c(2L, 1L, 3L, 2L, 2L, 2L), .Label = c("Easy", "Hard", "Mix"), class = "factor"), Task2 = structure(c(2L, 3L, 2L, 3L, 1L, 2L), .Label = c("Easy", "Hard", "Mix"), class = "factor"), Task3 = structure(c(3L, 1L, 1L, 1L, 2L, 2L), .Label = c("Easy", "Hard", "Mix"), class = "factor"), Task4 = structure(c(2L, 2L, 2L, 1L, 2L, 2L), .Label = c("Easy", "Hard"), class = "factor")), class = "data.frame", row.names = c(NA, -6L))

You can use rowSums() to get the total number of Hard values in each row, and then use rle() row by row to get the longest run:

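# df is the data frame from the question's dput() output
transform(df, Hard_Total = rowSums(df[paste0("Task", 1:4)] == "Hard", na.rm = TRUE),
              # max(0, ...) returns 0 instead of -Inf for a row with no "Hard"
              Max_Consecutive_Hard = apply(df[paste0("Task", 1:4)], 1, function(x) with(rle(x), max(0, lengths[values == "Hard"], na.rm = TRUE))))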

   ID Task1 Task2 Task3 Task4 Hard_Total Max_Consecutive_Hard
1 abc  Hard  Hard   Mix  Hard          3                    2
2 xyz  Easy   Mix  Easy  Hard          1                    1
3 als   Mix  Hard  Easy  Hard          2                    1
4 bld  Hard   Mix  Easy  Easy          1                    1
5 cqr  Hard  Easy  Hard  Hard          3                    2
6 alx  Hard  Hard  Hard  Hard          4                    4
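
To make the rle() step less opaque, here is a minimal sketch of what it does on a single row. The vector x is just the cqr row typed in by hand, and the names r, hard_runs and longest_hard are only for illustration:

```r
# One row of the example data (ID "cqr") as a plain character vector
x <- c("Hard", "Easy", "Hard", "Hard")

# rle() collapses the vector into runs of identical neighbouring values
r <- rle(x)
r$values   # "Hard" "Easy" "Hard"
r$lengths  # 1 1 2

# Keep only the lengths of the runs whose value is "Hard"
hard_runs <- r$lengths[r$values == "Hard"]

# Longest run; the leading 0 covers a row that contains no "Hard" at all
longest_hard <- max(0, hard_runs)
longest_hard  # 2
```

apply(..., 1, ...) simply repeats this for every row of the task columns.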

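First, we create two functions, fun_hard() and fun_max(), to produce the two columns you need. fun_hard() counts how many times "Hard" occurs in its input, while fun_max() uses rle() to compute the maximum number of consecutive "Hard" occurrences in its input.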
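# Count how many elements of x equal "Hard"
fun_hard <- function(x) {
  sum(x == "Hard")
}

# Length of the longest run of consecutive "Hard" values in x
# (max(0, ...) gives 0 rather than -Inf when x contains no "Hard")
fun_max <- function(x) {
  rle_hard <- rle(x)
  max(0, rle_hard$lengths[rle_hard$values == "Hard"])
}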

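We then use apply() to call fun_hard() and fun_max() on each row of the given data frame (here named test_df), passing only the task columns (columns 2 to 5):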
test_df$Hard_Total = apply(test_df[,c(2,3,4,5)], MARGIN = 1, FUN = fun_hard)
test_df$Max_Consecutive_Hard = 
              apply(test_df[,c(2,3,4,5)], MARGIN = 1, FUN = fun_max)

Output:

  ID Task1 Task2 Task3 Task4 Hard_Total Max_Consecutive_Hard
1 abc  Hard  Hard   Mix  Hard          3                    2
2 xyz  Easy   Mix  Easy  Hard          1                    1
3 als   Mix  Hard  Easy  Hard          2                    1
4 bld  Hard   Mix  Easy  Easy          1                    1
5 cqr  Hard  Easy  Hard  Hard          3                    2
6 alx  Hard  Hard  Hard  Hard          4                    4
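
The question also asks about Mix and Easy, so here is a rough sketch of how the same idea could be extended to every level. It is only one possible approach: the helper max_run(), the vector lvls, and the generated column names (Mix_Total, Max_Consecutive_Mix, and so on) are my own naming, and the small df below is a three-row stand-in for the question's data with character columns instead of factors.

```r
# Small stand-in for the question's data (three of the six rows)
df <- data.frame(
  ID    = c("abc", "xyz", "alx"),
  Task1 = c("Hard", "Easy", "Hard"),
  Task2 = c("Hard", "Mix",  "Hard"),
  Task3 = c("Mix",  "Easy", "Hard"),
  Task4 = c("Hard", "Hard", "Hard"),
  stringsAsFactors = FALSE
)

task_cols <- paste0("Task", 1:4)
lvls <- c("Hard", "Mix", "Easy")

# Hypothetical helper: longest run of `level` in x, 0 if it never appears
max_run <- function(x, level) {
  r <- rle(as.character(x))
  max(0L, r$lengths[which(r$values == level)])
}

# Character matrix with one row per ID (factor columns are converted too)
tasks <- as.matrix(df[task_cols])

# Add a total and a longest-run column for each level
for (lv in lvls) {
  df[[paste0(lv, "_Total")]] <- rowSums(tasks == lv, na.rm = TRUE)
  df[[paste0("Max_Consecutive_", lv)]] <- apply(tasks, 1, max_run, level = lv)
}

df
```

With the real data, replacing the stand-in df with the dput() output should work the same way, since as.matrix() turns the factor columns into character values before the comparison.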