R: How to count occurrences of factor levels
I have data in the following format:
ID Task1 Task2 Task3 Task4
abc Hard Hard Mix Hard
xyz Easy Mix Easy Hard
als Mix Hard Easy Hard
bld Hard Mix Easy Easy
cqr Hard Easy Hard Hard
alx Hard Hard Hard Hard
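For each ID, I want to count the occurrences of each factor level separately, in this case Hard, Mix, and Easy (see below). I want the total number of occurrences of each level per ID, and I also want the maximum number of consecutive occurrences for that ID, for example: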
ID Task1 Task2 Task3 Task4 Hard_Total Max_Consecutive_Hard
abc Hard Hard Mix Hard 3 2
xyz Easy Mix Easy Hard 1 1
als Mix Hard Easy Hard 2 1
bld Hard Mix Easy Easy 1 1
cqr Hard Easy Hard Hard 3 2
alx Hard Hard Hard Hard 4 4
Can anyone suggest a solution?
The dput() of the sample data is below.
structure(list(ID = structure(c(1L, 6L, 2L, 4L, 5L, 3L), .Label = c("abc","als", "alx", "bld", "cqr", "xyz"), class = "factor"), Task1 = structure(c(2L, 1L, 3L, 2L, 2L, 2L), .Label = c("Easy", "Hard", "Mix"), class = "factor"), Task2 = structure(c(2L, 3L, 2L, 3L, 1L, 2L), .Label = c("Easy", "Hard", "Mix"), class = "factor"), Task3 = structure(c(3L, 1L, 1L, 1L, 2L, 2L), .Label = c("Easy", "Hard", "Mix"), class = "factor"), Task4 = structure(c(2L, 2L, 2L, 1L, 2L, 2L), .Label = c("Easy", "Hard"), class = "factor")), class = "data.frame", row.names = c(NA, -6L))
You can use rowSums() to get the total number of Hard values per row, and then use rle() to get the longest run per row:
transform(df,
  Hard_Total = rowSums(df[paste0("Task", 1:4)] == "Hard", na.rm = TRUE),
  Max_Consecutive_Hard = apply(df[paste0("Task", 1:4)], 1, function(x)
    with(rle(x), max(lengths[values == "Hard"], na.rm = TRUE))))
ID Task1 Task2 Task3 Task4 Hard_Total Max_Consecutive_Hard
1 abc Hard Hard Mix Hard 3 2
2 xyz Easy Mix Easy Hard 1 1
3 als Mix Hard Easy Hard 2 1
4 bld Hard Mix Easy Easy 1 1
5 cqr Hard Easy Hard Hard 3 2
6 alx Hard Hard Hard Hard 4 4
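To see why rle() gives the longest run, it may help to look at a single row. The sketch below writes out the task values for ID abc as a plain character vector; the variable names x, r, and longest_hard are only illustrative and do not appear in the answer above.

```r
# Task values for ID "abc", written out as a character vector
x <- c("Hard", "Hard", "Mix", "Hard")

# rle() collapses consecutive repeats into (length, value) pairs
r <- rle(x)
r$lengths  # 2 1 1
r$values   # "Hard" "Mix" "Hard"

# Keep only the runs of "Hard" and take the longest one
longest_hard <- max(r$lengths[r$values == "Hard"])
longest_hard  # 2
```

The two separate runs of Hard (lengths 2 and 1) are why the total for this row is 3 while the maximum consecutive count is 2.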
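First, we create two functions, fun_hard and fun_max, to produce the two columns you need. fun_hard() counts how many times "Hard" occurs in the input, while fun_max() uses rle() to compute the maximum number of consecutive "Hard" occurrences in the input.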
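fun_hard = function(x) {
  # Number of entries equal to "Hard"
  sum(x == "Hard")
}
fun_max = function(x) {
  # Run-length encode the input, then keep only the runs of "Hard"
  rle_hard <- rle(x)
  hard_runs <- rle_hard$lengths[rle_hard$values == "Hard"]
  # Return 0 when "Hard" never occurs; max() of an empty vector would give -Inf with a warning
  if (length(hard_runs) == 0) 0 else max(hard_runs)
}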
We then use apply() to run fun_hard() and fun_max() on each row of the given data frame (test_df in the code below).
test_df$Hard_Total = apply(test_df[,c(2,3,4,5)], MARGIN = 1, FUN = fun_hard)
test_df$Max_Consecutive_Hard =
apply(test_df[,c(2,3,4,5)], MARGIN = 1, FUN = fun_max)
Output:
ID Task1 Task2 Task3 Task4 Hard_Total Max_Consecutive_Hard
1 abc Hard Hard Mix Hard 3 2
2 xyz Easy Mix Easy Hard 1 1
3 als Mix Hard Easy Hard 2 1
4 bld Hard Mix Easy Easy 1 1
5 cqr Hard Easy Hard Hard 3 2
6 alx Hard Hard Hard Hard 4 4
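The question asks for these counts for every level (Hard, Mix, and Easy), not only Hard. One possible way to extend the same rowSums()/rle() idea is sketched below. It assumes the task columns are named Task1 to Task4 as in the sample data; the helper max_run() and the generated column names (for example Mix_Total and Max_Consecutive_Mix) are hypothetical choices for this illustration, not part of either answer.

```r
# Sample data from the question, rebuilt as character columns
df <- data.frame(
  ID    = c("abc", "xyz", "als", "bld", "cqr", "alx"),
  Task1 = c("Hard", "Easy", "Mix", "Hard", "Hard", "Hard"),
  Task2 = c("Hard", "Mix", "Hard", "Mix", "Easy", "Hard"),
  Task3 = c("Mix", "Easy", "Easy", "Easy", "Hard", "Hard"),
  Task4 = c("Hard", "Hard", "Hard", "Easy", "Hard", "Hard"),
  stringsAsFactors = FALSE
)

task_cols <- paste0("Task", 1:4)
lvls <- c("Hard", "Mix", "Easy")

# Longest run of `level` in a vector; returns 0 if the level never occurs
max_run <- function(x, level) {
  r <- rle(as.character(x))
  hits <- r$lengths[r$values == level]
  if (length(hits) == 0) 0L else max(hits)
}

# Character matrix with one row per ID
tasks <- as.matrix(df[task_cols])

# Add a total column and a longest-run column for each level
for (lv in lvls) {
  df[[paste0(lv, "_Total")]] <- rowSums(tasks == lv)
  df[[paste0("Max_Consecutive_", lv)]] <- apply(tasks, 1, max_run, level = lv)
}

df
```

Building the matrix once and looping over the levels keeps the per-level logic in one place. Returning 0 for a level that never occurs in a row is a design choice here; returning NA would be another reasonable option.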