Create an "instance number" for each instance of a combination of variable levels
I need to count the instances of each combination of variables and turn that count into a new variable. For example,
set.seed(2)
V1 <- sample(rep(c(1:3),10))
V2 <- rep_len(c("small", "large"),30)
temp <- cbind(V1,V2)
produces a matrix (cbind() coerces both columns to character) whose first ten rows look like this:
V1 V2
[1,] "3" "small"
[2,] "3" "large"
[3,] "3" "small"
[4,] "1" "large"
[5,] "2" "small"
[6,] "2" "large"
[7,] "1" "small"
[8,] "3" "large"
[9,] "3" "small"
[10,] "3" "large"
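I need a new variable that counts how many times that combination of variables has appeared in the data so far. The result should look something like this: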
V1 V2 V3
[1,] "3" "small" "1"
[2,] "3" "large" "1"
[3,] "3" "small" "2"
[4,] "1" "large" "1"
[5,] "2" "small" "1"
[6,] "2" "large" "1"
[7,] "1" "small" "1"
[8,] "3" "large" "2"
[9,] "3" "small" "3"
[10,] "3" "large" "3"
What is an efficient way to do this? (They don't have to be character vectors; I just need a general solution.)
We can convert to a data.frame, group by 'V1' and 'V2', and then create the new column as the sequence of rows within each group with row_number()
library(dplyr)
as.data.frame(temp) %>%
group_by(V1, V2) %>%
mutate(V3 = row_number())
Data
temp <- structure(list(V1 = c(3L, 3L, 3L, 1L, 2L, 2L, 1L, 3L, 3L, 3L),
V2 = c("small", "large", "small", "large", "small", "large",
"small", "large", "small", "large")), class = "data.frame",
row.names = c(NA,
-10L))
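If adding a package dependency is not desirable, the same per-group running count can be sketched in base R. This is an alternative to the dplyr approach above, not part of the original answer: it uses ave() to apply seq_along() within each V1/V2 combination, and it rebuilds the ten sample rows so that it runs on its own.

```r
# Rebuild the ten sample rows from the "Data" section so the example is self-contained.
temp <- data.frame(
  V1 = c(3L, 3L, 3L, 1L, 2L, 2L, 1L, 3L, 3L, 3L),
  V2 = c("small", "large", "small", "large", "small", "large",
         "small", "large", "small", "large")
)

# ave() splits the row index by every V1/V2 combination, applies seq_along()
# to each piece, and puts the results back in the original row order.
temp$V3 <- ave(seq_len(nrow(temp)), temp$V1, temp$V2, FUN = seq_along)

print(temp)
```

Because ave() returns its values in the original row order, the data does not need to be sorted first, and V3 stays an integer column.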