How to create categories conditionally using other variables' values and sequence

Question

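I would appreciate help writing a function that creates the categories of one variable from the order in which combinations of values of a set of other variables appear.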

Specifically, I want a function that:

creates category E1 in the "variable" column the first time each combination of values of A, B, and ID appears in the dataset;

creates category E2 in the "variable" column the second time each combination of values of A, B, and ID appears in the dataset;

creates category E3 in the "variable" column the third time each combination of values of A, B, and ID appears in the dataset;

and, in general, creates category En in the "variable" column the nth time each combination of values of A, B, and ID appears in the dataset.

#Sample data:

library(data.table)

rowdT <- structure(list(A = c("a1", "a2", "a1", "a1", "a2", "a1", "a1",
            "a2", "a1"), B = c("b2", "b2", "b2", "b1", "b2", "b2", "b1",
            "b2", "b1"), ID = c("3", "4", "3", "1", "4", "3", "1", "4", "1"
            ), E = c(0.621142094943352, 0.742109450696123, 0.39439152996948,
            0.40694392882818, 0.779607277916503, 0.550579323666347, 0.352622183880119,
            0.690660491345867, 0.23378944873769)), class = c("data.table",
            "data.frame"), row.names = c(NA, -9L))
# melt() stacks column E into a "variable" column (a factor) and a "value" column
sampleDT <- melt(rowdT, id.vars = c("A", "B", "ID"))

#Input data:

    A  B  ID variable    value
1: a1 b2  3        E 0.6211421
2: a2 b2  4        E 0.7421095
3: a1 b2  3        E 0.3943915
4: a1 b1  1        E 0.4069439
5: a2 b2  4        E 0.7796073
6: a1 b2  3        E 0.5505793
7: a1 b1  1        E 0.3526222
8: a2 b2  4        E 0.6906605
9: a1 b1  1        E 0.2337894

#Expected output:

    A  B  ID variable    value
4: a1 b1  1        E1 0.4069439
1: a1 b2  3        E1 0.6211421
2: a2 b2  4        E1 0.7421095
7: a1 b1  1        E2 0.3526222
3: a1 b2  3        E2 0.3943915
5: a2 b2  4        E2 0.7796073
9: a1 b1  1        E3 0.2337894
6: a1 b2  3        E3 0.5505793
8: a2 b2  4        E3 0.6906605

Thanks in advance for your help.

Answer 1

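First convert the variable column to a character vector (melt() returns it as a factor by default) so that the new labels are coerced properly, then number the rows within each group with data.table:

# Replace the factor column with its character labels, in place
sampleDT[, variable := as.character(variable)]

# Within each (A, B, ID) group, append 1, 2, ..., .N in row order
sampleDT[, variable := paste0(variable, seq_len(.N)), by = c("A", "B", "ID")]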

This numbers the rows within each observed combination of A, B, and ID, in the order in which they appear in the data.

This gives the following output:

    A  B ID variable     value
1: a1 b2  3       E1 0.6211421
2: a2 b2  4       E1 0.7421095
3: a1 b2  3       E2 0.3943915
4: a1 b1  1       E1 0.4069439
5: a2 b2  4       E2 0.7796073
6: a1 b2  3       E3 0.5505793
7: a1 b1  1       E2 0.3526222
8: a2 b2  4       E3 0.6906605
9: a1 b1  1       E3 0.2337894
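If you would rather not depend on data.table, the same running count can be sketched in base R with ave(). This is a minimal sketch, not part of the original answer: the helper name add_occurrence_suffix() and its by/col arguments are hypothetical, and it assumes a plain data.frame (not a data.table) whose grouping columns are atomic vectors.

```r
# Hypothetical helper (not from the original answer): append to `col`
# the running count of each combination of the `by` columns.
# Assumes `df` is a plain data.frame.
add_occurrence_suffix <- function(df, by = c("A", "B", "ID"), col = "variable") {
  # One factor level per observed combination of the grouping columns
  groups <- interaction(df[by], drop = TRUE)
  # ave() applies seq_along() within each group and puts the counts
  # back in the original row positions: 1 for the first occurrence,
  # 2 for the second, and so on
  n <- ave(seq_len(nrow(df)), groups, FUN = seq_along)
  df[[col]] <- paste0(as.character(df[[col]]), n)
  df
}

df <- data.frame(
  A  = c("a1", "a2", "a1", "a1", "a2", "a1", "a1", "a2", "a1"),
  B  = c("b2", "b2", "b2", "b1", "b2", "b2", "b1", "b2", "b1"),
  ID = c("3", "4", "3", "1", "4", "3", "1", "4", "1"),
  variable = "E",
  stringsAsFactors = FALSE
)

df <- add_occurrence_suffix(df)
print(df$variable)
# "E1" "E1" "E2" "E1" "E2" "E3" "E2" "E3" "E3"
```

The labels match the variable column in the data.table output above, row for row.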

You can reorder the rows afterwards if necessary.
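To reproduce the row order of the expected output, one option (a sketch, not part of the original answer) is to keep the occurrence number as a temporary integer column, sort on it and then on A, B and ID, and drop it afterwards. Sorting on the integer rather than on the pasted label avoids "E10" sorting before "E2" once a combination occurs ten or more times. The sketch uses data.table's rowid(), which gives the same running count as seq_len(.N) by group; the temporary column name n is arbitrary.

```r
library(data.table)

# Same data as the question, already in long form
sampleDT <- data.table(
  A  = c("a1", "a2", "a1", "a1", "a2", "a1", "a1", "a2", "a1"),
  B  = c("b2", "b2", "b2", "b1", "b2", "b2", "b1", "b2", "b1"),
  ID = c("3", "4", "3", "1", "4", "3", "1", "4", "1"),
  variable = "E",
  value = c(0.621142094943352, 0.742109450696123, 0.39439152996948,
            0.40694392882818, 0.779607277916503, 0.550579323666347,
            0.352622183880119, 0.690660491345867, 0.23378944873769)
)

# Running count of each (A, B, ID) combination, in row order
sampleDT[, n := rowid(A, B, ID)]
sampleDT[, variable := paste0(variable, n)]

# Sort on the integer count first, then on the grouping columns,
# and remove the temporary column
setorder(sampleDT, n, A, B, ID)
sampleDT[, n := NULL]
print(sampleDT)
```

The printed table has the same row order as the expected output in the question: all E1 rows (IDs 1, 3, 4), then the E2 rows, then the E3 rows.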

Tags: r, function, reshape, data.table, tidyr