Assign a name for each unique value in another column

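I have a column C1 in df, and I want a new column C2 that holds an ID for each unique value in C1.
However, I want C2 to use a specific name (Group) followed by a number, counting from 01 rather than 1, because I will have up to 13 groups and want them to sort correctly. I also want the last unique value (Z) to keep its own name, so that C2 looks like this: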

   C1    C2     
   <chr> <chr>  
 1 A     Group01
 2 A     Group01
 3 A     Group01
 4 A     Group01
 5 B     Group02
 6 B     Group02
 7 B     Group02
 8 B     Group02
 9 C     Group03
10 C     Group03
11 C     Group03
12 C     Group03
13 Z     Z      
14 Z     Z      
15 Z     Z      
16 Z     Z 

I have already tried getting the IDs with, for example, df <- transform(df,id=as.numeric(factor(C1))), but this is what I get:

   C1      C2 id
1   A Group01  1
2   A Group01  1
3   A Group01  1
4   A Group01  1
5   B Group02  2
6   B Group02  2
7   B Group02  2
8   B Group02  2
9   C Group03  3
10  C Group03  3
11  C Group03  3
12  C Group03  3
13  Z       Z  4
14  Z       Z  4
15  Z       Z  4
16  Z       Z  4 

I suppose I could build the new column with the "Group" prefix, but I don't know how to get IDs that start from 01.

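You can use match + unique to get a unique number for each C1 value, and sprintf to zero-pad that number to two digits (01, 02, ...). replace then keeps the original C1 value for the last group (Z).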

library(dplyr)

df <- df %>%
        mutate(tmp = match(C1, unique(C1)), 
               C2 = replace(sprintf('Group%02d', tmp), C1 == 'Z', 'Z')) %>%
        select(-tmp)
df

#   C1      C2
#1   A Group01
#2   A Group01
#3   A Group01
#4   A Group01
#5   B Group02
#6   B Group02
#7   B Group02
#8   B Group02
#9   C Group03
#10  C Group03
#11  C Group03
#12  C Group03
#13  Z       Z
#14  Z       Z
#15  Z       Z
#16  Z       Z
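
For readers who want to see the intermediate values, the same steps can be sketched in base R without dplyr. This is only an illustration under assumptions: the names idx, labels and last_value are made up here, and "last" is taken to mean the last value to appear in C1.

```r
# Example data shaped like the question's (stringsAsFactors = FALSE keeps C1 as character)
df <- data.frame(C1 = rep(c("A", "B", "C", "Z"), each = 4),
                 stringsAsFactors = FALSE)

# Step 1: number each value by order of first appearance
idx <- match(df$C1, unique(df$C1))   # 1 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4

# Step 2: zero-pad to two digits and prepend the prefix
labels <- sprintf("Group%02d", idx)

# Step 3: keep the original value for the last group
# (assumption: "last" = last value to appear in C1)
last_value <- tail(unique(df$C1), 1)
df$C2 <- ifelse(df$C1 == last_value, df$C1, labels)

unique(df)
```

Because the last value is looked up rather than hard-coded, this variant should also work when it is something other than Z.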

data

df <- structure(list(C1 = c("A", "A", "A", "A", "B", "B", "B", "B", 
"C", "C", "C", "C", "Z", "Z", "Z", "Z")), row.names = c(NA, -16L
), class = "data.frame")

EDIT: In that case, you can use an if_else statement

library(dplyr)
library(stringr)  # provides str_pad()

df <- data.frame(C1 = c(rep(LETTERS[1:7], each = 4), rep("Z", 4)))
df

df %>% mutate(C2 = if_else(C1 == "Z", C1, paste0("Group", str_pad(dense_rank(C1), width = 2, side = "left", pad = "0"))))

   C1      C2
1   A Group01
2   A Group01
3   A Group01
4   A Group01
5   B Group02
6   B Group02
7   B Group02
8   B Group02
9   C Group03
10  C Group03
11  C Group03
12  C Group03
13  D Group04
14  D Group04
15  D Group04
16  D Group04
17  E Group05
18  E Group05
19  E Group05
20  E Group05
21  F Group06
22  F Group06
23  F Group06
24  F Group06
25  G Group07
26  G Group07
27  G Group07
28  G Group07
29  Z       Z
30  Z       Z
31  Z       Z
32  Z       Z

Or, if the last value is not known in advance

df %>% mutate(d = dense_rank(C1),
              C2 = if_else(d == max(d), C1, paste0("Group", str_pad(d, width = 2, side = "left", pad = "0")))) %>%
  select(-d)

   C1      C2
1   A Group01
2   A Group01
3   A Group01
4   A Group01
5   B Group02
6   B Group02
7   B Group02
8   B Group02
9   C Group03
10  C Group03
11  C Group03
12  C Group03
13  D Group04
14  D Group04
15  D Group04
16  D Group04
17  E Group05
18  E Group05
19  E Group05
20  E Group05
21  F Group06
22  F Group06
23  F Group06
24  F Group06
25  G Group07
26  G Group07
27  G Group07
28  G Group07
29  Z       Z
30  Z       Z
31  Z       Z
32  Z       Z
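
One point to keep in mind when choosing between the two approaches: match(C1, unique(C1)) numbers values in order of first appearance, while dense_rank(C1) numbers them in sorted order. For the sorted example data the two agree, but on other data they can differ. A small sketch with made-up data (the vector x is purely illustrative):

```r
library(dplyr)

# Hypothetical values that do not appear in alphabetical order
x <- c("B", "B", "Z", "Z", "A", "A")

# Order of first appearance: B = 1, Z = 2, A = 3
match(x, unique(x))

# Sorted order: A = 1, B = 2, Z = 3
dense_rank(x)
```

It follows that d == max(d) selects the value that sorts last, which is not necessarily the value that appears last in the column.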