Assign a name for each unique value in another column
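I have C1 in df, and I want a new column C2 that holds an ID based on each unique value in C1.
I want C2 to use a specific name (Group) followed by a number that counts from 01 rather than 1, because I will have up to 13 groups and want them to sort correctly.
I also want to keep the original name for the last unique value (Z), so that C2 looks like this: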
C1 C2
<chr> <chr>
1 A Group01
2 A Group01
3 A Group01
4 A Group01
5 B Group02
6 B Group02
7 B Group02
8 B Group02
9 C Group03
10 C Group03
11 C Group03
12 C Group03
13 Z Z
14 Z Z
15 Z Z
16 Z Z
I have already tried getting the IDs with, for example:
df <- transform(df,id=as.numeric(factor(C1)))
But this is what I get:
C1 C2 id
1 A Group01 1
2 A Group01 1
3 A Group01 1
4 A Group01 1
5 B Group02 2
6 B Group02 2
7 B Group02 2
8 B Group02 2
9 C Group03 3
10 C Group03 3
11 C Group03 3
12 C Group03 3
13 Z Z 4
14 Z Z 4
15 Z Z 4
16 Z Z 4
I think I could create the new column using a "group" argument, but I don't know how to get IDs that start from 01.
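You can use match + unique to get a unique number for each C1 value, in order of first appearance, while keeping the last value of the group the same as in C1. Use sprintf with the %02d format to zero-pad the number so it reads 01, 02, and so on.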
library(dplyr)
df <- df %>%
  mutate(tmp = match(C1, unique(C1)),
         C2 = replace(sprintf('Group%02d', tmp), C1 == 'Z', 'Z')) %>%
  select(-tmp)
df
# C1 C2
#1 A Group01
#2 A Group01
#3 A Group01
#4 A Group01
#5 B Group02
#6 B Group02
#7 B Group02
#8 B Group02
#9 C Group03
#10 C Group03
#11 C Group03
#12 C Group03
#13 Z Z
#14 Z Z
#15 Z Z
#16 Z Z
data
df <- structure(list(C1 = c("A", "A", "A", "A", "B", "B", "B", "B",
"C", "C", "C", "C", "Z", "Z", "Z", "Z")), row.names = c(NA, -16L
), class = "data.frame")
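For readers who would rather not depend on dplyr, the same match + unique + sprintf idea can be sketched in base R. This is only an illustration under the assumptions of the question (character column C1, last label "Z" known in advance); the helper names ids and keep are made up for this example.

```r
# Base R sketch: number groups by first appearance, keep "Z" unchanged.
df <- data.frame(C1 = rep(c("A", "B", "C", "Z"), each = 4),
                 stringsAsFactors = FALSE)

ids <- match(df$C1, unique(df$C1))   # 1, 2, 3, 4 in order of first appearance
df$C2 <- sprintf("Group%02d", ids)   # zero-pad to two digits: Group01, Group02, ...

keep <- df$C1 == "Z"                 # rows whose label should stay as it is
df$C2[keep] <- df$C1[keep]

unique(df$C2)
# "Group01" "Group02" "Group03" "Z"
```

stringsAsFactors = FALSE is set explicitly so that the final assignment copies the text "Z" rather than a factor code on older R versions.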
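EDIT: In that case you can use an if_else statement (str_pad comes from stringr, so load it alongside dplyr):
library(dplyr)
library(stringr)
df <- data.frame(C1 = c(rep(LETTERS[1:7], each = 4), rep("Z", 4)))
df
df %>% mutate(C2 = if_else(C1 == "Z", C1, paste0("Group", str_pad(dense_rank(C1), width = 2, side = "left", pad = "0"))))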
C1 C2
1 A Group01
2 A Group01
3 A Group01
4 A Group01
5 B Group02
6 B Group02
7 B Group02
8 B Group02
9 C Group03
10 C Group03
11 C Group03
12 C Group03
13 D Group04
14 D Group04
15 D Group04
16 D Group04
17 E Group05
18 E Group05
19 E Group05
20 E Group05
21 F Group06
22 F Group06
23 F Group06
24 F Group06
25 G Group07
26 G Group07
27 G Group07
28 G Group07
29 Z Z
30 Z Z
31 Z Z
32 Z Z
Or, if the last value is not known in advance:
df %>% mutate(d = dense_rank(C1),
              C2 = if_else(d == max(d), C1, paste0("Group", str_pad(d, width = 2, side = "left", pad = "0")))) %>%
  select(-d)
C1 C2
1 A Group01
2 A Group01
3 A Group01
4 A Group01
5 B Group02
6 B Group02
7 B Group02
8 B Group02
9 C Group03
10 C Group03
11 C Group03
12 C Group03
13 D Group04
14 D Group04
15 D Group04
16 D Group04
17 E Group05
18 E Group05
19 E Group05
20 E Group05
21 F Group06
22 F Group06
23 F Group06
24 F Group06
25 G Group07
26 G Group07
27 G Group07
28 G Group07
29 Z Z
30 Z Z
31 Z Z
32 Z Z
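One point worth keeping in mind when choosing between the two answers: match(C1, unique(C1)) numbers values in order of first appearance, whereas dense_rank(C1) numbers them in sorted order, so d == max(d) picks the value that sorts last rather than the one that appears last. On the sorted example data both agree. The base R sketch below uses match against the sorted unique values as a stand-in for dense_rank (an approximation for illustration, not dplyr's implementation) on a small unsorted vector x invented for this example.

```r
# Hypothetical unsorted input to show where the two numbering schemes differ.
x <- c("B", "B", "A", "Z")

by_appearance <- match(x, unique(x))        # order of first appearance
by_sorted     <- match(x, sort(unique(x)))  # sorted order, like a dense rank

by_appearance
# 1 1 2 3
by_sorted
# 2 2 1 3
```

If the group numbers should follow the order in which values occur in the data, the match + unique form is the safer choice; if they should follow alphabetical order, the rank-based form fits better.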