How to create a generic string of letters and numbers for "n" clusters in R to add to a data frame?
I have this:
df<-structure(list(x = c(-0.803739264931451, 0.852850728148773, 0.927179506105653, -0.752626056626365, 0.706846224294882, 1.0346985222527, -0.475845197699957, -0.460301566967151, -0.680301544955355, -1.03196929988978), y = c(-0.853052609097935, 0.367618436999606, -0.274902437566225, -0.511565170496435, 0.81067919693492, 0.394655023166806, 0.989760805249143, -0.858997792847955, -0.66149481321353, -0.0219935446644728), shape = c(1, 1, 2, 2, 2, 2, 3, 3, 4, 4)), row.names = c(NA, 10L), class = "data.frame")
Output:
x | y | shape |
---|---|---|
-0.8037393 | -0.85305261 | 1 |
0.8528507 | 0.36761844 | 1 |
0.9271795 | -0.27490244 | 2 |
-0.7526261 | -0.51156517 | 2 |
0.7068462 | 0.81067920 | 2 |
1.0346985 | 0.39465502 | 2 |
-0.4758452 | 0.98976081 | 3 |
-0.4603016 | -0.85899779 | 3 |
-0.6803015 | -0.66149481 | 4 |
-1.0319693 | -0.02199354 | 4 |
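Expected output:
How can I create a generic string of letters and numbers for "n" clusters in R and add it to the data frame, as shown below?
Note: if there are, say, 100 clusters, the label for the 100th cluster could be AA1, and so on.
df$label <- # What is the correct code for this problem?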
x | y | shape | label |
---|---|---|---|
-0.8037393 | -0.85305261 | 1 | A1 |
0.8528507 | 0.36761844 | 1 | A2 |
0.9271795 | -0.27490244 | 2 | B1 |
-0.7526261 | -0.51156517 | 2 | B2 |
0.7068462 | 0.81067920 | 2 | B3 |
1.0346985 | 0.39465502 | 2 | B4 |
-0.4758452 | 0.98976081 | 3 | C1 |
-0.4603016 | -0.85899779 | 3 | C2 |
-0.6803015 | -0.66149481 | 4 | D1 |
-1.0319693 | -0.02199354 | 4 | D2 |
Here is a small function that will do this for you:
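library(dplyr)  # if_else(), group_by(), mutate(), cur_group_id(), n()
f <- function(g, n) {
  # Letter position 1..26; multiples of 26 map to 26 ("Z")
  letter_index <- if_else(g %% 26 == 0, 26, g %% 26)
  paste0(
    # Repeat the letter once for each pass through the alphabet
    paste0(rep(LETTERS[letter_index], times = ceiling(g / 26)), collapse = ""),
    1:n
  )
}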
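To see how the labels behave once there are more than 26 groups, here is a small check. It is only a sketch: `f_base()` is my own restatement of the function that swaps in base R's `ifelse()` so it runs without dplyr, and it is not part of the answer itself.

```r
# Base-R restatement of f(); ifelse() stands in for dplyr::if_else() (assumption).
f_base <- function(g, n) {
  # Position in the alphabet: 1..26, with multiples of 26 mapped to 26 ("Z")
  letter_index <- ifelse(g %% 26 == 0, 26, g %% 26)
  # Repeat the letter once for each pass through the alphabet
  prefix <- paste0(rep(LETTERS[letter_index], times = ceiling(g / 26)), collapse = "")
  # Append the within-group counter 1..n
  paste0(prefix, seq_len(n))
}

f_base(1, 2)    # "A1" "A2"
f_base(26, 1)   # "Z1"
f_base(27, 2)   # "AA1" "AA2"
f_base(100, 1)  # "VVVV1"
```

Note that under this scheme group 27 becomes AA, group 28 becomes BB, and group 100 becomes VVVV, so the labels stay unique but are not spreadsheet-style column names.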
Now apply the function to each shape value, using group_by() and mutate():
df %>%
  group_by(shape) %>%
  mutate(code = f(cur_group_id(), n()))
Output:
x y shape code
<dbl> <dbl> <dbl> <chr>
1 -0.804 -0.853 1 A1
2 0.853 0.368 1 A2
3 0.927 -0.275 2 B1
4 -0.753 -0.512 2 B2
5 0.707 0.811 2 B3
6 1.03 0.395 2 B4
7 -0.476 0.990 3 C1
8 -0.460 -0.859 3 C2
9 -0.680 -0.661 4 D1
10 -1.03 -0.0220 4 D2
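Since the question asked for something of the form `df$label <- ...`, the same labels can also be built without dplyr. The following is a rough base-R sketch, not the answer's method: it uses a cut-down `df` holding only the `shape` column, and it numbers groups by sorted `shape` value to mirror what I understand `cur_group_id()` to do.

```r
# Cut-down data: only the grouping column from the question
df <- data.frame(shape = c(1, 1, 2, 2, 2, 2, 3, 3, 4, 4))

# Group number 1..k, assigned in sorted order of the shape values (assumption)
g <- match(df$shape, sort(unique(df$shape)))

# Running counter within each shape group: 1, 2, ... per group
idx <- ave(seq_along(df$shape), df$shape, FUN = seq_along)

# Letter for the group, repeated once per pass through the alphabet
prefix <- strrep(LETTERS[(g - 1) %% 26 + 1], ceiling(g / 26))

df$label <- paste0(prefix, idx)
df$label
# "A1" "A2" "B1" "B2" "B3" "B4" "C1" "C2" "D1" "D2"
```

The expression `(g - 1) %% 26 + 1` gives the same letter position as the `if_else()` call in `f()`, just without the special case for multiples of 26.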
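Explanation:
- The function f() takes two values: an integer giving the group number (passed by cur_group_id()) and the number of rows with that shape value (passed by n()). Inside the function, the modulo picks the letter from LETTERS, ceiling(g/26) gives the number of times to repeat it, and the result is pasted onto the sequence from 1 to n.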
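If you would rather have spreadsheet-style prefixes past Z (AA, AB, AC, ...) than a repeated letter, one possible variant is sketched below. `col_prefix()` is a hypothetical helper of my own, not something from the answer or from any package; it treats the group number as a bijective base-26 number.

```r
# Hypothetical helper: spreadsheet-style letters for group number g (g >= 1)
col_prefix <- function(g) {
  out <- character(0)
  while (g > 0) {
    r <- (g - 1) %% 26           # 0..25: index of the right-most letter
    out <- c(LETTERS[r + 1], out)
    g <- (g - 1) %/% 26          # drop that letter and continue leftwards
  }
  paste0(out, collapse = "")
}

col_prefix(1)    # "A"
col_prefix(26)   # "Z"
col_prefix(27)   # "AA"
col_prefix(100)  # "CV"
paste0(col_prefix(27), 1:3)  # "AA1" "AA2" "AA3"
```

Because it handles one group number at a time, it could replace the prefix part of `f()` where `g` is a single value per group.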