How to create a generic string of letters and numbers for "n" clusters in R to add to a data frame?

Question

I have this data frame:

df<-structure(list(x = c(-0.803739264931451, 0.852850728148773, 0.927179506105653, -0.752626056626365, 0.706846224294882, 1.0346985222527, -0.475845197699957, -0.460301566967151, -0.680301544955355, -1.03196929988978), y = c(-0.853052609097935, 0.367618436999606, -0.274902437566225, -0.511565170496435, 0.81067919693492, 0.394655023166806, 0.989760805249143, -0.858997792847955, -0.66149481321353, -0.0219935446644728), shape = c(1, 1, 2, 2, 2, 2, 3, 3, 4, 4)), row.names = c(NA, 10L), class = "data.frame")

Output:

x	y	shape
-0.8037393	-0.85305261	1
0.8528507	0.36761844	1
0.9271795	-0.27490244	2
-0.7526261	-0.51156517	2
0.7068462	0.81067920	2
1.0346985	0.39465502	2
-0.4758452	0.98976081	3
-0.4603016	-0.85899779	3
-0.6803015	-0.66149481	4
-1.0319693	-0.02199354	4

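Expected output: how can I create a generic string of letters and numbers for "n" clusters in R and add it to the data frame, as shown below?

Note: if there are, say, 100 clusters, the label for the 100th cluster could be AA1, and so on.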

df$label<-   #What is the correct code for this problem?

x	y	shape	label
-0.8037393	-0.85305261	1	A1
0.8528507	0.36761844	1	A2
0.9271795	-0.27490244	2	B1
-0.7526261	-0.51156517	2	B2
0.7068462	0.81067920	2	B3
1.0346985	0.39465502	2	B4
-0.4758452	0.98976081	3	C1
-0.4603016	-0.85899779	3	C2
-0.6803015	-0.66149481	4	D1
-1.0319693	-0.02199354	4	D2

Answer 1

Here is a small function that does this for you:

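library(dplyr)  # provides if_else(), group_by(), mutate(), cur_group_id(), n() and %>%

# g: group number (1, 2, ...); n: number of rows in that group
f <- function(g, n) {
  # Map g onto 1..26 so that multiples of 26 give "Z" instead of index 0
  letter_index <- if_else(g %% 26 == 0, 26, g %% 26)
  paste0(
    # Repeat the letter once per pass through the alphabet (group 27 -> "AA")
    paste0(rep(LETTERS[letter_index], times = ceiling(g / 26)), collapse = ""),
    1:n)
}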
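To see what the labelling rule produces for larger group numbers, here is a minimal sketch that calls the same logic directly. `f_base` is a hypothetical name for a restatement of `f()` that uses base R's `ifelse()`, so the block runs without dplyr; it is meant as an illustration, not as a replacement for the answer's function.

```r
# Hypothetical base-R restatement of f(): same rule, no dplyr dependency
f_base <- function(g, n) {
  # Position in the alphabet: 1..26, with multiples of 26 mapped to 26 ("Z")
  letter_index <- ifelse(g %% 26 == 0, 26, g %% 26)
  # The letter is repeated once per pass through the alphabet
  prefix <- paste0(rep(LETTERS[letter_index], times = ceiling(g / 26)), collapse = "")
  paste0(prefix, seq_len(n))
}

f_base(1, 2)    # "A1" "A2"
f_base(26, 1)   # "Z1"
f_base(27, 3)   # "AA1" "AA2" "AA3"
f_base(100, 1)  # "VVVV1"
```

Note that under this rule the 100th cluster is labelled VVVV1 rather than AA1: the letter is chosen by the remainder modulo 26 (100 %% 26 = 22, i.e. "V") and repeated ceiling(100 / 26) = 4 times.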

Now apply the function to each shape value, using group_by() and mutate():

df %>% 
  group_by(shape) %>% 
  mutate(code = f(cur_group_id(), n()))

Output:

        x       y shape code 
    <dbl>   <dbl> <dbl> <chr>
 1 -0.804 -0.853      1 A1   
 2  0.853  0.368      1 A2   
 3  0.927 -0.275      2 B1   
 4 -0.753 -0.512      2 B2   
 5  0.707  0.811      2 B3   
 6  1.03   0.395      2 B4   
 7 -0.476  0.990      3 C1   
 8 -0.460 -0.859      3 C2   
 9 -0.680 -0.661      4 D1   
10 -1.03  -0.0220     4 D2

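Explanation:

The function f() takes two arguments: an integer giving the group number (passed by cur_group_id()) and the number of rows that share that shape value (passed by n()). Inside the function, the modulo picks which entry of LETTERS to use, and ceiling(g/26) gives the number of times to repeat it, so groups 1 to 26 get A to Z and group 27 gets AA. The resulting prefix is then pasted onto the sequence from 1 to n.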
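Since the question asks for a direct `df$label <- ...` assignment, the same rule can also be sketched in base R. This is one possible variant under stated assumptions, not the answer's own method: `prefix_for` and `grp` are hypothetical names, groups are numbered in sorted order of `shape` (which is how `cur_group_id()` numbers them), and only the `shape` column from the question is reproduced here.

```r
# Only the shape column is needed to build the labels
df <- data.frame(shape = c(1, 1, 2, 2, 2, 2, 3, 3, 4, 4))

# Hypothetical helper: vectorised letter prefix for group numbers g
prefix_for <- function(g) {
  letter_index <- (g - 1) %% 26 + 1           # 1..26, multiples of 26 give 26
  strrep(LETTERS[letter_index], ceiling(g / 26))
}

# Group number of each row, in sorted order of the distinct shape values
grp <- match(df$shape, sort(unique(df$shape)))

# Running index within each group: 1, 2, ... per group
idx <- ave(seq_along(grp), grp, FUN = seq_along)

df$label <- paste0(prefix_for(grp), idx)
df$label
# "A1" "A2" "B1" "B2" "B3" "B4" "C1" "C2" "D1" "D2"
```

`strrep()` and `ave()` are both base R, so no packages are required; the rows do not need to be sorted by `shape` for this to work.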
