How to create a generic string of letters and numbers for "n" clusters in R to add in a dataframe?

I have this:

df<-structure(list(x = c(-0.803739264931451, 0.852850728148773, 0.927179506105653, -0.752626056626365, 0.706846224294882, 1.0346985222527, -0.475845197699957, -0.460301566967151, -0.680301544955355, -1.03196929988978), y = c(-0.853052609097935, 0.367618436999606, -0.274902437566225, -0.511565170496435, 0.81067919693492, 0.394655023166806, 0.989760805249143, -0.858997792847955, -0.66149481321353, -0.0219935446644728), shape = c(1, 1, 2, 2, 2, 2, 3, 3, 4, 4)), row.names = c(NA, 10L), class = "data.frame")

Output:

x y shape
-0.8037393 -0.85305261 1
0.8528507 0.36761844 1
0.9271795 -0.27490244 2
-0.7526261 -0.51156517 2
0.7068462 0.81067920 2
1.0346985 0.39465502 2
-0.4758452 0.98976081 3
-0.4603016 -0.85899779 3
-0.6803015 -0.66149481 4
-1.0319693 -0.02199354 4

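Expected output: how can I create a generic string of letters and numbers for "n" clusters in R and add it to the data frame, as shown below?

Note: if there are, for example, 100 clusters, the label for the 100th cluster could be AA1, and so on.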

df$label<-   #What is the correct code for this problem?
x y shape label
-0.8037393 -0.85305261 1 A1
0.8528507 0.36761844 1 A2
0.9271795 -0.27490244 2 B1
-0.7526261 -0.51156517 2 B2
0.7068462 0.81067920 2 B3
1.0346985 0.39465502 2 B4
-0.4758452 0.98976081 3 C1
-0.4603016 -0.85899779 3 C2
-0.6803015 -0.66149481 4 D1
-1.0319693 -0.02199354 4 D2

Here is a small function that does this for you:

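library(dplyr)  # provides if_else(), group_by(), mutate(), cur_group_id(), n() and %>%

# g: group number (1, 2, ...); n: number of rows in that group
f <- function(g, n) {
  # Map g onto 1..26 so that groups 26, 52, ... pick "Z" instead of index 0
  letter_index <- if_else(g %% 26 == 0, 26, g %% 26)
  paste0(
    # Repeat the letter: once for groups 1-26, twice for groups 27-52, and so on
    paste0(rep(LETTERS[letter_index], times = ceiling(g / 26)), collapse = ""),
    1:n)
}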
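To see what the labels look like once there are more than 26 groups, the function can be called directly with a few group numbers. This is only a sketch: f() is restated here with base R's ifelse() in place of if_else() (my substitution, so the snippet runs without dplyr); the logic is otherwise the same.

```r
# Same logic as f() above, restated with base ifelse() so no package is needed
f <- function(g, n) {
  letter_index <- ifelse(g %% 26 == 0, 26, g %% 26)
  paste0(
    paste0(rep(LETTERS[letter_index], times = ceiling(g / 26)), collapse = ""),
    1:n)
}

f(1, 2)    # "A1" "A2"
f(26, 1)   # "Z1"
f(27, 2)   # "AA1" "AA2"
f(100, 1)  # "VVVV1"
```

Note that the letter is repeated rather than counted spreadsheet-style, so group 27 gets the prefix AA and group 100 gets VVVV.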

Now apply the function to each shape value, using group_by() and mutate():

df %>% 
  group_by(shape) %>% 
  mutate(code = f(cur_group_id(), n()))

Output:

        x       y shape code 
    <dbl>   <dbl> <dbl> <chr>
 1 -0.804 -0.853      1 A1   
 2  0.853  0.368      1 A2   
 3  0.927 -0.275      2 B1   
 4 -0.753 -0.512      2 B2   
 5  0.707  0.811      2 B3   
 6  1.03   0.395      2 B4   
 7 -0.476  0.990      3 C1   
 8 -0.460 -0.859      3 C2   
 9 -0.680 -0.661      4 D1   
10 -1.03  -0.0220     4 D2

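Explanation:

  • The function f() takes two values: an integer giving the group number (passed by cur_group_id()) and the number of rows that share that shape value (passed by n()).
  • Inside the function, the modulo picks the right entry of LETTERS, and ceiling(g/26) gives the number of times to repeat it. The repeated letter is then pasted onto the sequence from 1 to n.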
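The same idea can also be sketched in base R and assigned straight to df$label, as the question asks. This is an alternative under stated assumptions, not the method above: the data frame here is a reduced copy holding only the shape column, the group number is taken from the sorted unique shape values (which I believe matches the order cur_group_id() uses), and the names g, i and prefix are made up for illustration.

```r
# Reduced copy of the question's data: only the shape column is needed here
df <- data.frame(shape = c(1, 1, 2, 2, 2, 2, 3, 3, 4, 4))

# Group number per row, numbered by sorted unique shape values (assumption)
g <- match(df$shape, sort(unique(df$shape)))

# Running counter 1..n within each group, returned in the original row order
i <- ave(seq_along(g), g, FUN = seq_along)

# Letter for the group, repeated once per pass through the alphabet
prefix <- strrep(LETTERS[(g - 1) %% 26 + 1], ceiling(g / 26))

df$label <- paste0(prefix, i)
df$label
# "A1" "A2" "B1" "B2" "B3" "B4" "C1" "C2" "D1" "D2"
```

Here (g - 1) %% 26 + 1 replaces the if_else() test: it maps group numbers onto 1..26 directly, so no special case is needed for multiples of 26.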