R: turn the value of a cell into the equivalent number of rows for correlation

Here is the head of my df, n = 40:

structure(list(Code = c("75", "75", "75", "75", "75", "75", "75", 
"75", "75", "75", "75", "75", "75", "R009", "R009", "R009", "R009", 
"R009", "R009", "R009", "R009", "R009", "R009", "R009", "R009", 
"R009", "R015", "R015", "R015", "R015", "R015", "R015", "R015", 
"R015", "R019", "R019", "R019", "R019", "R019", "R019"), Name = c("a", 
"b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "a", 
"b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "a", 
"f", "g", "h", "i", "k", "l", "m", "a", "b", "c", "d", "e", "f"
), n = c(41L, 14L, 7L, 5L, 11L, 138L, 4L, 92L, 19L, 10L, 167L, 
67L, 62L, 3L, 1L, 35L, 6L, 125L, 43L, 4L, 44L, 86L, 8L, 33L, 
37L, 13L, 8L, 32L, 1L, 3L, 2L, 17L, 2L, 7L, 45L, 14L, 10L, 8L, 
15L, 228L)), row.names = c(NA, -40L), groups = structure(list(
    Code = c("75", "R009", "R015", "R019"), .rows = structure(list(
        1:13, 14:26, 27:34, 35:40), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = c(NA, -4L), class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

I am trying to turn the n value into an equivalent number of rows. For example, I want the row with Code == "75" and Name == "a" to be repeated 41 times in the data frame.

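The reason I am trying to do this is that I want to see whether there is a strong correlation between Code and Name. Once I have a long data frame with many rows, I plan to use the cor function like this: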

cor(df$Code, df$Name)

But I think cor will reject this because Name is not numeric, so I assume I first have to convert all the names to numeric values:

df <- df %>%
  mutate(Name = case_when(Name == "a" ~ 1, 
                          Name == "b" ~ 2,
                          Name == "c" ~ 3,
                          Name == "d" ~ 4,
                          Name == "e" ~ 5,
                          Name == "f" ~ 6,
                          Name == "g" ~ 7,
                          Name == "h" ~ 8,
                          Name == "i" ~ 9,
                          Name == "j" ~ 10,
                          Name == "k" ~ 11,
                          Name == "l" ~ 12,
                          Name == "m" ~ 13))

How can I turn the n values in the data frame into an equivalent number of rows?

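Also, does this workflow make sense? Is there a shortcut for finding the correlation, other than converting the summary data frame into something more like "raw" data, then converting the types to numeric, and then comparing the two vectors?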

If we want to replicate the rows, use uncount (from tidyr) after ungrouping:

library(dplyr)
library(tidyr)

# Drop the grouping, then repeat each row n times; uncount() removes the n column
df %>%
    ungroup() %>%
    uncount(n)

-output

# A tibble: 1,467 x 2
   Code   Name
   <chr> <dbl>
 1 75        1
 2 75        1
 3 75        1
 4 75        1
 5 75        1
 6 75        1
 7 75        1
 8 75        1
 9 75        1
10 75        1
# … with 1,457 more rows
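
On the second part of the question, a possible shortcut is sketched below. It assumes that what is wanted is a measure of association between two categorical variables rather than a numeric correlation: coding Name as a = 1, b = 2, ... is arbitrary, and the value returned by cor would change with the coding. Since n already holds the cell counts, a contingency table can be built directly with xtabs, without expanding the rows. The block uses base R only; the chi-squared test and Cramér's V are choices made here for illustration, not something taken from the question, and the variable names (tab, long, res, cramers_v) are likewise illustrative.

```r
# Rebuild the summary counts from the question as a plain (ungrouped) data frame.
df <- data.frame(
  Code = rep(c("75", "R009", "R015", "R019"), times = c(13, 13, 8, 6)),
  Name = c(letters[1:13], letters[1:13],
           c("a", "f", "g", "h", "i", "k", "l", "m"),
           letters[1:6]),
  n = c(41, 14, 7, 5, 11, 138, 4, 92, 19, 10, 167, 67, 62,
        3, 1, 35, 6, 125, 43, 4, 44, 86, 8, 33, 37, 13,
        8, 32, 1, 3, 2, 17, 2, 7,
        45, 14, 10, 8, 15, 228)
)

# Contingency table built straight from the counts: no row expansion needed.
tab <- xtabs(n ~ Code + Name, data = df)

# The long way, for comparison: repeat each row n times (a base R analogue of
# uncount), then tabulate. Both routes should give the same table.
long <- df[rep(seq_len(nrow(df)), times = df$n), c("Code", "Name")]
stopifnot(all(as.vector(table(long$Code, long$Name)) == as.vector(tab)))

# Chi-squared test of independence between Code and Name.
# R may warn that the approximation is rough when expected counts are small.
res <- chisq.test(tab)

# Cramer's V: an effect size between 0 (no association) and 1 (perfect association).
cramers_v <- sqrt(unname(res$statistic) / (sum(tab) * (min(dim(tab)) - 1)))

print(res)
print(cramers_v)
```

If the expanded data frame is still needed for other purposes, the uncount approach above remains the way to get it; the table route only avoids the expansion for this particular comparison.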