Clarification on Reshaping DataFrame from LONG to WIDE in R with Gather/Spread

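Sorry to come back to a topic that already has several threads on Stack, but I am trying to use the Tidyverse with the gather/spread functions and with pivot_wider, and I am lost. Here is a sample of the subset I am using for testing: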

    test.long2 <- structure(list(pid = structure(c(1L, 1L, 1L, 1L, 1L, 2L), .Label = c("1",
    "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13",
    "14", "15", "16", "17", "18", "19", "20", "21", "22", "23", "24",
    "25", "26", "27", "28", "29", "30", "31", "32", "33", "34", "35",
    "36", "37", "38", "39", "40", "41", "42", "43", "44", "45", "46",
    "47", "48", "49", "50", "51", "52", "53", "54", "55", "56", "57",
    "58", "59", "60", "61", "62", "63", "64", "65", "66", "67", "68",
    "69", "70", "71", "72", "73", "74", "75", "76", "77", "78", "79",
    "80", "81", "82", "83", "84", "85", "86", "87", "88", "89", "90",
    "91", "92", "93", "94", "95", "96", "97", "98", "99", "100",
    "101", "102", "103", "104", "105", "106", "107", "108", "109",
    "110", "111", "112", "113", "114", "115", "116", "117", "118",
    "119", "120", "121", "122", "123", "124", "125", "126", "127",
    "128", "129", "130", "131", "132", "133", "134", "135", "136",
    "137", "138", "139", "140", "141", "142", "143", "144", "145",
    "146", "147", "148", "149", "150", "151", "152", "153", "154",
    "155", "156", "157", "158", "159", "160", "161", "162", "163",
    "164", "165", "166", "167", "168", "169", "170", "171", "172",
    "173", "174", "175", "176", "177", "178", "179", "180", "181",
    "182", "183", "184", "185", "186", "187", "188", "189", "190",
    "191", "192", "193", "194", "195", "196", "197", "198", "199",
    "200", "201", "202", "203", "204", "205", "206", "207", "208",
    "209", "210", "211", "212", "213", "214", "215", "216"), class = "factor"),
        timewave = structure(c(1L, 2L, 3L, 4L, 5L, 1L), .Label = c("1",
        "2", "3", "4", "5", "6", "7", "8"), class = "factor"), dev_icd = structure(c(1L,
        1L, 1L, 1L, 1L, 2L), .Label = c("No", "Yes"), class = "factor"),
        lab_bnp = c(388, 199, 387.5, 318, 154, 949.4)), row.names = c(NA,
    6L), class = "data.frame")

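These are the two commands I came up with:

    test.wide2 <- test.long2 %>%
        pivot_wider(id_cols = pid,
                    names_from = timewave,
                    values_from = c(dev_icd, lab_bnp),
                    names_sep = "")

or alternatively:

    test.wide <- test.long2 %>%
        group_by(pid) %>%
        gather("dev_icd", "lab_bnp",
               key = variable, value = number) %>%
        unite(combi, variable, timewave) %>%
        spread(combi, number)

Neither works as I expected: I get a lot of NA or NULL values, and I do not understand what my mistake is or what the correct procedure would be. Any help would be appreciated, not only in solving the problem but above all in understanding the logic/philosophy of reshaping.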

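You need to be more explicit about what you want; as it stands we can only guess. You cannot expect the wide format to contain values that are not present in the long data: any pid/timewave combination that is missing there shows up as NA. I guess you want something like this.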

    library(tidyr)  # pivot_wider() and the %>% pipe

    # One row per timewave; pid moves into the column names.
    # pid is left out of id_cols because names_from already uses it.
    test.long2 %>%
        pivot_wider(id_cols = timewave,
                    names_from = pid,
                    values_from = c(dev_icd, lab_bnp),
                    names_sep = "_pid")

    # A tibble: 5 x 5
      timewave dev_icd_pid1 dev_icd_pid2 lab_bnp_pid1 lab_bnp_pid2
      <fct>    <fct>        <fct>               <dbl>        <dbl>
    1 1        No           Yes                  388          949.
    2 2        No           NA                   199           NA
    3 3        No           NA                   388.          NA
    4 4        No           NA                   318           NA
    5 5        No           NA                   154           NA
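To make the NA behaviour concrete, here is a minimal sketch on made-up data. The `toy_long` data frame and the `names_prefix` value are assumptions for illustration only, not part of the original dataset. It pivots the other way round, one row per pid, and shows that a pid with fewer timewaves simply gets NA in the columns it has no observation for.

```r
library(tidyr)

# Hypothetical toy data: pid 1 has three timewaves, pid 2 only the first one
toy_long <- data.frame(
  pid      = c(1, 1, 1, 2),
  timewave = c(1, 2, 3, 1),
  lab_bnp  = c(388, 199, 387.5, 949.4)
)

# One row per pid, one lab_bnp column per timewave
toy_wide <- pivot_wider(
  toy_long,
  id_cols      = pid,
  names_from   = timewave,
  values_from  = lab_bnp,
  names_prefix = "lab_bnp_t"
)

# pid 2 has no rows for timewave 2 and 3, so those cells are NA
print(toy_wide)
```

The NAs here are not an error in the reshaping: the wide table needs a cell for every pid/timewave combination, and the long data only supplies some of them.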

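Thanks to Merjin van Tiborg's help, I finally pinned down the problem. The correct command to get one row per PID, with the dev_icd and lab_bnp columns repeated once per timewave, is the following:

    library(dplyr)  # group_by(), mutate(), select()
    library(tidyr)  # pivot_wider() and the %>% pipe

    # timewave is left out of id_cols because names_from already uses it
    test.wide <- hf.longsmall %>%
      pivot_wider(id_cols = pid,
                  names_from = timewave,
                  values_from = c(dev_icd, lab_bnp),
                  names_sep = "_t")

which, as long as each pid/timewave pair occurs only once, gives the same wide table as the following:

    test.wide1 <- hf.longsmall %>%
      group_by(pid, timewave) %>%
      mutate(row = row_number()) %>%   # index repeated pid/timewave rows
      tidyr::pivot_wider(names_from = timewave,
                         values_from = c(dev_icd, lab_bnp),
                         names_sep = "_t") %>%
      select(-row)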
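To see what the helper `row` column does, here is a small sketch on made-up data. The `toy_long` data frame, its values, and the `names_prefix` are hypothetical and only serve the illustration. When a pid/timewave pair appears twice, the row index keeps the two entries apart, so the pid ends up on two rows instead of having its values packed into list-columns.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: pid 2 / timewave 1 was entered twice
toy_long <- data.frame(
  pid      = c(1, 1, 2, 2),
  timewave = c(1, 2, 1, 1),
  lab_bnp  = c(388, 199, 949.4, 950)
)

toy_wide <- toy_long %>%
  group_by(pid, timewave) %>%
  mutate(row = row_number()) %>%  # 1 for the first entry of a pair, 2 for a repeat
  ungroup() %>%
  pivot_wider(names_from   = timewave,
              values_from  = lab_bnp,
              names_prefix = "lab_bnp_t")

# pid 2 now occupies two rows (row = 1 and row = 2), which exposes the duplicate
print(toy_wide)
```

Keeping the `row` column in the output for a moment, rather than dropping it straight away, makes the repeated entries easy to spot.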

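I was getting the following warning: "Values in `values_from` are not uniquely identified; output will contain list-cols". It was caused by a real (and dangerous) error: a PID that had been duplicated during data entry. In any case, I was only able to understand the problem by using the group_by option shown above.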
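A quick way to check for this kind of duplicate before pivoting is to count the rows per pid/timewave pair. This is only a sketch: `toy_long` and its values are invented for the example, and with real data you would apply the same two steps to your own long data frame.

```r
library(dplyr)

# Hypothetical toy data: pid 2 / timewave 1 was entered twice
toy_long <- data.frame(
  pid      = c(1, 1, 2, 2),
  timewave = c(1, 2, 1, 1),
  lab_bnp  = c(388, 199, 949.4, 950)
)

# Count rows per pid/timewave pair and keep only the pairs seen more than once
dupes <- toy_long %>%
  count(pid, timewave, name = "n") %>%
  filter(n > 1)

print(dupes)
```

If `dupes` has no rows, every pid/timewave pair is unique and pivot_wider has exactly one value per output cell.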

Thank you all for your patience.