Unbalanced panel data to long format

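I have panel data of unequal length spread across several files. Each file holds observations for several participants, with several measures per participant, in this format: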

X M_1.ccc M_1.ccc.1 M_1.ccc.2 M_2.ccc M_2.ccc.1 M_2.ccc.2
1 XXX. XXX. XXX. XXX. XXX. XXX.
2 XXX. XXX. XXX. XXX. XXX. XXX.
....
20 XXX. XXX. XXX. XXX. XXX. XXX.
21 XXX. XXX. XXX.
22 XXX. XXX. XXX.

I need the table in long format:

Wave id ccc ccc.1 ccc.2
1 1 XXX. XXX. XXX.
2 1 XXX. XXX. XXX.
....
20 1 XXX. XXX. XXX.
21 1 XXX. XXX. XXX.
22 1 XXX. XXX. XXX.
1 2 XXX. XXX. XXX.
2 2 XXX. XXX. XXX.
....
20 2 XXX. XXX. XXX.

I'm trying to use the panelr package and its long_panel function (https://jacob-long.com/post/panelr-intro/), but there aren't many examples there.

This attempt doesn't work:

> long_panel(data_full, prefix="_", wave=X)

When I supply the periods (which are different for each participant), a starting point is missing...

test<-long_panel(data_full,
              prefix="_",
              wave=X,
              begin=1,
              end=2)

Well, it isn't actually 1; it should be empty... and it returns an error:

Error in if (ncol(nn) != 2L) stop("failed to guess time-varying variables from their names") : argument is of length zero

Any ideas on how to solve this?

Sample data:

dat <- structure(list(X = c("1", "2", "3", "4", "5", "6"), MLC_1.c3d = c("12.061268", 
"12.166716", "12.292454", "12.439793", "12.608850", "12.799803"
), MLC_1.c3d.1 = c("-1.138404", "-1.275099", "-1.402655", "-1.523949", 
"-1.642789", "-1.761063"), MLC_1.c3d.2 = c("9.136170", "9.374666", 
"9.601912", "9.818493", "10.023846", "10.217005"), MLC_1.c3d.3 = c("87.739037", 
"88.746254", "89.675377", "90.512108", "91.259438", "91.935745"
), MLC_1.c3d.4 = c("25.202179", "25.669239", "26.133680", "26.592773", 
"27.045420", "27.492346"), MLC_1.c3d.5 = c("-7.886568", "-8.132847", 
"-8.310396", "-8.435491", "-8.530880", "-8.623341")), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))
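The dput above covers only participant 1 (the MLC_1 columns), so on its own it is balanced. To try the answers on the unbalanced case the question describes, a minimal sketch is to append a hypothetical participant 2 whose later waves are missing. The MLC_2 values are made up, only two of the six measures are kept for brevity, and blank cells in the real files are assumed to be read in as NA.

```r
# Participant 1: first two measures from the sample data, kept as character like the original.
dat <- data.frame(
  X           = c("1", "2", "3", "4", "5", "6"),
  MLC_1.c3d   = c("12.061268", "12.166716", "12.292454", "12.439793", "12.608850", "12.799803"),
  MLC_1.c3d.1 = c("-1.138404", "-1.275099", "-1.402655", "-1.523949", "-1.642789", "-1.761063"),
  stringsAsFactors = FALSE
)

# Hypothetical participant 2, observed only in waves 1-4 (made-up values);
# NA marks the waves this participant never reached.
dat_unbal <- dat
dat_unbal$MLC_2.c3d   <- c("11.5", "11.6", "11.7", "11.8", NA, NA)
dat_unbal$MLC_2.c3d.1 <- c("-1.0", "-1.1", "-1.2", "-1.3", NA, NA)

str(dat_unbal)
```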

Here is one way to do it by pivoting longer with the {tidyr} package.

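library(dplyr)
library(stringr)
library(tidyr)

dat %>%
  rename(Wave = X) %>%
  # one row per (Wave, column name, value)
  pivot_longer(-Wave, names_to = "name", values_to = "val") %>%
  # split "MLC_1.c3d.1" into id = "MLC_1" and key = "c3d.1"
  separate(name, c("id", "key"), sep = "(?<=MLC_\\d)\\.") %>%
  pivot_wider(names_from = key, values_from = val) %>%
  # numeric id and Wave so that e.g. wave 10 sorts after wave 9
  mutate(id = as.integer(str_remove(id, "MLC_")), Wave = as.integer(Wave)) %>%
  arrange(id, Wave)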
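For an unbalanced panel, the first pivot_longer() can also drop the empty cells with values_drop_na = TRUE, so waves a participant never reached do not come back as all-NA rows after pivot_wider(). This variant is a sketch, not part of the original answer; it runs on a small hypothetical data frame (values loosely based on the sample, MLC_2 made up) in which participant 2 is missing wave 3.

```r
library(dplyr)
library(stringr)
library(tidyr)

# Hypothetical wide data: rows are waves, columns are participant x measure.
dat_unbal <- data.frame(
  X           = c("1", "2", "3"),
  MLC_1.c3d   = c("12.06", "12.17", "12.29"),
  MLC_1.c3d.1 = c("-1.14", "-1.28", "-1.40"),
  MLC_2.c3d   = c("11.5", "11.6", NA),
  MLC_2.c3d.1 = c("-1.0", "-1.1", NA),
  stringsAsFactors = FALSE
)

long <- dat_unbal %>%
  rename(Wave = X) %>%
  # values_drop_na = TRUE discards NA cells before they are widened again
  pivot_longer(-Wave, names_to = "name", values_to = "val", values_drop_na = TRUE) %>%
  separate(name, c("id", "key"), sep = "(?<=MLC_\\d)\\.") %>%
  pivot_wider(names_from = key, values_from = val) %>%
  mutate(id = as.integer(str_remove(id, "MLC_")), Wave = as.integer(Wave)) %>%
  arrange(id, Wave)

long  # participant 1 has waves 1-3, participant 2 only waves 1-2
```

Because the split regex looks behind for a single digit, participant ids with two or more digits (MLC_10) would need a different separator rule.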

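I'm sure there's a way to do this in one step, but I haven't figured it out yet. I'll update this answer if I do.

Update

This is tidier and does the pivot in one go: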

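dat %>%
  rename(Wave = X) %>%
  mutate(Wave = as.integer(Wave)) %>%
  pivot_longer(-Wave,
               # group 1 of the pattern -> id, group 2 -> name of the output column
               names_to = c("id", ".value"),
               names_pattern = "MLC_(\\d+)\\.(.*)",
               names_transform = list(id = as.integer)) %>%
  arrange(id, Wave)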
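To check what a names_pattern regex will capture before pivoting, stringr::str_match() can be run on the column names directly: column 2 of the result is what goes into id and column 3 what becomes the .value column name. This sketch uses the pattern `MLC_(\\d+)\\.(.*)`; the name MLC_12.c3d.2 is hypothetical and only shows that multi-digit ids are captured whole.

```r
library(stringr)

# Column names as in the sample data, plus a hypothetical two-digit participant.
nms <- c("MLC_1.c3d", "MLC_1.c3d.1", "MLC_1.c3d.5", "MLC_12.c3d.2")

parts <- str_match(nms, "MLC_(\\d+)\\.(.*)")
parts[, 2]  # id:     "1" "1" "1" "12"
parts[, 3]  # .value: "c3d" "c3d.1" "c3d.5" "c3d.2"
```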

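Here is another solution using data.table (assuming your data is named df):

library(data.table)

# convert to a data.table in place and rename X -> Wave
setnames(setDT(df), "X", "Wave")

# one list element per measure, e.g. c("MLC_1.c3d", "MLC_2.c3d");
# the ranges 1:2 (participants) and 1:35 (measures) must match the columns in your data
measures <- c("c3d", paste0("c3d.", 1:35))

melt(df,
     id.vars = "Wave",
     measure.vars = lapply(measures, function(x) paste0("MLC_", 1:2, ".", x)),
     variable.name = "id",
     value.name = measures)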
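Instead of hard-coding 1:2 and 1:35, the measure groups can be derived from the column names. A sketch, assuming every participant has the same set of measures and the columns appear in a consistent participant order; the small table is hypothetical, and na.rm = TRUE removes the wave that is entirely missing for participant 2.

```r
library(data.table)

# Hypothetical wide table: participant 2 has no wave 3.
df <- data.table(
  Wave        = 1:3,
  MLC_1.c3d   = c(12.1, 12.2, 12.3),
  MLC_1.c3d.1 = c(-1.1, -1.3, -1.4),
  MLC_2.c3d   = c(11.5, 11.6, NA),
  MLC_2.c3d.1 = c(-1.0, -1.1, NA)
)

meas_cols <- setdiff(names(df), "Wave")
measure   <- sub("^MLC_\\d+\\.", "", meas_cols)                     # "c3d", "c3d.1", ...
person    <- as.integer(sub("^MLC_(\\d+)\\..*$", "\\1", meas_cols)) # 1, 1, 2, 2

# One character vector per measure, listing that measure's column for each participant.
groups <- split(meas_cols, factor(measure, levels = unique(measure)))

long <- melt(df, id.vars = "Wave", measure.vars = unname(groups),
             variable.name = "id", value.name = names(groups), na.rm = TRUE)

# melt numbers participants 1, 2, ... by position within each group; map back to real ids
long[, id := unique(person)[as.integer(id)]]
setorder(long, id, Wave)
long
```

Passing the group names through value.name (and unnaming the list) gives the value columns the measure names instead of value1, value2, and so on.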