Unbalanced panel data to long format
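I have panel data of varying length spread across multiple files. Each file contains observations for several participants, with several measures per participant, in the following format: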
X | M_1.ccc | M_1.ccc.1 | M_1.ccc.2 | M_2.ccc | M_2.ccc.1 | M_2.ccc.2 |
---|---|---|---|---|---|---|
1 | XXX. | XXX. | XXX. | XXX. | XXX. | XXX. |
2 | XXX. | XXX. | XXX. | XXX. | XXX. | XXX. |
.... | | | | | | |
20 | XXX. | XXX. | XXX. | XXX. | XXX. | XXX. |
21 | XXX. | XXX. | XXX. | | | |
22 | XXX. | XXX. | XXX. | | | |
I need the table in long format:
Wave | id | ccc | ccc.1 | ccc.2 |
---|---|---|---|---|
1 | 1 | XXX. | XXX. | XXX. |
2 | 1 | XXX. | XXX. | XXX. |
.... | | | | |
20 | 1 | XXX. | XXX. | XXX. |
21 | 1 | XXX. | XXX. | XXX. |
22 | 1 | XXX. | XXX. | XXX. |
1 | 2 | XXX. | XXX. | XXX. |
2 | 2 | XXX. | XXX. | XXX. |
.... | | | | |
20 | 2 | XXX. | XXX. | XXX. |
I am trying to use the long_panel function from the panelr package (https://jacob-long.com/post/panelr-intro/), but there are not many examples there.
This attempt does not work:
> long_panel(data_full, prefix="_", wave=X)
When I supply the periods (which differ for each participant), I have no value to give for the starting point:
test<-long_panel(data_full,
prefix="_",
wave=X,
begin=1,
end=2)
Well, the start is not actually 1; it should be empty. And the call returns an error:
Error in if (ncol(nn) != 2L) stop("failed to guess time-varying variables from their names") : argument is of length zero
Any ideas on how to solve this?
Sample data:
dat <- structure(list(X = c("1", "2", "3", "4", "5", "6"), MLC_1.c3d = c("12.061268",
"12.166716", "12.292454", "12.439793", "12.608850", "12.799803"
), MLC_1.c3d.1 = c("-1.138404", "-1.275099", "-1.402655", "-1.523949",
"-1.642789", "-1.761063"), MLC_1.c3d.2 = c("9.136170", "9.374666",
"9.601912", "9.818493", "10.023846", "10.217005"), MLC_1.c3d.3 = c("87.739037",
"88.746254", "89.675377", "90.512108", "91.259438", "91.935745"
), MLC_1.c3d.4 = c("25.202179", "25.669239", "26.133680", "26.592773",
"27.045420", "27.492346"), MLC_1.c3d.5 = c("-7.886568", "-8.132847",
"-8.310396", "-8.435491", "-8.530880", "-8.623341")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
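Here is one way to pivot to long format using the {tidyr} package.
library(dplyr)
library(stringr)
library(tidyr)

dat %>%
  rename(Wave = X) %>%
  # stack every measurement column into (id, val) pairs
  pivot_longer(-1, names_to = "id", values_to = "val") %>%
  # split "MLC_1.c3d.2" at the dot after the participant digit:
  # id = "MLC_1", key = "c3d.2" (the lookbehind assumes a single-digit number)
  separate(id, c("id", "key"), sep = "(?<=MLC_\\d)\\.") %>%
  # spread the measures back out, one column per key
  pivot_wider(names_from = key, values_from = val) %>%
  # strip the "MLC_" prefix so that id holds only the participant number
  mutate(id = str_remove(id, "MLC_")) %>%
  arrange(id, Wave)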
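I am sure there is a way to do this in one step, but I have not figured it out yet. I will update this answer if I do.
Update
This is neater and does the pivot in a single step: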
dat %>%
  rename(Wave = X) %>%
  pivot_longer(-1,
               names_to = c("id", ".value"),
               # first group: participant number; second group: measure name
               names_pattern = "MLC_(\\d+)\\.(c.*)",
               names_transform = list(id = as.integer)) %>%
  arrange(id, Wave)
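Since the real data are unbalanced (in the question, participant 2 has no waves 21 and 22), the wide table holds NA in those cells, and the pivot above keeps them as rows in which every measure is NA. Below is a minimal sketch of one way to drop such rows with the values_drop_na argument of pivot_longer. The toy table and its values are made up for illustration and are not the asker's data. Wave is also converted to integer here so that arrange sorts it numerically rather than alphabetically.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: participant 2 has no observations in wave 3
toy <- tibble(
  X           = c("1", "2", "3"),
  MLC_1.c3d   = c(1.1, 1.2, 1.3),
  MLC_1.c3d.1 = c(2.1, 2.2, 2.3),
  MLC_2.c3d   = c(3.1, 3.2, NA),
  MLC_2.c3d.1 = c(4.1, 4.2, NA)
)

long <- toy %>%
  rename(Wave = X) %>%
  # X is stored as text; convert so that wave 10 sorts after wave 9
  mutate(Wave = as.integer(Wave)) %>%
  pivot_longer(-Wave,
               names_to = c("id", ".value"),
               names_pattern = "MLC_(\\d+)\\.(c.*)",
               names_transform = list(id = as.integer),
               # drop rows in which all measures are NA (waves a participant lacks)
               values_drop_na = TRUE) %>%
  arrange(id, Wave)

long
```

With this toy input, participant 1 keeps three waves and participant 2 keeps two, which mirrors the shape of the desired output in the question.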
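Here is another solution using data.table (assuming your data is named df).
library(data.table)

# convert to a data.table by reference and rename the wave column
setnames(setDT(df), "X", "Wave")

# one list element per measure (c3d, c3d.1, ..., c3d.35), each holding
# that measure's column for participants 1 and 2
melt(df,
     id.vars = "Wave",
     measure.vars = Map(function(x) paste0("MLC_", 1:2, ".", x),
                        c("c3d", paste0("c3d.", 1:35))),
     variable.name = "id")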
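The call above hard-codes the participant numbers (1:2) and the measure suffixes (1:35). As a sketch only, the column groups could instead be derived from the column names. Here toy, measure_cols, suffix, groups and keep are hypothetical names, and the data are made up. The code assumes that every participant has the same measures and that the columns are ordered by participant, so that the positional id produced by melt lines up with the participant number.

```r
library(data.table)

# Hypothetical toy data: participant 2 has no observations in wave 3
toy <- data.table(
  Wave        = 1:3,
  MLC_1.c3d   = c(1.1, 1.2, 1.3),
  MLC_1.c3d.1 = c(2.1, 2.2, 2.3),
  MLC_2.c3d   = c(3.1, 3.2, NA),
  MLC_2.c3d.1 = c(4.1, 4.2, NA)
)

# every column except Wave is a measurement column
measure_cols <- setdiff(names(toy), "Wave")

# strip the "MLC_<number>." prefix to get the measure name ("c3d", "c3d.1", ...)
suffix <- sub("^MLC_\\d+\\.", "", measure_cols)

# named list: one vector of column names per measure
groups <- split(measure_cols, suffix)

long <- melt(toy,
             id.vars = "Wave",
             measure.vars = groups,
             variable.name = "id")

# keep only rows in which at least one measure is present
keep <- rowSums(!is.na(long[, names(groups), with = FALSE])) > 0
long <- long[keep]

long
```

Note that id here is the position of the column within each group, not a number parsed from the column name, which is why the ordering assumption matters.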