Need help reshaping an R dataset
My dataset currently looks like this:
id date00 var1_00 var2_00 date01 var1_01 var2_01
1 1/1/2019 1 2 1/1/2020 3 4
2 2/2/2019 1 2 2/2/2020 3 4
3 3/3/2019 1 2 3/3/2020 3 4
Code for the table:
df <- structure(list(id = c(1, 2, 3), date00 = structure(c(1546300800,
1549065600, 1551571200), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
var1_00 = c(1, 1, 1), var2_00 = c(2, 2, 2), date01 = structure(c(1577836800,
1580601600, 1583193600), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
var1_01 = c(3, 3, 3), var2_01 = c(4, 4, 4)), row.names = c(NA,
-3L), class = c("tbl_df", "tbl", "data.frame"))
How can I reshape it so that it looks like this:
id date var1_00 var2_00 var1_01 var2_01
1 1/1/2019 1 2 NA NA
2 2/2/2019 1 2 NA NA
3 3/3/2019 1 2 NA NA
1 1/1/2020 NA NA 3 4
2 2/2/2020 NA NA 3 4
3 3/3/2020 NA NA 3 4
Thanks!
I gave it a try and came up with this solution. Please let me know what you think.
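library(dplyr)

# Rows for the 00 wave: date00 becomes date, the 01 variables are set to NA
df1 <- df %>%
  mutate(date = date00,
         var1_01 = NA,
         var2_01 = NA) %>%
  select(id, date, var1_00, var2_00, var1_01, var2_01)

# Rows for the 01 wave: date01 becomes date, the 00 variables are set to NA
df2 <- df %>%
  mutate(date = date01,
         var1_00 = NA,
         var2_00 = NA) %>%
  select(id, date, var1_00, var2_00, var1_01, var2_01)

# Stack the two blocks on top of each other
df_new <- rbind(df1, df2)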
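A possible simplification of this attempt: `dplyr::bind_rows()` pads columns that are missing from one input with `NA`, so the placeholder columns do not have to be created by hand. A minimal sketch, assuming the example data is rebuilt as a plain data frame named `df` (this construction is an assumption for self-containment, not the `structure()` call from the question):

```r
library(dplyr)

# Assumed reconstruction of the example data from the question
df <- data.frame(
  id      = c(1, 2, 3),
  date00  = as.POSIXct(c("2019-01-01", "2019-02-02", "2019-03-03"), tz = "UTC"),
  var1_00 = c(1, 1, 1),
  var2_00 = c(2, 2, 2),
  date01  = as.POSIXct(c("2020-01-01", "2020-02-02", "2020-03-03"), tz = "UTC"),
  var1_01 = c(3, 3, 3),
  var2_01 = c(4, 4, 4)
)

# One block per wave: rename its date column and keep only its own variables
df1 <- df %>% select(id, date = date00, ends_with("_00"))
df2 <- df %>% select(id, date = date01, ends_with("_01"))

# bind_rows() matches columns by name and pads the missing ones with NA
df_new <- bind_rows(df1, df2)
```

The `ends_with("_00")` helper picks up `var1_00` and `var2_00` but not `date00`, because the date column has no underscore before its suffix.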
Here is a data.table option using rbindlist:
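library(data.table)

# Convert df to a data.table by reference
setDT(df)
# Keep id plus the columns of one wave, and rename that wave's date column to "date"
dt1 <- setnames(df[, .SD, .SDcols = grep("^id|00$", names(df))], "date00", "date")
dt2 <- setnames(df[, .SD, .SDcols = grep("^id|01$", names(df))], "date01", "date")
# Stack both pieces; fill = TRUE pads columns missing from one piece with NA
out <- rbindlist(list(dt1, dt2), fill = TRUE)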
Or
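dt <- as.data.table(df)
out <- rbindlist(
  lapply(
    # Group the non-id columns by their trailing digit ("0" or "1")
    split.default(dt[, -1], gsub(".*(\\d+$)", "\\1", names(dt)[-1])),
    # Prepend id and rename each group's first column (its date) to "date"
    function(x) cbind(dt[, 1], setnames(x, 1, "date"))
  ),
  fill = TRUE
)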
which gives
> out
id date var1_00 var2_00 var1_01 var2_01
1: 1 2019-01-01 1 2 NA NA
2: 2 2019-02-02 1 2 NA NA
3: 3 2019-03-03 1 2 NA NA
4: 1 2020-01-01 NA NA 3 4
5: 2 2020-02-02 NA NA 3 4
6: 3 2020-03-03 NA NA 3 4
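Both variants rely on `fill = TRUE`, which makes `rbindlist` match columns by name and pad any column missing from one table with `NA`. A tiny illustration, using made-up tables `a` and `b` that are not part of the question:

```r
library(data.table)

# Hypothetical toy tables that share only the id column
a <- data.table(id = 1, x = 10)
b <- data.table(id = 2, y = 20)

# fill = TRUE binds by column name and fills the gaps with NA
res <- rbindlist(list(a, b), fill = TRUE)
print(res)
```

The result has the columns `id`, `x` and `y`, with `NA` wherever a table did not supply a value.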