Restructure a large data frame with multiple data types
I am struggling to get my data (an xlsx file) into the right shape. My original data set looks like this:
patient when age weight height watchID dateFrom
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dttm>
1 T01 pre 82 83 174 2788 2017-07-24
2 T02 pre 81 80 166 7309 2017-07-22
3 T02 post 67 91 163 7309 2017-10-26
4 T03 pre 68 91 172 5066 2017-07-26
5 T03 post 68 91 172 7220 2017-10-24
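I want a wide data set with a single row per patient ID, spread out by the "when" column. But when I reshape it with the "dcast" function, this is what I end up with: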
patient age_post age_pre weight_post weight_pre height_post height_pre
<chr> <int> <int> <int> <int> <int> <int>
1 T01 0 1 0 1 0 1
2 T02 1 1 1 1 1 1
3 T03 1 1 1 1 1 1
4 T04 0 1 0 1 0 1
5 T05 1 0 1 0 1 0
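It somehow changed all the variables to 1s and 0s. How can I get a similar data set that keeps the different variable types, with "pre" and "post" appended to the original column names?
Here is my code ("HW" is the original data set shown above):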
mdata <- melt(HW, id=c("patient","when"))
mdata$value <- as.numeric(as.character(mdata$value)) #I added this line to convert the column to numeric but it doesn't help
mdata2 <- dcast(mdata, patient~variable+when)
I also tried:
mdata <- melt(HW, id=c("patient","when"))
mdata3 <- reshape(mdata, idvar='patient', timevar='when', direction='wide')
But then I get:
patient variable.pre value.pre variable.post value.post
<chr> <fct> <chr> <fct> <chr>
1 T01 age 82 NA NA
2 T02 age 81 age 67
3 T03 age 68 age 68
4 T04 age 81 NA NA
5 T05 NA NA age 87
The other variables are missing.
Thanks in advance.
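On the 1s and 0s: a likely explanation, assuming the full HW contains at least one patient/when pair that occurs more than once (the printed sample does not show one, but the output lists T04 and T05, so the real data is larger than the sample). When the cells are not uniquely identified and no fun.aggregate is given, dcast falls back to length and returns counts instead of values. A quick base R check for repeated pairs might look like this; the data frame below is a made-up stand-in for HW, with one invented duplicate row added for illustration:

```r
# Hypothetical stand-in for HW: the first rows from the question plus one
# invented duplicate (T03, "post") to illustrate the problem.
HW <- data.frame(
  patient = c("T01", "T02", "T02", "T03", "T03", "T03"),
  when    = c("pre", "pre", "post", "pre", "post", "post"),
  age     = c(82, 81, 67, 68, 68, 68),
  weight  = c(83, 80, 91, 91, 91, 92),
  stringsAsFactors = FALSE
)

# Count rows per patient/when pair; a count above 1 means the pair is not unique.
pair_counts <- as.data.frame(table(patient = HW$patient, when = HW$when),
                             stringsAsFactors = FALSE)
repeated <- pair_counts[pair_counts$Freq > 1, ]
print(repeated)

# Pull out every row that belongs to a repeated pair, for inspection
# before reshaping.
is_dup <- duplicated(HW[c("patient", "when")]) |
  duplicated(HW[c("patient", "when")], fromLast = TRUE)
print(HW[is_dup, ])
```

If the check turns up repeated pairs, they would need to be resolved (or an explicit fun.aggregate chosen) before casting to wide format.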
Is this what you want?
library(tidyr)  # also re-exports tibble() and %>%

# Rebuild the sample data from the question
df <- tibble(patient  = c("T01", "T02", "T02", "T03", "T03"),
             when     = c("pre", "pre", "post", "pre", "post"),
             age      = c(82, 81, 67, 68, 68),
             weight   = c(83, 80, 91, 91, 91),
             height   = c(174, 166, 163, 172, 172),
             watchid  = c(2788, 7309, 7309, 5066, 7220),
             datefrom = c("2017-07-24", "2017-07-22", "2017-10-26",
                          "2017-07-26", "2017-10-24"))

# One row per patient; each measurement gets a _pre and a _post column
df %>%
  pivot_wider(names_from = when,
              values_from = c(age, weight, height, watchid, datefrom))
# A tibble: 3 x 11
patient age_pre age_post weight_pre weight_post height_pre height_post watchid_pre watchid_post
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 T01 82 NA 83 NA 174 NA 2788 NA
2 T02 81 67 80 91 166 163 7309 7309
3 T03 68 68 91 91 172 172 5066 7220
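For comparison, base R's reshape() can produce a similar wide table when it is applied to the original data directly, without melt() first. The attempt in the question ran it on the melted data, where each patient/when pair has one row per variable, so only the first one (age) survived. The following is a minimal sketch under a few assumptions: the sample data is rebuilt as a plain data frame, datefrom is stored as Date rather than the dttm shown in the question, and sep = "_" is chosen only to mimic the pivot_wider naming. If HW came from an xlsx import it is probably a tibble, so converting it with as.data.frame() first is a reasonable precaution.

```r
# Sample data from the question, rebuilt as a plain data frame.
# datefrom is stored as Date here (an assumption; the question shows <dttm>).
df <- data.frame(
  patient  = c("T01", "T02", "T02", "T03", "T03"),
  when     = c("pre", "pre", "post", "pre", "post"),
  age      = c(82, 81, 67, 68, 68),
  weight   = c(83, 80, 91, 91, 91),
  height   = c(174, 166, 163, 172, 172),
  watchid  = c(2788, 7309, 7309, 5066, 7220),
  datefrom = as.Date(c("2017-07-24", "2017-07-22", "2017-10-26",
                       "2017-07-26", "2017-10-24")),
  stringsAsFactors = FALSE
)

# Reshape the unmelted data: one row per patient, one column per
# measurement/when combination, named like age_pre and age_post.
wide <- reshape(df, idvar = "patient", timevar = "when",
                direction = "wide", sep = "_")

# Inspect the column types after reshaping.
str(wide)
```

Patients without a "post" row (T01 here) get NA in the _post columns, the same as in the pivot_wider result.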