How to condense a dataset with dates in R
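I'm trying to take some data and clean it up so end users can view it, but I'm new to R and can't quite figure out how to do it. Also, this is my first post, so please let me know if there are any formatting or structure problems with how I've written this question.
What the data currently looks like: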
name | date | reason |
---|---|---|
john | 1/1/2022 | late |
john | 1/2/2022 | late |
john | 1/4/2022 | absent |
betty | 1/3/2022 | absent |
betty | 1/5/2022 | no call |
betty | 1/7/2022 | no call |
kyle | 1/3/2022 | absent |
kyle | 1/5/2022 | no call |
kyle | 1/7/2022 | no call |
I'd like to know whether there is a way to condense it so that, for each name, all the dates and reasons are on the same row. Like this:
name | date1 | reason1 | date2 | reason2 | date3 | reason3 |
---|---|---|---|---|---|---|
john | 1/1/2022 | late | 1/2/2022 | late | 1/4/2022 | absent |
betty | 1/3/2022 | absent | 1/5/2022 | no call | 1/7/2022 | no call |
kyle | 1/3/2022 | absent | 1/5/2022 | no call | 1/7/2022 | no call |
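One possible way to get this layout is to number each person's rows and then pivot on that number. This is only a sketch: it assumes dplyr and tidyr (version 1.2.0 or later for `names_vary`) are installed, and the helper column `idx` and the result name `wide` are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Sample data from the question; dates are kept as plain character strings
db <- data.frame(
  name   = c("john", "john", "john", "betty", "betty", "betty",
             "kyle", "kyle", "kyle"),
  date   = c("1/1/2022", "1/2/2022", "1/4/2022", "1/3/2022", "1/5/2022",
             "1/7/2022", "1/3/2022", "1/5/2022", "1/7/2022"),
  reason = c("late", "late", "absent", "absent", "no call", "no call",
             "absent", "no call", "no call")
)

wide <- db %>%
  group_by(name) %>%
  mutate(idx = row_number()) %>%   # hypothetical helper: 1, 2, 3 within each name
  ungroup() %>%
  pivot_wider(
    names_from  = idx,
    values_from = c(date, reason),
    names_sep   = "",              # "date1" rather than "date_1"
    names_vary  = "slowest"        # interleave: date1, reason1, date2, reason2, ...
  )

print(wide)
```

If people have different numbers of rows, the missing date/reason pairs would come out as `NA`.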
Alternatively, I tried using dcast, but my code produces numbers instead of dates.
new_db <- dcast(db, name ~ reason, fun.aggregate = list, value.var = "date")
What I want:
name | late | absent | no call |
---|---|---|---|
john | 1/1/2022,1/2/2022 | 1/4/2022 | |
betty | | 1/3/2022 | 1/5/2022,1/7/2022 |
kyle | | 1/3/2022 | 1/5/2022,1/7/2022 |
What I get:
name | late | absent | no call |
---|---|---|---|
john | c(1620708300,1627236300) | 1639328820 | numeric(0) |
betty | numeric(0) | 1612973940 | c(1611937080, 1612455480) |
kyle | numeric(0) | 1639329540 | c(1635526800, 1639760400) |
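The numbers in this table look like date-times that have lost their class: R stores POSIXct values as seconds since 1970-01-01 UTC, and 1620708300 corresponds to "2021-05-11 04:45:00" if the time zone is taken to be UTC. A minimal base R sketch of that round trip follows; the variable names are illustrative and the UTC time zone is an assumption.

```r
# A date-time from the real data, parsed as POSIXct (time zone assumed to be UTC)
stamp <- as.POSIXct("2021-05-11 04:45:00", tz = "UTC")

# Dropping the class leaves the underlying number of seconds since the epoch
secs <- as.numeric(stamp)

# Going back: rebuild the POSIXct value, then format it as display text
back  <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")
label <- format(back, "%m/%d/%Y")

print(secs)
print(label)
```

Converting the date column to text with `format()` before reshaping is one way to keep the aggregation step from exposing these raw numbers.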
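Edit:
Thanks to @Andre Wildberg, I was able to use as.data.frame(pivot_wider(df, names_from=reason, values_from=date, values_fn=list, values_fill=list("")))
to get within inches of where I need to be. The last step I need is to remove the c() from the cells and display clean dates in those fields.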
db <- structure(list(name = c("Debby", "Debby", "Debby",
"Debby", "Robert", "Robert", "Robert",
"Ryan", "Ryan", "Ryan", "Ryan",
"Ryan", "Ryan", "Brandon", "Brandon"
), reason = c("Absent", "Leave Early", "Late", "Leave Early",
"Leave Early", "Leave Early", "Absent", "Absent", "Absent", "Absent",
"Absent", "Leave Early", "Late", "Leave Early", "Leave Early"
), date = c("2021-05-11 04:45:00", "2021-05-15 04:02:00", "2021-07-25 18:05:00",
"2021-09-19 20:01:00", "2021-11-25 01:02:00", "2021-12-08 20:56:00",
"2021-12-16 17:30:00", "2021-10-09 17:00:00", "2021-11-07 17:00:00",
"2021-11-12 17:00:00", "2021-11-28 17:00:00", "2021-12-11 01:31:00",
"2021-12-12 17:07:00", "2021-05-03 23:58:00", "2021-05-15 23:31:00"
)), row.names = c(NA, -15L), class = c("tbl_df", "tbl", "data.frame"
))
If you want to combine the observations, try this:
library(tidyr)
as.data.frame(pivot_wider(df, names_from=reason, values_from=date,
values_fn=list, values_fill=list("")))
name
1 Debby
2 Robert
3 Ryan
4 Brandon
Absent
1 2021-05-11 04:45:00
2 2021-12-16 17:30:00
3 2021-10-09 17:00:00, 2021-11-07 17:00:00, 2021-11-12 17:00:00, 2021-11-28 17:00:00
4
Leave Early Late
1 2021-05-15 04:02:00, 2021-09-19 20:01:00 2021-07-25 18:05:00
2 2021-11-25 01:02:00, 2021-12-08 20:56:00
3 2021-12-11 01:31:00 2021-12-12 17:07:00
4 2021-05-03 23:58:00, 2021-05-15 23:31:00
Data
df <- structure(list(name = c("Debby", "Debby", "Debby", "Debby", "Robert",
"Robert", "Robert", "Ryan", "Ryan", "Ryan", "Ryan", "Ryan", "Ryan",
"Brandon", "Brandon"), reason = c("Absent", "Leave Early", "Late",
"Leave Early", "Leave Early", "Leave Early", "Absent", "Absent",
"Absent", "Absent", "Absent", "Leave Early", "Late", "Leave Early",
"Leave Early"), date = c("2021-05-11 04:45:00", "2021-05-15 04:02:00",
"2021-07-25 18:05:00", "2021-09-19 20:01:00", "2021-11-25 01:02:00",
"2021-12-08 20:56:00", "2021-12-16 17:30:00", "2021-10-09 17:00:00",
"2021-11-07 17:00:00", "2021-11-12 17:00:00", "2021-11-28 17:00:00",
"2021-12-11 01:31:00", "2021-12-12 17:07:00", "2021-05-03 23:58:00",
"2021-05-15 23:31:00")), row.names = c(NA, -15L), class = c("tbl_df",
"tbl", "data.frame"))
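For the last step mentioned in the edit (removing the `c()` wrappers and showing clean dates), one option is to collapse each cell into a single string instead of a list. The sketch below assumes dplyr and tidyr 1.1.0 or later; the shortened sample data, the `%m/%d/%Y` display format, the UTC time zone, and the name `clean` are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# A shortened version of the data above (dates stored as character strings)
df <- data.frame(
  name   = c("Debby", "Debby", "Debby", "Robert", "Ryan"),
  reason = c("Absent", "Leave Early", "Leave Early", "Absent", "Late"),
  date   = c("2021-05-11 04:45:00", "2021-05-15 04:02:00",
             "2021-09-19 20:01:00", "2021-12-16 17:30:00",
             "2021-12-12 17:07:00")
)

clean <- df %>%
  # Turn each timestamp into display text first (time zone assumed to be UTC)
  mutate(date = format(as.POSIXct(date, tz = "UTC"), "%m/%d/%Y")) %>%
  pivot_wider(
    names_from  = reason,
    values_from = date,
    values_fn   = toString,  # join multiple dates as "a, b" instead of a list
    values_fill = ""         # empty string where a person has no such reason
  ) %>%
  as.data.frame()

print(clean)
```

Because every cell is now a plain character string, the result should print and export without `c()` or `numeric(0)`; swap `toString` for `function(x) paste(x, collapse = ",")` if no space after the comma is wanted.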