dcast with a condition in R
> log_df[1:10, ]
      tagid            happened status
1  03B2ACE7 2016-06-28 18:07:36   open
2  03B2ACE7 2016-06-28 18:36:15 closed
3  03B2ACE7 2016-06-29 07:29:59   open
4  03B2ACE7 2016-06-29 08:06:23 closed
5  03B2ACE7 2016-06-30 16:10:48   open
6  03B2ACE7 2016-06-30 17:23:55   open
7  03B2ACE7 2016-07-01 10:12:06 closed
8  03B2ACE7 2016-07-01 13:39:58 closed
9  03B2ACE7 2016-07-02 10:08:40   open
10 03B2ACE7 2016-07-02 13:33:01 closed
...
Above is my data.
What I want to produce is:
     tagid                open              closed
1 03B2ACE7 2016-06-28 18:07:36 2016-06-28 18:36:15
2 03B2ACE7 2016-06-29 07:29:59 2016-06-29 08:06:23
3 03B2ACE7 2016-06-30 16:10:48 2016-07-01 10:12:06
...
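I tried to do this with dcast from the reshape2 package.
However, I need to be selective about which rows I take:
only an "open" that is the first one after a "closed", and only the "closed" that comes right before the next "open".
So in log_df, rows 6 and 7 would be ignored.
I am really stuck and not sure how to go about this.
Maybe dcast is not the best approach?
Please help! Thank you very much!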
Using dplyr and tidyr (from the tidyverse, the evolution of reshape):
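library(dplyr)
library(tidyr)

df %>%
  # keep the first "open" of each run and the last "closed" of each run
  filter((status == 'open' & lag(status, default = "") != 'open') |
         (status == 'closed' & lead(status, default = "") != "closed")) %>%
  # number the remaining rows in pairs: 1, 1, 2, 2, ...
  mutate(r = ceiling(row_number() / 2)) %>%
  # one column per status, one row per pair
  spread(status, happened)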
#>      tagid r              closed                open
#> 1 03B2ACE7 1 2016-06-28 18:36:15 2016-06-28 18:07:36
#> 2 03B2ACE7 2 2016-06-29 08:06:23 2016-06-29 07:29:59
#> 3 03B2ACE7 3 2016-07-01 13:39:58 2016-06-30 16:10:48
#> 4 03B2ACE7 4 2016-07-02 13:33:01 2016-07-02 10:08:40
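It:
- filters the data.frame on the given condition
- adds a column to store the 'group' (the pair number)
- spreads the values into columns (the equivalent of dcast)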
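The same three steps can also be sketched per tag, which may matter if the real log_df holds more than one tagid. This is only a sketch under assumptions not stated in the thread: tidyr 1.0 or later is installed (for pivot_wider, which supersedes spread), happened is a character column in "YYYY-MM-DD HH:MM:SS" form so it sorts chronologically, and each tag's log starts with an "open" and ends with a "closed". The name pairs is made up for illustration.

```r
library(dplyr)
library(tidyr)

# Same ten rows as in the question, without the row-number column
df <- read.table(text = 'tagid happened status
03B2ACE7 "2016-06-28 18:07:36" open
03B2ACE7 "2016-06-28 18:36:15" closed
03B2ACE7 "2016-06-29 07:29:59" open
03B2ACE7 "2016-06-29 08:06:23" closed
03B2ACE7 "2016-06-30 16:10:48" open
03B2ACE7 "2016-06-30 17:23:55" open
03B2ACE7 "2016-07-01 10:12:06" closed
03B2ACE7 "2016-07-01 13:39:58" closed
03B2ACE7 "2016-07-02 10:08:40" open
03B2ACE7 "2016-07-02 13:33:01" closed', header = TRUE, stringsAsFactors = FALSE)

pairs <- df %>%
  group_by(tagid) %>%                      # lag()/lead() never cross tags
  arrange(happened, .by_group = TRUE) %>%  # chronological order within a tag
  # keep the first "open" and the last "closed" of each consecutive run
  filter((status == "open" & lag(status, default = "") != "open") |
         (status == "closed" & lead(status, default = "") != "closed")) %>%
  mutate(r = ceiling(row_number() / 2)) %>%  # pair number within the tag
  ungroup() %>%
  pivot_wider(names_from = status, values_from = happened) %>%
  select(tagid, open, closed)              # drop the helper column r

print(pairs)
```

To keep the first "closed" of a consecutive run instead (row 7 rather than row 8, as in the desired table in the question), the lead() condition could be swapped for lag(status, default = "") != "closed".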
Data:
df <- read.table(text = ' tagid happened status
1 03B2ACE7 "2016-06-28 18:07:36" open
2 03B2ACE7 "2016-06-28 18:36:15" closed
3 03B2ACE7 "2016-06-29 07:29:59" open
4 03B2ACE7 "2016-06-29 08:06:23" closed
5 03B2ACE7 "2016-06-30 16:10:48" open
6 03B2ACE7 "2016-06-30 17:23:55" open
7 03B2ACE7 "2016-07-01 10:12:06" closed
8 03B2ACE7 "2016-07-01 13:39:58" closed
9 03B2ACE7 "2016-07-02 10:08:40" open
10 03B2ACE7 "2016-07-02 13:33:01" closed', header = TRUE)
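Since the question asked about dcast specifically, the same idea can be sketched with reshape2 and base R only. This is an illustration rather than the answer's own method: it assumes a single tagid whose rows are already in chronological order, and the names prev_status, next_status, kept and wide are made up here.

```r
library(reshape2)

# Same ten rows as above, without the row-number column
df <- read.table(text = 'tagid happened status
03B2ACE7 "2016-06-28 18:07:36" open
03B2ACE7 "2016-06-28 18:36:15" closed
03B2ACE7 "2016-06-29 07:29:59" open
03B2ACE7 "2016-06-29 08:06:23" closed
03B2ACE7 "2016-06-30 16:10:48" open
03B2ACE7 "2016-06-30 17:23:55" open
03B2ACE7 "2016-07-01 10:12:06" closed
03B2ACE7 "2016-07-01 13:39:58" closed
03B2ACE7 "2016-07-02 10:08:40" open
03B2ACE7 "2016-07-02 13:33:01" closed', header = TRUE, stringsAsFactors = FALSE)

# Status of the previous and next row ("" at the two ends)
prev_status <- c("", head(df$status, -1))
next_status <- c(tail(df$status, -1), "")

# Keep the first "open" and the last "closed" of each consecutive run
keep <- (df$status == "open" & prev_status != "open") |
        (df$status == "closed" & next_status != "closed")
kept <- df[keep, ]

# Pair number: 1, 1, 2, 2, ...
kept$r <- ceiling(seq_len(nrow(kept)) / 2)

# One row per pair, one column per status
wide <- dcast(kept, tagid + r ~ status, value.var = "happened")
print(wide)
```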