How to specify cells in a csv file with certain conditions in R
I have an event log in csv format and want to pick out the cells in the table that meet certain conditions. The table looks like this.
Case.ID | Activity | Timestamp | Resource
----------------------------------------------
0 |Take order| 00:12:04 | Waiter
----------------------------------------------
0 |Take order| 00:18:02 |
----------------------------------------------
1 |Bring food| 00:47:23 | Cook helper
----------------------------------------------
1 |Bring food| 00:52:41 |
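The row that starts an activity has a value in the Resource column, but the row that ends the activity has none.
I want to create a duration column holding the difference between the end timestamp and the start timestamp, but I am not sure how to approach it.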
If you coerce your factors to character variables, you could try:
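library(tidyverse)  # loads dplyr, which provides %>%, group_by() and mutate()
# Example data reproducing the event log from the question
df <- data.frame(Case.ID = c(0, 0, 1, 1),
                 Activity = c(rep("Take order", 2), rep("Bring food", 2)),
                 Timestamp = c("00:12:04", "00:18:02", "00:47:23", "00:52:41"),
                 Resource = c("Waiter", "", "Cook helper", ""),
                 stringsAsFactors = FALSE)
# Per case: last timestamp minus first timestamp, with explicit format and units (minutes)
df %>%
  group_by(Case.ID) %>%
  mutate(timing = as.difftime(Timestamp[length(Timestamp)], format = "%H:%M:%S", units = "mins") -
                  as.difftime(Timestamp[1], format = "%H:%M:%S", units = "mins"))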
It is not very elegant, because the duration is recycled across every row of the case, but I am not sure what your goal is.
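If one row per activity is enough, the recycling can be avoided by summarising instead of mutating. The following is only a sketch under some assumptions: the column names `time` and `duration_mins` are made up for illustration, each Case.ID/Activity pair is assumed to contain exactly one start and one end row, and `.groups = "drop"` needs dplyr 1.0 or later.

```r
library(dplyr)

# Same example data as in the answer above
df <- data.frame(
  Case.ID = c(0, 0, 1, 1),
  Activity = c(rep("Take order", 2), rep("Bring food", 2)),
  Timestamp = c("00:12:04", "00:18:02", "00:47:23", "00:52:41"),
  Resource = c("Waiter", "", "Cook helper", ""),
  stringsAsFactors = FALSE
)

durations <- df %>%
  # Parse the clock time; the date part is irrelevant because only differences are used
  mutate(time = as.POSIXct(Timestamp, format = "%H:%M:%S", tz = "UTC")) %>%
  group_by(Case.ID, Activity) %>%
  # Collapse each group to a single row: latest time minus earliest time, in minutes
  summarise(
    duration_mins = as.numeric(difftime(max(time), min(time), units = "mins")),
    .groups = "drop"
  )

print(durations)
```

Using `max()` and `min()` rather than the first and last row means the result does not depend on the order of rows within a case.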
Using the data frame given by @timfaber, do:
aggregate(x = list(duration = as.POSIXct(df$Timestamp,format = "%H:%M:%S")),
by = list(Case.ID = df$Case.ID),
FUN = diff)
This gives (durations in minutes):
Case.ID duration
1 0 5.966667
2 1 5.300000
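Neither approach above uses the condition from the question, namely that start rows have a Resource and end rows do not. A possible base R sketch that selects rows by that condition and pairs them up is shown below. It assumes that a missing Resource is stored as an empty string (after `read.csv` it may be `NA` instead, in which case `is.na()` would be the test), that every Case.ID/Activity pair has exactly one start and one end row, and the names `starts`, `ends`, `events` and `duration_secs` are purely illustrative.

```r
# Same example data as in the answers above
df <- data.frame(
  Case.ID = c(0, 0, 1, 1),
  Activity = c(rep("Take order", 2), rep("Bring food", 2)),
  Timestamp = c("00:12:04", "00:18:02", "00:47:23", "00:52:41"),
  Resource = c("Waiter", "", "Cook helper", ""),
  stringsAsFactors = FALSE
)

# Parse the clock time once; only differences between times are used later
df$time <- as.POSIXct(df$Timestamp, format = "%H:%M:%S", tz = "UTC")

# Start rows carry a Resource, end rows have an empty Resource (assumption)
starts <- df[df$Resource != "", c("Case.ID", "Activity", "Resource", "time")]
ends   <- df[df$Resource == "", c("Case.ID", "Activity", "time")]

# Pair each start with its end; the shared "time" column becomes time_start / time_end
events <- merge(starts, ends,
                by = c("Case.ID", "Activity"),
                suffixes = c("_start", "_end"))

# Duration with explicit units, so the result does not depend on automatic unit selection
events$duration_secs <- as.numeric(
  difftime(events$time_end, events$time_start, units = "secs")
)

print(events[, c("Case.ID", "Activity", "Resource", "duration_secs")])
```

Keeping the start row's Resource in the merged result makes it possible to report the duration per resource later on.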