Creating Groups with dplyr's "group_by", then Using stringr or Set Operations to Find Differences Between Groups
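If possible, I would like to use dplyr and stringr, or at least stay within the Tidyverse, to accomplish the following:
I need to group the data by CaseWorker and Client and compare "Task" with "Task2" to find every category in "Task2" that is not in "Task", along with the total time associated with each of those "Task2" categories.
"Task" can contain categories that are not in "Task2", so I am only interested in finding the categories in "Task2" that are not in "Task". It would be great to be able to create new columns showing the specific entries that are in "Task2" but not in "Task", together with the associated "Time" values.
The final result should show four new columns for client Chris: one for "Iron shirt", one for its associated "Time" of 45, one for "Do homework", and one for its "Time" of 21. Client Eric would have two new columns: one for "Iron shirt" and one for its associated time of 12.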
CaseWorker <- c("John", "John", "John", "John", "John", "John", "John", "John",
                "John", "Kim", "Kim")
Client <- c("Chris", "Chris", "Chris", "Chris", "Chris", "Chris", "Chris", "Chris",
            "Chris", "Eric", "Eric")
Task <- c("Feed cat", "Feed cat", "Feed cat", "Make dinner", "Make dinner", "Make dinner",
          "Buy groceries", "Buy groceries", "Buy groceries", "Do homework", "Do homework")
Task2 <- c("Feed cat", "Iron shirt", "Iron shirt", "Do homework", "Do homework", "Do homework",
           "Make dinner", "Feed cat", "Feed cat", "Do homework", "Iron shirt")
Time <- c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12)
Df <- data.frame(CaseWorker, Client, Task, Task2, Time)
We can try:
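library(dplyr)
library(tidyr)

Df %>%
  group_by(CaseWorker, Client) %>%
  # keep only rows whose Task2 value never appears in Task within the same group
  filter(Task2 %in% setdiff(Task2, Task)) %>%
  # add Task2 to the existing grouping (the argument is spelled .add in dplyr >= 1.0.0)
  group_by(Task2, add = TRUE) %>%
  # total time per CaseWorker / Client / Task2
  summarise(Time = sum(Time)) %>%
  # one column per remaining Task2 category
  spread(Task2, Time)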
#   CaseWorker Client `Do homework` `Iron shirt`
# *     <fctr> <fctr>         <dbl>        <dbl>
# 1       John  Chris            21           45
# 2        Kim   Eric            NA           12
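For readers on newer package versions, here is a sketch of the same pipeline using the current spellings. It assumes dplyr >= 1.0.0 (for `.add` and `.groups`) and tidyr >= 1.0.0 (for `pivot_wider()`, which supersedes `spread()`). The name `result` is introduced only for illustration. The logic is unchanged from the pipeline above: a per-group `setdiff()` followed by a sum and a reshape.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
Df <- data.frame(
  CaseWorker = c(rep("John", 9), rep("Kim", 2)),
  Client     = c(rep("Chris", 9), rep("Eric", 2)),
  Task  = c("Feed cat", "Feed cat", "Feed cat",
            "Make dinner", "Make dinner", "Make dinner",
            "Buy groceries", "Buy groceries", "Buy groceries",
            "Do homework", "Do homework"),
  Task2 = c("Feed cat", "Iron shirt", "Iron shirt",
            "Do homework", "Do homework", "Do homework",
            "Make dinner", "Feed cat", "Feed cat",
            "Do homework", "Iron shirt"),
  Time  = c(20, 34, 11, 10, 5, 6, 55, 30, 20, 10, 12)
)

result <- Df %>%
  group_by(CaseWorker, Client) %>%
  # within each CaseWorker/Client group, keep Task2 values absent from Task
  filter(Task2 %in% setdiff(Task2, Task)) %>%
  # .add = TRUE keeps the existing grouping and adds Task2 to it
  group_by(Task2, .add = TRUE) %>%
  # total time per remaining Task2 category; drop grouping afterwards
  summarise(Time = sum(Time), .groups = "drop") %>%
  # spread the categories into one column each
  pivot_wider(names_from = Task2, values_from = Time)

result
```

Because the filter runs inside `group_by(CaseWorker, Client)`, "Do homework" counts as new for Chris (it never appears in his `Task` values) but not for Eric, which is why Eric's `Do homework` cell is `NA`.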