How to add values of one column based on conditional statement of another column that has blank cells?
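I am trying to subset data based on a conditional statement on a column that contains blank values. A blank value means the employee logged into the same work order more than once. A sample data set is shown below: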
employee_name <- c("Person A","Person A","Person A","Person A","Person A", "Person B","Person B","Person B")
work_order <- c("WO001","WO001","WO001","WO002","WO003","WO001","WO003", "WO003")
num_of_points <- c(40,"","",64,25,20,68,"")
time <- c(10, 30, 15, 20, 25, 5, 15, 30)
final_summary <- data.frame(employee_name,work_order,num_of_points, time)
View(final_summary)
Input
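Basically, I want to summarize the points and the time by selecting all rows with more than 30 points and then grouping by employee name and work order, which should return this: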
Output
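I can get the summarise step to work, but my initial subset excludes the rows where num_of_points is blank, so the time values (in minutes) on those rows are not counted. That makes sense, because subset(num_of_points > 30) only keeps rows with a value greater than 30. How can I adjust this to include the blank rows, so that the sum of time is computed accurately when grouped by unique work order and employee name?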
Convert num_of_points to numeric class, group by 'employee_name' and 'work_order', get the sum of the 'num_of_points' values that are greater than 30 and the sum of 'time', then filter out the rows where 'num_of_points' is 0:
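library(dplyr)

final_summary %>%
  # blank strings become NA after the conversion
  mutate(num_of_points = as.numeric(num_of_points)) %>%
  group_by(employee_name, work_order) %>%
  # sum only the points above 30, but sum time over every row in the group
  summarise(num_of_points = sum(num_of_points[num_of_points > 30], na.rm = TRUE),
            time = sum(time)) %>%
  # drop the groups that have no points above 30
  filter(num_of_points > 0)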
# A tibble: 3 x 4
# Groups: employee_name [2]
# employee_name work_order num_of_points time
# <chr> <chr> <dbl> <dbl>
#1 Person A WO001 40 55
#2 Person A WO002 64 20
#3 Person B WO003 68 45
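The conversion step is what lets the blank rows stay in their group. A minimal base R sketch of that behaviour, using a made-up vector pts (not part of the original data) that mimics Person A's first five values:

```r
# Hypothetical vector for illustration: the first five num_of_points values as strings
pts <- as.numeric(c("40", "", "", "64", "25"))

# Blank strings are converted to NA
pts
# [1] 40 NA NA 64 25

# Comparing NA with 30 gives NA, so logical indexing keeps the NA entries
pts[pts > 30]
# [1] 40 NA NA 64

# na.rm = TRUE drops those NA entries from the sum
pts_total <- sum(pts[pts > 30], na.rm = TRUE)
pts_total
# [1] 104
```

Because the blank rows are never removed from the data frame, their time values are still available to sum(time) within each group.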
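In base R you could do the following. Note that subset drops the blank rows before aggregating, so time is summed only over the rows with more than 30 points:
aggregate(. ~ employee_name + work_order, type.convert(final_summary, as.is = TRUE), sum, subset = num_of_points > 30)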
employee_name work_order num_of_points time
1 Person A WO001 40 10
2 Person A WO002 64 20
3 Person B WO003 68 15
You can aggregate num_of_points and time separately and merge the results.
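# num_of_points is stored as character; convert it so the blanks become NA
final_summary$num_of_points <- as.numeric(final_summary$num_of_points)
merge(aggregate(num_of_points ~ employee_name + work_order, final_summary,
                sum, subset = num_of_points > 30),
      aggregate(time ~ employee_name + work_order, final_summary, sum))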
# employee_name work_order num_of_points time
#1 Person A WO001 40 55
#2 Person A WO002 64 20
#3 Person B WO003 68 45