Running count of a character variable when a condition matches in R
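I have a data frame in which the first two columns are the options that can be chosen, and a third column records the selection. I'm trying to add a running count of the times the selection matches the option in the first column.
Example data frame: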
df<-data.frame(box.1=c("A","A","B","C","A","B","A"),
box.2=c("B","B","A","A","C","C","C"),
selection=c("A","B","B","A","C","B","A"))
Desired resulting data frame:
resulting_df<-data.frame(box.1=c("A","A","B","C","A","B","A"),
box.2=c("B","B","A","A","C","C","C"),
selection=c("A","B","B","A","C","B","A"),
running.count.box.1=c(0,1,0,0,1,1,1))
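Reading the sample, the desired column seems to count, for each row, how many earlier rows with the same box.1 value had selection equal to box.1. That rule is inferred from the data, not stated in the question. A plain-loop sketch of this interpretation (the function name `running_count_loop` is hypothetical) can serve as a reference to check any vectorized answer against:

```r
# Sketch of the rule as inferred from the sample data (an assumption):
# for each row, count earlier rows that share its box.1 value and whose
# selection equalled box.1.
df <- data.frame(box.1 = c("A","A","B","C","A","B","A"),
                 box.2 = c("B","B","A","A","C","C","C"),
                 selection = c("A","B","B","A","C","B","A"),
                 stringsAsFactors = FALSE)

running_count_loop <- function(box, sel) {
  out <- numeric(length(box))
  for (i in seq_along(box)) {
    prev <- seq_len(i - 1)              # indices of earlier rows (empty for i = 1)
    same_box <- box[prev] == box[i]     # earlier rows with the same box.1 value
    out[i] <- sum(same_box & sel[prev] == box[prev])
  }
  out
}

running_count_loop(df$box.1, df$selection)
# [1] 0 1 0 0 1 1 1
```

The loop is O(n²) and only meant to pin down the definition; the grouped approaches scale better.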
Attempted solution
So far I have tried using group_by, mutate, and cumsum to create the new variable.
df %>%
group_by(box.1) %>%
mutate(running.count=cumsum(!duplicated(box.1==selection))-1)
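The code above does not return the actual running count, and changing the group_by to selection, or to a combination of both columns, does not give the expected result either.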
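Summarizing the data is not an option, because the data frame will be merged with others that go through similar operations, so it needs to keep the same shape.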
Is there a way to add a running count in this situation using dplyr?
Thanks.
Edit: typos.
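library(dplyr)
df %>%
  group_by(box.1) %>%
  # cumsum() counts rows so far (current row included) where box.1 was selected;
  # lag() shifts that down one row so only earlier rows count, and
  # pmax(0, ..., na.rm = TRUE) turns the leading NA of each group into 0
  mutate(count = pmax(0, lag(cumsum(selection == box.1)), na.rm = TRUE)) %>%
  ungroup()
## A tibble: 7 x 4
# box.1 box.2 selection count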
# <fct> <fct> <fct> <dbl>
#1 A B A 0
#2 A B B 1
#3 B A B 0
#4 C A A 0
#5 A C C 1
#6 B C B 1
#7 A C A 1
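A slightly shorter variant of the dplyr answer, sketched with the question's column names: `lag()` takes a `default` argument, so the `pmax(..., na.rm = TRUE)` step that replaces the leading NA can be dropped. `0L` is used so the default has the same integer type that `cumsum()` of a logical vector returns. This has only been checked against the sample data.

```r
library(dplyr)

df <- data.frame(box.1 = c("A","A","B","C","A","B","A"),
                 box.2 = c("B","B","A","A","C","C","C"),
                 selection = c("A","B","B","A","C","B","A"),
                 stringsAsFactors = FALSE)

result <- df %>%
  group_by(box.1) %>%
  # cumsum() counts matches up to and including the current row; lag() shifts it
  # down one row so only earlier rows count, and default = 0L fills the first
  # row of each group instead of NA
  mutate(running.count.box.1 = lag(cumsum(selection == box.1), default = 0L)) %>%
  ungroup()

result$running.count.box.1
```

Because `mutate()` on a grouped data frame keeps the original row order, the result lines up with the input without any re-sorting.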
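# 1 if this row's box.1 value already appeared in an earlier row, else 0 (selection is not used)
transform(df, run = c(0, sapply(2:nrow(df), function(x) box.1[x] %in% box.1[1:(x-1)])))
box.1 box.2 selection run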
1 A B A 0
2 A B B 1
3 B A B 0
4 C A A 0
5 A C C 1
6 B C B 1
7 A C A 1
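For comparison, a base R sketch with `ave()` that computes the count directly: flag rows where `selection == box.1`, then, within each `box.1` group, take the cumulative sum minus the current row's flag so only earlier rows are counted (`prior_hits` is a hypothetical helper name). As far as I can tell from its code, the `transform()` answer returns the same thing as `duplicated(box.1)` and never looks at `selection`, so it matches the desired column on this sample but not in general. The extra row below is a hypothetical input chosen only to show where the two diverge.

```r
df <- data.frame(box.1 = c("A","A","B","C","A","B","A"),
                 box.2 = c("B","B","A","A","C","C","C"),
                 selection = c("A","B","B","A","C","B","A"),
                 stringsAsFactors = FALSE)

# Exclusive running sum: cumulative sum minus the current element,
# so each position only counts earlier elements of its group
prior_hits <- function(h) cumsum(h) - h

# 1/0 flag: was this row's box.1 option the one selected?
hit <- as.numeric(df$selection == df$box.1)
df$running.count.box.1 <- ave(hit, df$box.1, FUN = prior_hits)
df$running.count.box.1
# [1] 0 1 0 0 1 1 1

# Hypothetical extra row (box.1 = "A", selection = "A"), only to show where a
# running count and a "has this box.1 value appeared before?" flag diverge
box1_ext <- c(df$box.1, "A")
sel_ext <- c(df$selection, "A")
count_ext <- ave(as.numeric(sel_ext == box1_ext), box1_ext, FUN = prior_hits)
seen_before <- as.numeric(duplicated(box1_ext))
count_ext[8]    # 2
seen_before[8]  # 1
```

`ave()` returns its result in the original row order, so, like the dplyr versions, this keeps the data frame's shape.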