How can I obtain the number of unique ids that meet two criteria non-simultaneously (e.g. in different rows)?

Question

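I have a data frame with multiple rows per person (id). I want to count the unique individuals who have a 1 recorded in both A and B at some point, but not necessarily on the same date (A and B can receive their 1 in different rows, i.e. on different dates).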

id <- as.character(c(108, 108, 111, 111, 111, 111, 153, 153, 153, 153, 153, 153))
date <- as.POSIXct(c("2014-03-12 08:44:18 UTC", "2015-09-16 02:56:00 UTC",  
"2015-10-24 08:09:11 UTC", "2016-12-11 17:17:00 UTC", "2017-08-06 18:26:00 UTC", 
"2018-01-29 00:00:00 UTC", "2014-04-17 08:40:10 UTC", "2015-09-16 02:56:00 UTC", 
"2015-11-12 13:15:00 UTC", "2016-12-16 17:10:09 UTC", "2017-08-10 04:11:00 UTC", 
"2018-01-29 00:00:00 UTC"))
A <- c(1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1)
B <- c(0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0)
df <- data.frame(id, date, A, B)  
df

    id                date A B
1  108 2014-03-12 08:44:18 1 0
2  108 2015-09-16 02:56:00 1 0
3  111 2015-10-24 08:09:11 1 0
4  111 2016-12-11 17:17:00 1 0
5  111 2017-08-06 18:26:00 1 0
6  111 2018-01-29 00:00:00 0 1
7  153 2014-04-17 08:40:10 0 1
8  153 2015-09-16 02:56:00 0 1
9  153 2015-11-12 13:15:00 1 0
10 153 2016-12-16 17:10:09 1 0
11 153 2017-08-10 04:11:00 1 0
12 153 2018-01-29 00:00:00 1 0

I am using the function unique:

> length(unique(df$id[which(df$A == 1 & df$B == 1)]))
[1] 0

But this is not what I expect, since two individuals have a 1 in both A and B, just at different times. The result I expect is:

[1] 2

How can I get the correct count? Thanks.

Answer 1

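A simple way is to check the maximum of A and B within each id group. Because both columns only hold 0 or 1, a group maximum of 1 means the id has at least one row with a 1 in that column. For example: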

library(dplyr)

df %>%
  group_by(id) %>%
  summarize(Count = max(A) == 1 & max(B) == 1) %>%
  summarize(sum(Count))

This gives:

# A tibble: 1 x 1
  `sum(Count)`
         <int>
1            2
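
To see why the grouped test finds ids that the row-wise test in the question misses, it can help to look at the per-id flags before summing them. The following is only a sketch: it rebuilds the id, A and B columns from the question, and the names has_A, has_B, both, per_id and n_both are made up for illustration.

```r
library(dplyr)

# Rebuild the id / A / B columns from the question (the dates are not needed here)
df <- data.frame(
  id = as.character(c(108, 108, 111, 111, 111, 111, 153, 153, 153, 153, 153, 153)),
  A  = c(1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1),
  B  = c(0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0)
)

# One row per id: did this id ever record a 1 in A, in B, and in both?
per_id <- df %>%
  group_by(id) %>%
  summarize(
    has_A = max(A) == 1,
    has_B = max(B) == 1,
    both  = max(A) == 1 & max(B) == 1
  )

per_id

# Number of ids that have a 1 in A and a 1 in B, possibly in different rows
n_both <- sum(per_id$both)
n_both
```

With the example data, id 108 never has a 1 in B, so only 111 and 153 are flagged in both and the count is 2.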

Answer 2

A slightly different dplyr approach that returns a plain integer. It uses any(), which returns TRUE if any row (date) for that id meets the condition.

library(dplyr)

df %>%
  group_by(id) %>%
  summarize(AB = any(A == 1) & any(B == 1)) %>%
  pull(AB) %>%
  sum()

#----
[1] 2
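
If dplyr is not available, the same idea can probably be written in base R, staying close to the unique() attempt in the question: collect the ids that ever have a 1 in A, collect the ids that ever have a 1 in B, and count the overlap. This is a sketch under the assumption that A and B only contain 0 and 1 (no NA); the names ids_with_A, ids_with_B and n_both are illustrative.

```r
# Same example data as in the question (id, A and B only)
id <- as.character(c(108, 108, 111, 111, 111, 111, 153, 153, 153, 153, 153, 153))
A <- c(1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1)
B <- c(0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0)
df <- data.frame(id, A, B)

# ids with at least one row where A is 1, and likewise for B
ids_with_A <- unique(df$id[df$A == 1])
ids_with_B <- unique(df$id[df$B == 1])

# ids that appear in both sets, regardless of which rows the 1s are in
n_both <- length(intersect(ids_with_A, ids_with_B))
n_both
```

The difference from the original attempt is that each condition is applied to its own set of rows, and the two conditions are combined at the id level rather than within a single row.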
