Count the distinct values by row by group

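I want to count unique values row by row in R, as a running count within each group. The running count should not include blank (NA) cells. For example: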

df<-data.frame(
  Group=c("A1","A1","A1","A1","A1","B1","B1","B1"),
  Segment=c("A",NA,"A","B","A",NA,"A","B")
)

Input:

 
+---------+--------+
| Group   |Segment |
+---------+--------+
| A1      |A       |
| A1      |NA      |
| A1      |A       |
| A1      |B       |
| A1      |A       |
| B1      |NA      |
| B1      |A       |
| B1      |B       |
+---------+--------+

I solved this with a for loop, but it takes too long to produce a result on large datasets.

Expected output, in the distinct column:

 
+---------+--------+----------+
| Group   |Segment | distinct |
+---------+--------+----------+
| A1      |A       |    1     |
| A1      |NA      |    1     |
| A1      |A       |    1     |
| A1      |B       |    2     |
| A1      |A       |    2     |
| B1      |NA      |    0     |
| B1      |A       |    1     |
| B1      |B       |    2     |
+---------+--------+----------+

duplicated is useful here, although the NAs make it a little tricky:

library(dplyr)
df %>% 
  group_by(Group) %>% 
  mutate(distinct = cumsum(!duplicated(Segment) & !is.na(Segment)))
# A tibble: 8 x 3
# Groups:   Group [2]
  Group Segment distinct
  <fct> <fct>      <int>
1 A1    A              1
2 A1    NA             1
3 A1    A              1
4 A1    B              2
5 A1    A              2
6 B1    NA             0
7 B1    A              1
8 B1    B              2
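
For comparison, the same idea can be sketched in base R without dplyr. This is only an illustration under stated assumptions, not part of the answer above: first_seen is a hypothetical helper name, and ave() stands in for group_by() plus mutate().

```r
# Same sample data as in the question
df <- data.frame(
  Group   = c("A1", "A1", "A1", "A1", "A1", "B1", "B1", "B1"),
  Segment = c("A", NA, "A", "B", "A", NA, "A", "B")
)

# duplicated() treats the first NA like any other new value, so on its own it
# would count NA as distinct; the is.na() test drops those rows from the count.
# Checking Group and Segment together makes "first appearance" per group.
first_seen <- !duplicated(df[c("Group", "Segment")]) & !is.na(df$Segment)

# Running total of first appearances, restarted for each Group
df$distinct <- ave(as.integer(first_seen), df$Group, FUN = cumsum)

df$distinct
# 1 1 1 2 2 0 1 2
```

Because the cumulative sum only adds 1 on the first non-NA appearance of a value, NA rows carry forward the count reached so far, which is why the first row of B1 stays at 0.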