Count the distinct values by row within each group
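I want to count the unique values row by row in R, within each group. The row-by-row count should not include blank (NA) cells.
For example: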
df<-data.frame(
Group=c("A1","A1","A1","A1","A1","B1","B1","B1"),
Segment=c("A",NA,"A","B","A",NA,"A","B")
)
Input:
+---------+--------+
| Group |Segment |
+---------+--------+
| A1 |A |
| A1 |NA |
| A1 |A |
| A1 |B |
| A1 |A |
| B1 |NA |
| B1 |A |
| B1 |B |
+---------+--------+
I solved this with a for loop, but on a large data set it takes a long time to produce the result.
Expected output, in the distinct column:
+---------+--------+----------+
| Group |Segment | distinct |
+---------+--------+----------+
| A1 |A | 1 |
| A1 |NA | 1 |
| A1 |A | 1 |
| A1 |B | 2 |
| A1 |A | 2 |
| B1 |NA | 0 |
| B1 |A | 1 |
| B1 |B | 2 |
+---------+--------+----------+
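duplicated() is useful here, although NA makes it a bit tricky: duplicated() treats the first NA in a group as a new value, so it has to be masked out with !is.na() before taking the cumulative sum: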
library(dplyr)
df %>%
group_by(Group) %>%
mutate(distinct = cumsum(!duplicated(Segment) & !is.na(Segment)))
# A tibble: 8 x 3
# Groups: Group [2]
Group Segment distinct
<fct> <fct> <int>
1 A1 A 1
2 A1 NA 1
3 A1 A 1
4 A1 B 2
5 A1 A 2
6 B1 NA 0
7 B1 A 1
8 B1 B 2
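If you would rather avoid the dplyr dependency, the same idea can be sketched in base R with ave(). This is only a minimal sketch of the approach above, not a benchmarked solution; the helper name running_distinct is made up for illustration, and stringsAsFactors = FALSE is set only so the column type is the same across R versions.

```r
df <- data.frame(
  Group   = c("A1", "A1", "A1", "A1", "A1", "B1", "B1", "B1"),
  Segment = c("A", NA, "A", "B", "A", NA, "A", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: running count of distinct non-NA values in x.
# A row adds 1 only if its value is a first occurrence and is not NA.
running_distinct <- function(x) cumsum(!duplicated(x) & !is.na(x))

# ave() splits the row indices by Group, applies the function to each
# piece, and puts the results back in the original row order.
# Row indices are passed (rather than Segment itself) so the result
# stays integer instead of being coerced to character.
df$distinct <- ave(
  seq_along(df$Segment),
  df$Group,
  FUN = function(i) running_distinct(df$Segment[i])
)

print(df$distinct)
# [1] 1 1 1 2 2 0 1 2
```

Both versions are vectorised within each group, so they should scale much better than an explicit for loop over rows.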