Counting the number of different variables per group over multiple columns?
I have a data frame and want to count the number of distinct observations per group, not counting NA values.
Here is a sample of the data:
ID <-c("A", "A", "B", "B", "B", "C")
Act1 <- c("Football", "Swim", "Football", 'Basketball', "Swim", "Tennis")
Act2 <- c("Swim", "Football", "Tennis", 'Swim', "Football", "Swim")
Act3 <- c("NA", "Tennis", "NA", 'Football', "Tennis", "NA")
df <- data.frame(ID,Act1, Act2, Act3)
df
  ID       Act1     Act2     Act3
1  A   Football     Swim       NA
2  A       Swim Football   Tennis
3  B   Football   Tennis       NA
4  B Basketball     Swim Football
5  B       Swim Football   Tennis
6  C     Tennis     Swim       NA
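The correct answer should look like this:
  ID n
1  A 3
2  B 4
3  C 2
because A has three distinct activities (Football, Swim, Tennis), B has four (Football, Swim, Tennis, Basketball), and C has two (Tennis and Swim).
How can I do this?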
Assuming the empty values are actual NA values rather than the string "NA", you can use the dplyr and tidyr packages to get your expected output:
library(dplyr)
library(tidyr)
df %>%
  pivot_longer(-ID) %>%
  filter(!is.na(value)) %>% # if you have strings "NA" use filter(value != "NA")
  group_by(ID) %>%
  summarise(n = n_distinct(value))
# A tibble: 3 x 2
#   ID        n
#   <chr> <int>
# 1 A         3
# 2 B         4
# 3 C         2
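Note that the sample data in the question stores the string "NA" in Act3, not a real missing value. As a rough sketch of the same idea without extra packages, the following base R version first converts those strings to NA and then counts distinct values per ID. The names act_cols, counts, and result are only illustrative, and the code assumes every column other than ID holds activities.

```r
# Example data from the question; Act3 contains the literal string "NA".
df <- data.frame(
  ID   = c("A", "A", "B", "B", "B", "C"),
  Act1 = c("Football", "Swim", "Football", "Basketball", "Swim", "Tennis"),
  Act2 = c("Swim", "Football", "Tennis", "Swim", "Football", "Swim"),
  Act3 = c("NA", "Tennis", "NA", "Football", "Tennis", "NA"),
  stringsAsFactors = FALSE
)

# Assumption: every column except ID is an activity column.
act_cols <- setdiff(names(df), "ID")

# Replace the string "NA" with a real missing value in each activity column.
df[act_cols] <- lapply(df[act_cols], function(x) {
  x[x %in% "NA"] <- NA
  x
})

# Split the activity columns by ID, flatten each group into one vector,
# and count the distinct non-missing values.
counts <- sapply(split(df[act_cols], df$ID), function(g) {
  values <- unlist(g, use.names = FALSE)
  length(unique(values[!is.na(values)]))
})

result <- data.frame(ID = names(counts), n = unname(counts))
result
```

Once the strings have been converted this way, the dplyr/tidyr pipeline above can also be used unchanged with filter(!is.na(value)).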