Summarizing unique values by group over multiple columns
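I have the following problem:
My dataset contains country-year observations of many different weapon systems (levels). I want to know how many different systems (unique values) each group (country) has over the time span of the dataset.
Simplified, the dataset looks like this: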
a <- c("Greece", "Greece", "Belgium", "Belgium", "Germany", "Germany")
b <- c(1980, 1981, 1980, 1981, 1980, 1981)
c1 <- c("Weapon1", "Weapon1", "Weapon5", "Weapon5", "Weapon3", "Weapon2")
d <- c("Weapon2", "Weapon4", "Weapon2", "Weapon2", "Weapon1", "Weapon3")
e <- c("Weapon3", "Weapon3", "Weapon3", "Weapon4", "Weapon2", NA)
df <- data.frame(a,b,c1,d,e)
        a    b      c1       d       e
1  Greece 1980 Weapon1 Weapon2 Weapon3
2  Greece 1981 Weapon1 Weapon4 Weapon3
3 Belgium 1980 Weapon5 Weapon2 Weapon3
4 Belgium 1981 Weapon5 Weapon2 Weapon4
5 Germany 1980 Weapon3 Weapon1 Weapon2
6 Germany 1981 Weapon2 Weapon3    <NA>
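So in this example, Germany has deployed 3 different weapon systems in total. How can I compute this for every country?
Thanks in advance, everyone!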
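library(tidyverse)
df %>%
  # stack the three weapon columns into a single `value` column
  pivot_longer(cols = c(c1, d, e)) %>%
  group_by(a) %>%
  # drop missing entries so NA is not counted as a system
  filter(!is.na(value)) %>%
  # keep one row per country/weapon pair, then count rows per country
  distinct(value) %>%
  summarize(n = n())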
This gives:
# # A tibble: 3 x 2
#   a           n
#   <chr>   <int>
# 1 Belgium     4
# 2 Germany     3
# 3 Greece      4
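As a side note, the filter/distinct/count steps can probably be folded into a single call to dplyr's `n_distinct()` with `na.rm = TRUE`. The following is only a sketch of that variant, not the answerer's code; the result name `counts` is made up for illustration, and the data frame is rebuilt so the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Same toy data as in the question, rebuilt so the snippet is self-contained
df <- data.frame(
  a  = c("Greece", "Greece", "Belgium", "Belgium", "Germany", "Germany"),
  b  = c(1980, 1981, 1980, 1981, 1980, 1981),
  c1 = c("Weapon1", "Weapon1", "Weapon5", "Weapon5", "Weapon3", "Weapon2"),
  d  = c("Weapon2", "Weapon4", "Weapon2", "Weapon2", "Weapon1", "Weapon3"),
  e  = c("Weapon3", "Weapon3", "Weapon3", "Weapon4", "Weapon2", NA),
  stringsAsFactors = FALSE
)

# `counts` is a hypothetical name for the result
counts <- df %>%
  pivot_longer(cols = c(c1, d, e)) %>%             # long format: one row per country/year/column
  group_by(a) %>%
  summarise(n = n_distinct(value, na.rm = TRUE))   # distinct non-missing weapons per country

print(counts)
```

On the sample data this should report the same per-country counts as the pipeline above.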
In base R, we can do:
stack(rowSums(table(rep(df$a, 3), unlist(df[3:5])) > 0))[2:1]
      ind values
1 Belgium      4
2 Germany      3
3  Greece      4
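Since the one-liner is fairly dense, here is a rough step-by-step unpacking of what it appears to do. The intermediate names (`weapon_cols`, `values`, `countries`, `tab`, `n_unique`) are introduced only for illustration, and the columns are selected by name rather than by position.

```r
# Same toy data as in the question, rebuilt so the snippet is self-contained
df <- data.frame(
  a  = c("Greece", "Greece", "Belgium", "Belgium", "Germany", "Germany"),
  b  = c(1980, 1981, 1980, 1981, 1980, 1981),
  c1 = c("Weapon1", "Weapon1", "Weapon5", "Weapon5", "Weapon3", "Weapon2"),
  d  = c("Weapon2", "Weapon4", "Weapon2", "Weapon2", "Weapon1", "Weapon3"),
  e  = c("Weapon3", "Weapon3", "Weapon3", "Weapon4", "Weapon2", NA),
  stringsAsFactors = FALSE
)

weapon_cols <- c("c1", "d", "e")

# Flatten the weapon columns into one vector (column after column)
values <- unlist(df[weapon_cols], use.names = FALSE)

# Repeat the country column once per weapon column so it lines up with `values`
countries <- rep(df$a, times = length(weapon_cols))

# Cross-tabulate country against weapon; table() drops NA by default
tab <- table(countries, values)

# A weapon counts once per country if its cell is non-zero
n_unique <- rowSums(tab > 0)

print(n_unique)
```

The `stack(...)[2:1]` call in the one-liner only reshapes this named vector into a two-column data frame with the country first.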