How to aggregate an R data frame into two columns based on the values of another column
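My data frame looks like the one below, where Gender == "1" denotes men and Gender == "2" denotes women, Occupation runs from A to U, and Year runs from 2010 to 2018 (here is a small example):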
Gender Occupation Year
1 A 2010
1 A 2010
2 A 2010
1 B 2010
2 B 2010
1 A 2011
2 A 2011
1 C 2011
2 C 2011
I would like an output that counts the rows for each combination of year and occupation, with one column per gender, like this:
Year | Occupation | Men | Woman
2010 | A | 2 | 1
2010 | B | 1 | 1
2011 | A | 1 | 1
2011 | C | 1 | 1
I tried the following:
Nr_gender_occupation <- data %>%
  group_by(year, occupation) %>%
  summarise(
    Men = aggregate(gender=="1" ~ occupation, FUN= count),
    Women = aggregate(gender=="2" ~ occupation, FUN=count)
  )
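We can use the values in 'Gender' as an index to recode them, then use pivot_wider from tidyr to reshape the data into 'wide' format:
library(dplyr)
library(tidyr)
data %>%
  # 1 -> "Male", 2 -> "Female"
  mutate(Gender = c("Male", "Female")[Gender]) %>%
  # one column per gender, filled with the number of rows in each cell
  pivot_wider(names_from = Gender, values_from = Gender, values_fn = length)
Output: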
# A tibble: 4 x 4
# Occupation Year Male Female
# <chr> <int> <int> <int>
#1 A 2010 2 1
#2 B 2010 1 1
#3 A 2011 1 1
#4 C 2011 1 1
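Or use table and unnest_wider:
library(tidyr)
data %>%
  group_by(Year, Occupation) %>%
  # store the per-group frequency table of Gender in a list column
  summarise(out = list(table(Gender)), .groups = 'drop') %>%
  # spread the table entries into separate columns
  unnest_wider(out)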
Or we can use count and pivot_wider:
data %>%
  # n = number of rows per Gender/Occupation/Year combination
  count(Gender, Occupation, Year) %>%
  pivot_wider(names_from = Gender, values_from = n)
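As a minimal sketch of how the count and pivot_wider approach could be adapted to produce labelled columns, the version below recodes Gender before counting. The labels "Men" and "Women" and the use of values_fill = 0 (so that a year/occupation pair with no rows for one gender shows 0 rather than NA) are assumptions added for illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Example data from the question
data <- data.frame(
  Gender     = c(1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
  Occupation = c("A", "A", "A", "B", "B", "A", "A", "C", "C"),
  Year       = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L, 2011L)
)

result <- data %>%
  # recode 1/2 into labels (assumed labels, chosen for illustration)
  mutate(Gender = c("Men", "Women")[Gender]) %>%
  # n = number of rows per Year/Occupation/Gender combination
  count(Year, Occupation, Gender) %>%
  # one column per gender; missing combinations become 0 instead of NA
  pivot_wider(names_from = Gender, values_from = n, values_fill = 0)

print(result)
```

In the small example every combination is present, so values_fill has no visible effect; it only matters once some occupation has rows for a single gender in a given year.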
Data:
data <- structure(list(Gender = c(1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
Occupation = c("A", "A", "A", "B", "B", "A", "A", "C", "C"
), Year = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L,
2011L, 2011L)), class = "data.frame", row.names = c(NA, -9L
))
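You can also count directly within each group (df here is the data frame from the question):
library(dplyr)
df %>%
  group_by(Occupation, Year) %>%
  # summing a logical vector counts its TRUE values
  summarize(Men = sum(Gender == 1),
            Woman = sum(Gender == 2), .groups = "drop")
Output: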
Occupation Year Men Woman
<chr> <dbl> <int> <int>
1 A 2010 2 1
2 A 2011 1 1
3 B 2010 1 1
4 C 2011 1 1
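The sum(Gender == 1) idiom works because R treats TRUE as 1 and FALSE as 0 when summing. One caveat worth sketching: if Gender contains missing values, the comparison yields NA and so does the sum, unless na.rm = TRUE is set. The vector below is hypothetical and only illustrates that behaviour; the question's data has no missing values.

```r
# Hypothetical vector with one missing gender code
gender <- c(1L, 1L, 2L, NA)

# NA == 1 is NA, so the plain sum is NA as well
men_naive <- sum(gender == 1)

# Dropping NAs counts only the known matches
men_safe <- sum(gender == 1, na.rm = TRUE)

print(men_naive)
print(men_safe)
```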
A data.table option using dcast:
library(data.table)
dcast(setDT(df), Year + Occupation ~ c("Men", "Woman")[Gender])
which gives
Year Occupation Men Woman
1: 2010 A 2 1
2: 2010 B 1 1
3: 2011 A 1 1
4: 2011 C 1 1
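The call above relies on dcast's defaults for the value column and the aggregation function. A more explicit variant, sketched here under the assumption that the counts should be row counts per cell, spells out value.var and fun.aggregate and renames the columns afterwards. The name wide and the label "Women" are illustrative choices, not taken from the original answer.

```r
library(data.table)

# Example data from the question
df <- data.frame(
  Gender     = c(1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L),
  Occupation = c("A", "A", "A", "B", "B", "A", "A", "C", "C"),
  Year       = c(2010L, 2010L, 2010L, 2010L, 2010L, 2011L, 2011L, 2011L, 2011L)
)

# as.data.table() makes a copy, so df itself is left unchanged
wide <- dcast(as.data.table(df), Year + Occupation ~ Gender,
              value.var = "Gender", fun.aggregate = length)

# The new columns are named after the Gender codes "1" and "2"
setnames(wide, c("1", "2"), c("Men", "Women"))

print(wide)
```

Stating fun.aggregate explicitly avoids depending on dcast's fallback to length, and as.data.table() avoids converting df by reference the way setDT() does.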