Is there an R function (or sequence of steps) to group and summarise (count) a data frame like this (with some repeated values in the rows)?
I have a df like this:
df = data.frame (user = c('u1', 'u1', 'u1', 'u2', 'u2'),
entity = c('e1','e2','e3','e3','e4'),
area = c('a1','a1','a2','a2','a1'),
sex=c('M','M','M','F','F'))
I need to obtain a df like this:
df2<- data.frame (area = c('a1', 'a2'),
male = c(1,1),
female = c(1,1),
total=c(2,2))
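That is, count the number of male and female users per area.
Update:
I'm still not quite sure. I used Yuriy Saraykin's idea together with distinct (thanks to him, +1):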
library(dplyr)
library(tidyr)
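df %>%
  distinct(user, area, sex) %>%                   # keep one row per user/area/sex
  group_by(area, sex) %>%
  summarise(value = n(), .groups = "drop") %>%    # drop grouping explicitly
  pivot_wider(
    names_from = sex,
    values_from = value
  ) %>%
  mutate(total = F + M) %>%                       # row-wise total; does not rely on leftover grouping
  rename(female = F, male = M)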
area female male total
<chr> <int> <int> <int>
1 a1 1 1 2
2 a2 1 1 2
First answer: not correct!
One way could be:
library(dplyr)
library(tidyr)
df %>%
group_by(area, sex) %>%
summarise(value =n()) %>%
pivot_wider(
names_from = sex,
values_from = value
) %>%
mutate(total = F+M) %>%
rename(female=F, male=M)
area female male total
<chr> <int> <int> <int>
1 a1 1 2 3
2 a2 1 1 2
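The count of 2 males in a1 above appears to come from counting rows rather than users: u1 has two rows in a1 (entities e1 and e2). A minimal sketch of the difference, using the df from the question; the column names `rows` and `users` are chosen here for illustration only:

```r
library(dplyr)

df <- data.frame(
  user   = c("u1", "u1", "u1", "u2", "u2"),
  entity = c("e1", "e2", "e3", "e3", "e4"),
  area   = c("a1", "a1", "a2", "a2", "a1"),
  sex    = c("M", "M", "M", "F", "F")
)

counts <- df %>%
  group_by(area, sex) %>%
  summarise(
    rows  = n(),              # counts every row, so repeated users are counted again
    users = n_distinct(user), # counts each user once per area/sex
    .groups = "drop"
  )

counts
```

For the a1/M group, `rows` and `users` differ, which is why deduplicating with `distinct()` (or counting with `n_distinct()`) is needed before pivoting.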
df = data.frame (user = c('u1', 'u1', 'u1', 'u2', 'u2'),
entity = c('e1','e2','e3','e3','e4'),
area = c('a1','a1','a2','a2','a1'),
sex=c('M','M','M','F','F'))
library(tidyverse)
df %>%
distinct(user, area, sex) %>%
mutate(sex = ifelse(sex == "M", "male", "female")) %>%
pivot_wider(
id_cols = area,
names_from = sex,
values_from = sex,
values_fill = 0,
values_fn = length
) %>%
mutate(Total = rowSums(across(male:female)))
#> # A tibble: 2 x 4
#> area male female Total
#> <chr> <int> <int> <dbl>
#> 1 a1 1 1 2
#> 2 a2 1 1 2
Created on 2022-01-25 by the reprex package (v2.0.1)
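For comparison, the same deduplicate-then-count idea can be sketched in base R, without dplyr or tidyr. This is only one possible approach, assuming that sex takes just the values "M" and "F" as in the question; the names `u`, `tab` and `res` are illustrative:

```r
df <- data.frame(
  user   = c("u1", "u1", "u1", "u2", "u2"),
  entity = c("e1", "e2", "e3", "e3", "e4"),
  area   = c("a1", "a1", "a2", "a2", "a1"),
  sex    = c("M", "M", "M", "F", "F")
)

# Keep one row per user/area/sex combination (drops the repeated entities)
u <- unique(df[, c("user", "area", "sex")])

# Cross-tabulate area by sex on the deduplicated rows
tab <- table(area = u$area, sex = u$sex)

# Assemble the result in the layout asked for in the question
res <- data.frame(
  area   = rownames(tab),
  male   = as.vector(tab[, "M"]),
  female = as.vector(tab[, "F"])
)
res$total <- res$male + res$female

res
```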