Is there an R function (or sequence of steps) to group and summarise (count) a data frame like this (with some repeated values in the rows)?

I have a df like this:

df = data.frame (user = c('u1', 'u1', 'u1', 'u2', 'u2'),
                 entity = c('e1','e2','e3','e3','e4'),
                 area = c('a1','a1','a2','a2','a1'),
                 sex=c('M','M','M','F','F'))

I need to obtain a df like this:

df2<- data.frame (area = c('a1', 'a2'),
                  male = c(1,1),
                  female = c(1,1),
                  total=c(2,2))

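That is, count the number of males and females per area. Judging by the expected output, each user should be counted only once per area, even if they appear in several rows.

Update:

I'm still not entirely sure. I combined Yuriy Saraykin's idea of using distinct with my original code (credit to him, +1):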

library(dplyr)
library(tidyr)

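df %>% 
  # keep one row per user/area/sex so repeated users are counted once
  distinct(user, area, sex) %>%
  group_by(area, sex) %>% 
  # drop the grouping so later steps work row by row
  summarise(value = n(), .groups = "drop") %>% 
  pivot_wider(
    names_from = sex,
    values_from = value,
    values_fill = 0  # areas lacking one sex get 0 instead of NA
  ) %>% 
  # F + M adds the two columns per row; sum(F, M) only worked per group
  mutate(total = F + M) %>% 
  rename(female = F, male = M)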
  area  female  male total
  <chr>  <int> <int> <int>
1 a1         1     1     2
2 a2         1     1     2
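
A shorter variant of the same idea, as a minimal sketch rather than a definitive solution: instead of distinct followed by a pivot, count the distinct users of each sex directly with n_distinct. It assumes every user has exactly one value of sex, and the name res is only illustrative.

```r
library(dplyr)

df <- data.frame(
  user   = c("u1", "u1", "u1", "u2", "u2"),
  entity = c("e1", "e2", "e3", "e3", "e4"),
  area   = c("a1", "a1", "a2", "a2", "a1"),
  sex    = c("M", "M", "M", "F", "F")
)

res <- df %>%
  group_by(area) %>%
  summarise(
    # number of distinct users of each sex within the area
    male   = n_distinct(user[sex == "M"]),
    female = n_distinct(user[sex == "F"]),
    .groups = "drop"
  ) %>%
  # valid only under the assumption that a user has a single sex
  mutate(total = male + female)

print(res)
```

An area with no users of one sex gets 0 in that column, since n_distinct of an empty vector is 0.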

First answer (incorrect!): it counts rows rather than distinct users, so u1 is counted twice in a1. One way could be:

library(dplyr)
library(tidyr)

df %>% 
  group_by(area, sex) %>% 
  summarise(value =n()) %>% 
  pivot_wider(
    names_from = sex,
    values_from = value
  ) %>% 
  mutate(total = F+M) %>% 
  rename(female=F, male=M)
  area  female  male total
  <chr>  <int> <int> <int>
1 a1         1     2     3
2 a2         1     1     2
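
To see where the extra male in a1 comes from, the following sketch puts the row count next to the distinct-user count for every area/sex group. The column names rows and users are illustrative, not part of the original answer.

```r
library(dplyr)

df <- data.frame(
  user   = c("u1", "u1", "u1", "u2", "u2"),
  entity = c("e1", "e2", "e3", "e3", "e4"),
  area   = c("a1", "a1", "a2", "a2", "a1"),
  sex    = c("M", "M", "M", "F", "F")
)

counts <- df %>%
  group_by(area, sex) %>%
  summarise(
    rows  = n(),              # what the first answer counts
    users = n_distinct(user), # what the question asks for
    .groups = "drop"
  )

print(counts)
```

Only the a1/M group differs (2 rows, 1 user), because u1 appears there with two entities, e1 and e2.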

df = data.frame (user = c('u1', 'u1', 'u1', 'u2', 'u2'),
                 entity = c('e1','e2','e3','e3','e4'),
                 area = c('a1','a1','a2','a2','a1'),
                 sex=c('M','M','M','F','F'))

library(tidyverse)
df %>%
  distinct(user, area, sex) %>%
  mutate(sex = ifelse(sex == "M", "male", "female")) %>% 
  pivot_wider(
    id_cols = area,
    names_from = sex,
    values_from = sex,
    values_fill = 0,
    values_fn = length
  ) %>% 
  mutate(Total = rowSums(across(male:female)))
#> # A tibble: 2 x 4
#>   area   male female Total
#>   <chr> <int>  <int> <dbl>
#> 1 a1        1      1     2
#> 2 a2        1      1     2

Created on 2022-01-25 by the reprex package (v2.0.1)
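
For comparison, the same deduplicate-then-count logic can be sketched in base R without the tidyverse. This is a swapped-in technique, not something from the answers above: unique drops the repeated user/area/sex rows, table cross-tabulates area by sex, and addmargins appends the row totals. The names users and tab are illustrative.

```r
df <- data.frame(
  user   = c("u1", "u1", "u1", "u2", "u2"),
  entity = c("e1", "e2", "e3", "e3", "e4"),
  area   = c("a1", "a1", "a2", "a2", "a1"),
  sex    = c("M", "M", "M", "F", "F")
)

# keep one row per user/area/sex combination
users <- unique(df[, c("user", "area", "sex")])

# cross-tabulate area by sex, then add a "Sum" column with row totals
tab <- addmargins(table(area = users$area, sex = users$sex), margin = 2)

print(tab)
```

The result is a table rather than a data frame, with columns F, M and Sum; as.data.frame.matrix(tab) would turn it into a data frame if one is needed.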