Counting observations by group using collapse R package

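I would like to translate the following R code from tidyverse to collapse. The code counts the observations in each group and appends the count as a column to the data.frame.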

library(tidyverse)
library(collapse)
head(wlddev)

wlddev %>% 
  group_by(income) %>% 
  add_count(., name = "Size") %>% 
  select(country, income, Size) %>% 
  distinct()
# A tibble: 216 x 3
# Groups:   income [4]
   country             income               Size
   <chr>               <fct>               <int>
 1 Afghanistan         Low income           1830
 2 Albania             Upper middle income  3660
 3 Algeria             Upper middle income  3660
 4 American Samoa      Upper middle income  3660
 5 Andorra             High income          4819
 6 Angola              Lower middle income  2867
 7 Antigua and Barbuda High income          4819
 8 Argentina           Upper middle income  3660
 9 Armenia             Upper middle income  3660
10 Aruba               High income          4819
# ... with 206 more rows

Now I want to accomplish the same task with the collapse R package.

The following code works as expected.

wlddev %>%
  fgroup_by(income) %>%
  fselect(country) %>% 
  fnobs()

               income country
1         High income    4819
2          Low income    1830
3 Lower middle income    2867
4 Upper middle income    3660

However, I cannot append the resulting column to the original data.frame.

wlddev %>%
  fgroup_by(income) %>%
  fselect(country) %>% 
  fnobs() %>% 
  ftransform(.data = wlddev, Size = .)

Error in ftransform_core(.data, e) : 
  Lengths of replacements must be equal to nrow(.data) or 1, or NULL to delete columns

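Any hints, please.

Unlike add_count, which creates a column in the original data, fnobs summarises the data to one row per group. We can join that summary back onto the original data: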

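library(collapse)
library(dplyr)  # rename(), left_join() and distinct() come from dplyr

wlddev %>% 
  fgroup_by(income) %>%
  fselect(country) %>% 
  fnobs() %>%                   # one row per income group
  rename(size = country) %>%    # the count is stored in the country column
  left_join(slt(wlddev, country, income), ., by = "income") %>% 
  distinct()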

Found a very simple solution:

wlddev %>% 
  fmutate(Size = fnobs(income, income, TRA = "replace_fill"))  %>% 
  fselect(country, income, Size) %>% 
  funique()

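So in principle fnobs computes the number of non-missing values, and there is no real provision for adding a group count (I also wonder why this is necessary, I have never needed it). However, the count is stored in the grouping object, which can be retrieved using GRP(.). So you could create a function: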

gcount <- function(x) {
   # Just turning some unnecessary things off in case we pass a plain vector
   g <- GRP(x, sort = FALSE, return.groups = FALSE, call = FALSE) 
   g$group.sizes[g$group.id]
}
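
To see what gcount relies on, here is a small sketch on a made-up character vector (x is invented for illustration and has nothing to do with wlddev). It only inspects the group.id and group.sizes fields of the grouping object that the function above already uses:

```r
library(collapse)

# Toy input, invented for this illustration
x <- c("b", "a", "b", "b")

# Same call as in gcount(): unsorted grouping, no group labels stored
g <- GRP(x, sort = FALSE, return.groups = FALSE, call = FALSE)

g$N.groups     # number of distinct groups
g$group.id     # group index of each element, in order of first appearance
g$group.sizes  # number of elements in each group

# Indexing the sizes by the group ids expands them to one value per element
per_row <- g$group.sizes[g$group.id]
per_row        # 3 1 3 3
```

Because the sizes come from the grouping itself, they do not depend on which column is being summarised.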

Then we can do

wlddev %>% 
  ftransform(Size = gcount(income)) %>%
  fselect(country, income, Size) %>% 
  funique(cols = 1) # Observations are uniquely identified by country

# or 

wlddev %>% 
  fgroup_by(income) %>%
  ftransform(Size = gcount(.)) %>%
  fselect(country, income, Size) %>% 
  fungroup() %>%
  funique(cols = 1) 

Of course we can also use fnobs:

wlddev %>% 
  fgroup_by(income) %>%
  fmutate(Size = fnobs(income)) %>%
  fselect(country, income, Size) %>% 
  fungroup() %>%
  funique(cols = 1) 

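But this can be misleading if income contains missing values, because fnobs counts only non-missing values. Note (as stated in the documentation) that ftransform is a faster version of base::transform that ignores grouping, whereas fmutate is a faster version of dplyr::mutate that respects grouping.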
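
Both points can be checked on a tiny data frame. This is only a sketch: d, grp and val are made up for illustration, and it assumes a collapse version that provides fmutate. The first part compares the size taken from the grouping object with the non-missing count of the grouping column itself; the second part applies the same fnobs call through ftransform and fmutate.

```r
library(collapse)

# Toy data, invented for this illustration (not taken from wlddev)
d <- data.frame(grp = c(1, 1, NA, 2), val = c(10, 20, 30, 40))

# Group sizes from the grouping object: the NA row forms a group of size 1
g <- GRP(d$grp, sort = FALSE, return.groups = FALSE, call = FALSE)
size <- g$group.sizes[g$group.id]

# Non-missing count of grp within groups of grp: the NA group holds no
# non-missing value, so its count is 0 rather than 1
nobs <- fnobs(d$grp, d$grp, TRA = "replace_fill")

cbind(d, size = size, nobs = nobs)

# ftransform ignores the grouping: fnobs(val) is a single overall count
ungrouped <- ftransform(fgroup_by(d, grp), n = fnobs(val))$n

# fmutate respects the grouping: fnobs(val) is computed within each group
grouped <- fmutate(fgroup_by(d, grp), n = fnobs(val))$n
```

In this sketch size and nobs differ only in the row where grp is NA, which is exactly the case where a count based on fnobs(income) would understate the group size.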

If you tell me why you need the group count as a variable in the data frame, I can consider adding gcount to the next collapse release.