Getting length of factors inside a list of data frames [R]

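I am trying to use lapply to apply a function that creates a column containing the size of each factor level (group) across several data frames in a list.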

Here is my sample data:

> head(m.list)
$df.1
         Date Years
56 1967-01-17  55  
10 1981-07-27  40  
34 1973-09-30  48  
98 1944-03-17  78  
27 1986-07-17  35  

$df.2
         Date Years
56 1967-01-17  55  
10 1981-07-27  40  
34 1973-09-30  48  
98 1944-03-17  78  
27 1986-07-17  35  

I have managed to create the groups using breaks:

year_cut <- function(m.list, col)
      {cut(m.list[,col],
       breaks=c(10,20, 30, 40, 50, 60, 100),
       right = FALSE,
       labels = c("A","B","C","D","E","F"))}

m.list = lapply(m.list, function(x)
                cbind(x, "Group" = year_cut(m.list = x,
                      col ="Years")))
> head(m.list)
$df.1
         Date Years Group
56 1967-01-17  55   E
10 1981-07-27  40   D
34 1973-09-30  48   D
98 1944-03-17  78   F
27 1986-07-17  35   C

$df.2
         Date Years Group
56 1967-01-17  55   E
10 1981-07-27  40   D
34 1973-09-30  48   D
98 1944-03-17  78   F
27 1986-07-17  35   C
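A quick way to check the grouping is to run cut() on the sample ages directly. With right = FALSE every interval is closed on the left and open on the right ([10,20), [20,30), ...), so 35 falls in [30,40) and gets label C, while 40 falls in [40,50) and gets D. The sketch below repeats the breaks and labels from year_cut() and prints each value next to its interval.

```r
# Same breaks and labels as year_cut(); right = FALSE gives intervals
# closed on the left and open on the right: [10,20), [20,30), ...
breaks <- c(10, 20, 30, 40, 50, 60, 100)
labels <- c("A", "B", "C", "D", "E", "F")

years <- c(55L, 40L, 48L, 78L, 35L)
groups <- cut(years, breaks = breaks, right = FALSE, labels = labels)

# Show each value, its label, and the interval it landed in
data.frame(Years = years,
           Group = groups,
           Interval = cut(years, breaks = breaks, right = FALSE))
```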

Now I am trying to get the size of each group (the number of rows in it), but I haven't managed to.

I tried two different approaches, neither of which worked:

cut_summary <- function(m.list, col)
{ summarize(
  group_by(m.list,!!as.name(col)),
  length(col)) }
    
m.list = lapply(m.list, function(x)
cbind(x, "cut_total" = cut_summary(m.list = x,
 col ="Group"))) 
    
Error in data.frame(..., check.names = FALSE) : 
arguments imply differing number of rows: 436, 7
    
cut_summary <- function(m.list, col)
{ group_by(m.list,!!as.name(col)) %>% length(col)}
    
m.list = lapply(m.list, function(x)
cbind(x, "cut_total" = cut_summary(m.list = x,
      col ="Group")))

Error in length(., col) :
2 arguments passed to 'length' which requires 1

Ideally, I would get:

> head(m.list)
$df.1
         Date Years Group Total
56 1967-01-17  55   E      22
10 1981-07-27  40   D      32
34 1973-09-30  48   D      32
98 1944-03-17  78   F      4
27 1986-07-17  35   B      20

$df.2
         Date Years Group Total
56 2005-01-17  17   A      22
10 1981-07-27  40   C      19
34 1973-09-30  48   E      3
98 1944-03-17  78   F      50
27 1986-07-17  35   B      4

Any help is welcome. Thanks!

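We can create both columns with mutate/add_count: loop over the list with purrr::map (or lapply from base R), use mutate to create the 'Group' column by applying 'year_cut' to the 'Years' column, and then use add_count to create the count column.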

library(dplyr)
library(purrr)
map(m.list,  ~ .x %>%
               mutate(Group = year_cut(., col ="Years")) %>%
               add_count(Group, name = "Total"))
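Since base R's lapply is mentioned as an alternative to purrr::map, here is a minimal dependency-free sketch of the same two steps. It is an assumption-labelled stand-in, not the answer's code: ave() with FUN = length plays the role of add_count(), returning each row's group size. The data are rebuilt from the sample above (row names omitted).

```r
# Hypothetical base-R equivalent of the map()/mutate()/add_count() chain.
year_cut <- function(m.list, col) {
  cut(m.list[, col],
      breaks = c(10, 20, 30, 40, 50, 60, 100),
      right = FALSE,
      labels = c("A", "B", "C", "D", "E", "F"))
}

# Sample data from the question
dates <- c("1967-01-17", "1981-07-27", "1973-09-30", "1944-03-17", "1986-07-17")
years <- c(55L, 40L, 48L, 78L, 35L)
m.list <- list(df.1 = data.frame(Date = dates, Years = years),
               df.2 = data.frame(Date = dates, Years = years))

m.list <- lapply(m.list, function(x) {
  x$Group <- year_cut(x, col = "Years")
  # ave() applies length() within each Group level and spreads the result
  # back over the rows, so Total has one value per row, like add_count()
  x$Total <- ave(x$Years, x$Group, FUN = length)
  x
})
m.list
```

Because ave() returns a vector as long as its input, the result can be assigned straight back as a column, which avoids the row-count mismatch that cbind() hits with a per-group summary.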

-output

$df.1
        Date Years Group Total
1 1967-01-17    55     E     1
2 1981-07-27    40     D     2
3 1973-09-30    48     D     2
4 1944-03-17    78     F     1
5 1986-07-17    35     C     1

$df.2
        Date Years Group Total
1 1967-01-17    55     E     1
2 1981-07-27    40     D     2
3 1973-09-30    48     D     2
4 1944-03-17    78     F     1
5 1986-07-17    35     C     1

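The OP's function applies length to the string input: col is the character string "Group", so length(col) is always 1. Instead, it should be length(!!as.name(col)), or more simply n(). Also, summarise returns only the grouping column and the summary column, i.e. one row per group, which is why cbind fails with "differing number of rows". (In the second attempt, the pipe passes the grouped data as the first argument, so length() receives two arguments, hence that error.) Based on the expected output, the OP seems to want the full dataset with a new column added to it. In that case, use mutate: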

library(dplyr)

# Add a per-row count of each group while keeping every row of the data
cut_summary <- function(m.list, col)
  { mutate(group_by(m.list, !!as.name(col)), Total = n()) }

Then call it after adding the Group column to m.list:
m.list <- lapply(m.list, function(x)
                 cbind(x, "Group" = year_cut(m.list = x, col ="Years")))
lapply(m.list, function(x) cut_summary(x, col = "Group"))
$df.1
# A tibble: 5 × 4
# Groups:   Group [4]
  Date       Years Group Total
  <chr>      <int> <fct> <int>
1 1967-01-17    55 E         1
2 1981-07-27    40 D         2
3 1973-09-30    48 D         2
4 1944-03-17    78 F         1
5 1986-07-17    35 C         1

$df.2
# A tibble: 5 × 4
# Groups:   Group [4]
  Date       Years Group Total
  <chr>      <int> <fct> <int>
1 1967-01-17    55 E         1
2 1981-07-27    40 D         2
3 1973-09-30    48 D         2
4 1944-03-17    78 F         1
5 1986-07-17    35 C         1
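The point about summarise versus mutate can also be checked without dplyr. The base-R sketch below is a stand-in, not the answer's code: table() builds a per-group count (it keeps empty factor levels, whereas dplyr's summarise drops them by default), cbind() then fails because that summary has one row per group rather than one per observation, and indexing the table by each row's factor code gives a per-row Total that does fit.

```r
# Sample ages from the question, grouped with the same breaks as year_cut()
df <- data.frame(Years = c(55L, 40L, 48L, 78L, 35L))
df$Group <- cut(df$Years, breaks = c(10, 20, 30, 40, 50, 60, 100),
                right = FALSE, labels = c("A", "B", "C", "D", "E", "F"))

# Per-group summary: one row per factor level (including empty A and B)
per_group <- as.data.frame(table(Group = df$Group))
nrow(per_group)  # 6, while df has 5 rows

# Binding it column-wise reproduces the "differing number of rows" error
err <- tryCatch(cbind(df, per_group),
                error = function(e) conditionMessage(e))
err

# Per-row count: indexing by a factor uses its integer codes, which line
# up with the table's levels, so each row gets the size of its own group
df$Total <- as.integer(table(df$Group)[df$Group])
df
```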

data

m.list <- list(df.1 = structure(list(Date = c("1967-01-17", "1981-07-27", 
"1973-09-30", "1944-03-17", "1986-07-17"), Years = c(55L, 40L, 
48L, 78L, 35L)), class = "data.frame", row.names = c("56", "10", 
"34", "98", "27")), df.2 = structure(list(Date = c("1967-01-17", 
"1981-07-27", "1973-09-30", "1944-03-17", "1986-07-17"), Years = c(55L, 
40L, 48L, 78L, 35L)), class = "data.frame", row.names = c("56", 
"10", "34", "98", "27")))