Getting length of factors inside a list of data frames [R]
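I am trying to apply a function with lapply that creates a column containing the length of each factor level (the group size) in several data frames stored in a list.
Here is my sample data: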
> head(m.list)
$df.1
Date Years
56 1967-01-17 55
10 1981-07-27 40
34 1973-09-30 48
98 1944-03-17 78
27 1986-07-17 35
$df.2
Date Years
56 1967-01-17 55
10 1981-07-27 40
34 1973-09-30 48
98 1944-03-17 78
27 1986-07-17 35
I have managed to create the groups using breaks:
year_cut <- function(m.list, col)
{cut(m.list[,col],
breaks=c(10,20, 30, 40, 50, 60, 100),
right = FALSE,
labels = c("A","B","C","D","E","F"))}
m.list = lapply(m.list, function(x)
cbind(x, "Group" = year_cut(m.list = x,
col ="Years")))
>head(m.list)
$df.1
Date Years Group
56 1967-01-17 55 E
10 1981-07-27 40 D
34 1973-09-30 48 D
98 1944-03-17 78 F
27 1986-07-17 35 B
$df.2
Date Years Group
56 1967-01-17 55 E
10 1981-07-27 40 D
34 1973-09-30 48 D
98 1944-03-17 78 F
27 1986-07-17 35 B
Now I am trying to get the size of each group, but I have not managed to do it.
I tried two different approaches, neither of which worked:
cut_summary <- function(m.list, col)
{ summarize(
group_by(m.list,!!as.name(col)),
length(col)) }
m.list = lapply(m.list, function(x)
cbind(x, "cut_total" = cut_summary(m.list = x,
col ="Group")))
Error in data.frame(..., check.names = FALSE) :
arguments imply differing number of rows: 436, 7
cut_summary <- function(m.list, col)
{ group_by(m.list,!!as.name(col)) %>% length(col)}
m.list = lapply(m.list, function(x)
cbind(x, "cut_total" = cut_summary(m.list = x,
col ="Group")))
Error in length(., col) :
2 arguments passed to 'length' which requires 1
Ideally, I should get:
>head(m.list)
$df.1
Date Years Group Total
56 1967-01-17 55 E 22
10 1981-07-27 40 D 32
34 1973-09-30 48 D 32
98 1944-03-17 78 F 4
27 1986-07-17 35 B 20
$df.2
Date Years Group Total
56 2005-01-17 17 A 22
10 1981-07-27 40 C 19
34 1973-09-30 48 E 3
98 1944-03-17 78 F 50
27 1986-07-17 35 B 4
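Any help is welcome. Thank you!
We can create both columns with mutate/add_count. Loop over the list with purrr::map (or lapply from base R), use mutate to create the 'Group' column by applying 'year_cut' to the 'Years' column, and use add_count to create the count column: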
library(dplyr)
library(purrr)
map(m.list, ~ .x %>%
mutate(Group = year_cut(., col ="Years")) %>%
add_count(Group, name = "Total"))
-output
$df.1
Date Years Group Total
1 1967-01-17 55 E 1
2 1981-07-27 40 D 2
3 1973-09-30 48 D 2
4 1944-03-17 78 F 1
5 1986-07-17 35 C 1
$df.2
Date Years Group Total
1 1967-01-17 55 E 1
2 1981-07-27 40 D 2
3 1973-09-30 48 D 2
4 1944-03-17 78 F 1
5 1986-07-17 35 C 1
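For readers who prefer to avoid dplyr, the same per-row count can be sketched in base R. This is only an illustrative alternative, not the method used in the answer: the helper name add_group_total and the trimmed-down sample list (only the 'Years' column) are assumptions made for this example.

```r
# Hypothetical sample list with the same 'Years' values as the post's data
m.list <- list(
  df.1 = data.frame(Years = c(55L, 40L, 48L, 78L, 35L)),
  df.2 = data.frame(Years = c(55L, 40L, 48L, 78L, 35L))
)

# Same breaks and labels as year_cut() in the question
breaks <- c(10, 20, 30, 40, 50, 60, 100)

# Hypothetical helper: adds 'Group' and 'Total' to one data frame
add_group_total <- function(df, col = "Years") {
  df$Group <- cut(df[[col]], breaks = breaks, right = FALSE,
                  labels = c("A", "B", "C", "D", "E", "F"))
  # ave() applies length() within each Group and returns one value per row
  df$Total <- ave(seq_len(nrow(df)), df$Group, FUN = length)
  df
}

res <- lapply(m.list, add_group_total)
res$df.1
```

Because ave() returns a vector as long as its input, the result can be assigned directly as a new column, which is the shape the expected output in the question asks for.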
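The OP's function applies length to a string input, so it measures the column name rather than the column. It should instead be length(!!as.name(col)) or, more simply, n(). In addition, summarise returns only the grouping columns and the summarised output columns, that is, one row per group. Judging by the expected output, the OP seems to want the full dataset with a new column added to the original data. In that case, use mutate: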
cut_summary <- function(m.list, col)
{ mutate(group_by(m.list,!!as.name(col)), Total = n())}
Then call it on the already modified m.list:
m.list <- lapply(m.list, function(x)
cbind(x, "Group" = year_cut(m.list = x, col ="Years")))
lapply(m.list, function(x) cut_summary(x, col = "Group"))
$df.1
# A tibble: 5 × 4
# Groups: Group [4]
Date Years Group Total
<chr> <int> <fct> <int>
1 1967-01-17 55 E 1
2 1981-07-27 40 D 2
3 1973-09-30 48 D 2
4 1944-03-17 78 F 1
5 1986-07-17 35 C 1
$df.2
# A tibble: 5 × 4
# Groups: Group [4]
Date Years Group Total
<chr> <int> <fct> <int>
1 1967-01-17 55 E 1
2 1981-07-27 40 D 2
3 1973-09-30 48 D 2
4 1944-03-17 78 F 1
5 1986-07-17 35 C 1
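The two errors in the question can be reproduced on a small scale. The sketch below uses a hypothetical five-row data frame (df, per_group and per_row are names invented for this example) to show that length(col) only measures the string, that summarise collapses to one row per group, and that mutate keeps every row. The trailing ungroup() is an optional extra, not part of the answer's function; it drops the grouping so later steps are not applied per group.

```r
library(dplyr)

# Hypothetical data frame that already has the 'Group' column
df <- data.frame(
  Years = c(55L, 40L, 48L, 78L, 35L),
  Group = factor(c("E", "D", "D", "F", "C"))
)
col <- "Group"

# length() of the string "Group" is 1, whatever the column contains
length(col)

# summarise(): one row per group, so cbind() with the original rows fails
per_group <- df %>%
  group_by(!!as.name(col)) %>%
  summarise(Total = n(), .groups = "drop")

# mutate(): one row per original row, with the group size repeated
per_row <- df %>%
  group_by(!!as.name(col)) %>%
  mutate(Total = n()) %>%
  ungroup()

nrow(per_group)  # number of distinct groups
nrow(per_row)    # same as nrow(df)
```

The row-count mismatch between per_group and df is the same kind of mismatch reported in the question as "arguments imply differing number of rows: 436, 7".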
data
m.list <- list(df.1 = structure(list(Date = c("1967-01-17", "1981-07-27",
"1973-09-30", "1944-03-17", "1986-07-17"), Years = c(55L, 40L,
48L, 78L, 35L)), class = "data.frame", row.names = c("56", "10",
"34", "98", "27")), df.2 = structure(list(Date = c("1967-01-17",
"1981-07-27", "1973-09-30", "1944-03-17", "1986-07-17"), Years = c(55L,
40L, 48L, 78L, 35L)), class = "data.frame", row.names = c("56",
"10", "34", "98", "27")))
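As a quick check of the binning used throughout, the sketch below runs cut() on the sample 'Years' values with the same breaks as year_cut. The variable names (years, intervals, groups) are made up for this example. With right = FALSE each interval is closed on the left and open on the right, so 40 falls in [40,50) (D) and 35 falls in [30,40) (C), which matches the groups in the answer's output.

```r
breaks <- c(10, 20, 30, 40, 50, 60, 100)
years  <- c(55L, 40L, 48L, 78L, 35L)

# Without labels, cut() shows the interval each value falls into
intervals <- cut(years, breaks = breaks, right = FALSE)
as.character(intervals)

# With the labels from the question
groups <- cut(years, breaks = breaks, right = FALSE,
              labels = c("A", "B", "C", "D", "E", "F"))
as.character(groups)
```

Note that with these settings a value below 10, or a value of 100 or more, falls outside every interval and becomes NA.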