Odd behavior of summarise() with across() in dplyr
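Based on a survey of children and their parents, I have two large tables (~12k x 6). The tables have the same dimensions and the same types/classes, and were read into R in the same way. After some wrangling (again, the same steps for both children and parents), I run the following code: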
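UPDATE: It turns out the root of my problem is variable C, which only takes the values 0 and 1 in the Children dataset. Is there a way to work around this error when using summarise with table?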
Parents %>%
  summarise(across(A, ~ table(.x)),
            across(B, ~ table(.x)),
            across(C, ~ table(.x)),
            across(D, ~ table(.x)),
            across(E, ~ table(.x)))

Children %>%
  summarise(across(A, ~ table(.x)),
            across(B, ~ table(.x)),
            across(C, ~ table(.x)),
            across(D, ~ table(.x)),
            across(E, ~ table(.x)))
For Parents I get the following output (frequencies of the unique values; variable D takes the values 1, 2, 3, the others 0, 1, 2):
      A     B     C     D    E
1 11840 11835 11409 11363  519
2    35    42   436   473 4912
3     3     1    33    42 6447
For Children I get the following error:
Error: Problem with `summarise()` input `..5`.
x Input `..5` must be size 4 or 1, not 3.
ℹ An earlier column had size 4.
ℹ Input `..5` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
Run `rlang::last_error()` to see where the error occurred.
Running rlang::last_error() returns:
<error/dplyr_error>
Problem with `summarise()` input `..5`.
x Input `..5` must be size 4 or 1, not 3.
ℹ An earlier column had size 4.
ℹ Input `..5` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
Backtrace:
Run `rlang::last_trace()` to see the full context.
Running rlang::last_trace() returns:
<error/dplyr_error>
Problem with `summarise()` input `..5`.
x Input `..5` must be size 4 or 1, not 3.
ℹ An earlier column had size 4.
ℹ Input `..5` is `(function (.cols = everything(), .fns = NULL, ..., .names = NULL) ...`.
Backtrace:
█
1. ├─`%>%`(...)
2. ├─dplyr::summarise(...)
3. ├─dplyr:::summarise.data.frame(...)
4. │ └─dplyr:::summarise_cols(.data, ...)
5. │ └─base::withCallingHandlers(...)
6. ├─dplyr:::abort_glue(...)
7. │ ├─rlang::exec(abort, class = class, !!!data)
8. │ └─(function (message = NULL, class = NULL, ..., trace = NULL, parent = NULL, ...
9. │ └─rlang:::signal_abort(cnd)
10. │ └─base::signalCondition(cnd)
11. └─(function (e) ...
Does anyone know what is going on?
For sanity's sake, here are the str summaries:
> str(Parents)
'data.frame': 11878 obs. of 6 variables:
$ ID : chr "Parent 1" "Parent 2" "Parent 3" "Parent 4" ...
$ A : num 0 0 0 0 0 0 0 0 0 0 ...
$ B : num 0 0 0 0 0 0 0 0 0 0 ...
$ C : num 0 0 0 0 0 0 0 0 0 0 ...
$ D : num 2 2 1 2 3 3 2 3 3 2 ...
$ E : num 0 0 0 0 0 0 0 0 0 0 ...
> str(Children)
'data.frame': 11878 obs. of 6 variables:
$ ID : chr "Child 1" "Child 2" "Child 3" "Child 4" ...
$ A : num 0 0 0 0 0 0 0 0 0 0 ...
$ B : num 0 0 0 0 0 0 0 0 0 0 ...
$ C : num 0 0 0 0 0 0 0 0 0 0 ...
$ D : num 2 2 1 2 3 3 2 3 3 2 ...
$ E : num 0 0 0 0 0 0 0 0 0 0 ...
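table does not fit well into a tidyverse pipeline, because it can return a different number of values for each column. I think it is better to reshape the data to long format and use count. You get the same information, just in long format.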
library(dplyr)
library(tidyr)
Parents %>% pivot_longer(cols = A:E) %>% count(name, value)
The same works for the Children data.
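If you would rather keep table, one possible workaround is to convert each column to a factor with a fixed set of levels first, so that every column yields the same number of counts, zeros included. The sketch below is only an illustration in base R: Children_toy and lv are made-up names, and the level set 0:2 is an assumption about the answer codes (D would need 1:3).

```r
# Hypothetical toy data standing in for Children: column C never takes the value 2.
Children_toy <- data.frame(
  A = c(0, 1, 2, 0),
  C = c(0, 1, 0, 0)
)

# table() only reports values that are present, so the result lengths differ
# between columns. This mismatch is what summarise() complains about.
raw_lengths <- lengths(lapply(Children_toy, table))
print(raw_lengths)

# Assumed full set of answer codes for these columns.
lv <- 0:2

# With fixed factor levels, table() returns length(lv) counts for every column,
# and absent values show up as 0.
counts <- sapply(Children_toy, function(x) table(factor(x, levels = lv)))
print(counts)
```

The same table(factor(.x, levels = lv)) expression should also work inside the across() lambda, since every column then has the same size. Note that dplyr 1.1.0 and later prefer reframe() over summarise() for results with more than one row.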