Why do these operations not yield the same result? Piping into . (dot)
Today I ran into something I do not quite understand about the use of . (dot) and %>%. Now I am not sure I understand either operator.
Data
library(data.table)  # setDT() and the df[i, j, by] syntax
library(magrittr)    # the %>% pipe
set.seed(1)
df <- setDT(data.frame(id = sample(1:5, 10, replace = TRUE), value = runif(10)))
Why are these three equivalent
df[, .(Mean = mean(value)), by = .(id)] %>% .$Mean %>% sum()
[1] 3.529399
df[, .(Mean = mean(value)), by = .(id)] %>% {sum(.$Mean)}
[1] 3.529399
sum(df[, .(Mean = mean(value)), by = .(id)]$Mean)
[1] 3.529399
but this one gives such a different answer?
df[, .(Mean = mean(value)), by = .(id)] %>% sum(.$Mean)
[1] 22.0588
Can someone explain to me how the pipe operator actually works with respect to the use of the dot (.)? Until now I thought of it as "Go fetch what sits on the left of the %>%".
Investigation that confused me even more
I tried replacing sum with print to see what is actually happening
# As Expected
df[, .(Mean = mean(value)), by = .(id)] %>% .$Mean %>% print()
[1] 0.5111589 0.7698414 0.7475319 0.9919061 0.5089610
df[, .(Mean = mean(value)), by = .(id)] %>% print(.$Mean) %>% sum()
[1] 3.529399
# Surprised
df[, .(Mean = mean(value)), by = .(id)] %>% print(.$Mean)
id Mean
1: 1 0.5111589
---
5: 3 0.5089610
# Same
df[, .(Mean = mean(value)), by = .(id)] %>% sum(print(.$Mean))
[1] 22.0588
# Utterly Confused
df[, .(Mean = mean(value)), by = .(id)] %>% print(.$Mean) %>% sum()
[1] 18.5294 #Not even the same as above??
EDIT: It looks like this has nothing to do with data.table or the way it groups; the same problem occurs with a data.frame:
x <- data.frame(x1 = 1:3, x2 = 4:6)
sum(x$x1)
# [1] 6
sum(x$x2)
# [1] 15
x %>% .$x1 %>% sum
# [1] 6
x %>% .$x2 %>% sum
# [1] 15
# Why?
x %>% sum(.$x1)
# [1] 27
x %>% sum(.$x2)
# [1] 36
The updated short example helps.
We know that with the pipe, the LHS is inserted as the first argument (unless we "stop" that with {}), so what happens is:
x %>% sum(.$x1)
#[1] 27
is equivalent to
sum(x, x$x1)
#[1] 27
That is, the sum of the entire data frame (21) is added to the sum of column x1 (6).
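To make the arithmetic explicit, here is a minimal sketch that assumes magrittr is attached for %>%. The names piped and manual are only illustrative. It compares the piped call with the hand-written equivalent and shows the braces variant that turns off the first-argument insertion.

```r
library(magrittr)

x <- data.frame(x1 = 1:3, x2 = 4:6)

# The dot only appears nested inside another call (.$x1), not as a direct
# argument of sum(), so the pipe still inserts x as the first argument.
piped  <- x %>% sum(.$x1)
manual <- sum(x, x$x1)

sum(x)                    # 21: every cell of the data frame
sum(x$x1)                 # 6
piped                     # 27 = 21 + 6
identical(piped, manual)  # TRUE

# Wrapping the right-hand side in braces suppresses the insertion.
x %>% {sum(.$x1)}         # 6
```

The same reasoning gives 36 for x2, since 21 + 15 = 36.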
For the original example, we can verify the same behavior:
library(data.table)
temp <- df[, .(Mean = mean(value)), by = .(id)]
sum(temp, temp$Mean)
#[1] 22.0588
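The print experiments in the question can be read the same way. The sketch below uses the small data frame x rather than the data.table, and count_args is a hypothetical helper introduced only to show how many arguments the pipe passes. It also shows that print() returns its first argument invisibly, so the next step in the pipe still receives the whole object. That would account for the 18.5294 in the question: it is the sum of the whole table, that is, 22.0588 minus one copy of 3.529399.

```r
library(magrittr)

x <- data.frame(x1 = 1:3, x2 = 4:6)

# Hypothetical helper: reports how many arguments it received.
count_args <- function(...) length(list(...))

x %>% count_args(.$x1)     # 2: x is inserted first, then x$x1
x %>% count_args(.)        # 1: a direct dot replaces the insertion
x %>% {count_args(.$x1)}   # 1: braces turn the insertion off

# print() hands back its first argument invisibly, so the next step
# of the pipe still receives the whole data frame.
passed_on <- x %>% print()
identical(passed_on, x)    # TRUE
x %>% print() %>% sum()    # 21, the sum of every cell
```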