聚合 R(绝对)差异
Aggregate R (absolute) difference
我有一个 dataframe
这样的:
structure(list(from = c("China", "China", "Canada", "Canada",
"USA", "China", "Trinidad and Tobago", "China", "USA", "USA"),
to = c("Japan", "Japan", "USA", "USA", "Japan", "USA", "USA",
"Rep. of Korea", "Canada", "Japan"), weight = c(4766781396,
4039683737, 3419468319, 3216051707, 2535151299, 2513604035,
2303474559, 2096033823, 2091906420, 2066357443)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L), groups = structure(list(
from = c("Canada", "China", "China", "China", "Trinidad and Tobago",
"USA", "USA"), to = c("USA", "Japan", "Rep. of Korea", "USA",
"USA", "Canada", "Japan"), .rows = structure(list(3:4, 1:2,
8L, 6L, 7L, 9L, c(5L, 10L)), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -7L), .drop = TRUE))
我想计算按 from
和 to
分组的 weight
列中差异的绝对值。
我正在尝试使用函数 aggregate()
,但它似乎适用于平均值和总和,而不适用于差异。例如(df
是我的dataframe
的名字):
aggregate(weight~from+to, data = df, FUN=mean)
产生:
from to weight
1 USA Canada 2091906420
2 China Japan 4403232567
3 USA Japan 2300754371
4 China Rep. of Korea 2096033823
5 Canada USA 3317760013
6 China USA 2513604035
7 Trinidad and Tobago USA 2303474559
编辑。 期望的结果是
from to weight
1 USA Canada 2091906420
2 China Japan 727097659
3 USA Japan 468793856
4 China Rep. of Korea 2096033823
5 Canada USA 203416612
6 China USA 2513604035
7 Trinidad and Tobago USA 2303474559
正如我们所见,在 from
和 to
列中出现两次的国家在 weight
列中仅在一行中出现权重差异。例如,
from to weight
China Japan 4766781396
China Japan 4039683737
成为
from to weight
China Japan 727097659
因为
> 4766781396-4039683737
[1] 727097659
差异应该是正的(这就是为什么我写“权重差异的绝对值”)。
仅出现在一行数据框中的国家对 df
保持不变,例如
from to weight
7 Trinidad and Tobago USA 2303474559
以下是您要找的吗?
f <- function(x) abs(x[2] - x[1])
aggregate(weight ~ from + to, data = df, FUN = f)
#> from to weight
#> 1 USA Canada NA
#> 2 China Japan 727097659
#> 3 USA Japan 468793856
#> 4 China Rep. of Korea NA
#> 5 Canada USA 203416612
#> 6 China USA NA
#> 7 Trinidad and Tobago USA NA
假设每组最多 2 个值,并且差异的顺序不重要
aggregate(weight~from+to, data=df, FUN=function(x){
abs(ifelse(length(x)==1,x,diff(x)))
})
from to weight
1 USA Canada 2091906420
2 China Japan 727097659
3 USA Japan 468793856
4 China Rep. of Korea 2096033823
5 Canada USA 203416612
6 China USA 2513604035
7 Trinidad and Tobago USA 2303474559
我有一个 dataframe
这样的:
structure(list(from = c("China", "China", "Canada", "Canada",
"USA", "China", "Trinidad and Tobago", "China", "USA", "USA"),
to = c("Japan", "Japan", "USA", "USA", "Japan", "USA", "USA",
"Rep. of Korea", "Canada", "Japan"), weight = c(4766781396,
4039683737, 3419468319, 3216051707, 2535151299, 2513604035,
2303474559, 2096033823, 2091906420, 2066357443)), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L), groups = structure(list(
from = c("Canada", "China", "China", "China", "Trinidad and Tobago",
"USA", "USA"), to = c("USA", "Japan", "Rep. of Korea", "USA",
"USA", "Canada", "Japan"), .rows = structure(list(3:4, 1:2,
8L, 6L, 7L, 9L, c(5L, 10L)), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -7L), .drop = TRUE))
我想计算按 from
和 to
分组的 weight
列中差异的绝对值。
我正在尝试使用函数 aggregate()
,但它似乎适用于平均值和总和,而不适用于差异。例如(df
是我的dataframe
的名字):
aggregate(weight~from+to, data = df, FUN=mean)
产生:
from to weight
1 USA Canada 2091906420
2 China Japan 4403232567
3 USA Japan 2300754371
4 China Rep. of Korea 2096033823
5 Canada USA 3317760013
6 China USA 2513604035
7 Trinidad and Tobago USA 2303474559
编辑。 期望的结果是
from to weight
1 USA Canada 2091906420
2 China Japan 727097659
3 USA Japan 468793856
4 China Rep. of Korea 2096033823
5 Canada USA 203416612
6 China USA 2513604035
7 Trinidad and Tobago USA 2303474559
正如我们所见,在 from
和 to
列中出现两次的国家在 weight
列中仅在一行中出现权重差异。例如,
from to weight
China Japan 4766781396
China Japan 4039683737
成为
from to weight
China Japan 727097659
因为
> 4766781396-4039683737
[1] 727097659
差异应该是正的(这就是为什么我写“权重差异的绝对值”)。
仅出现在一行数据框中的国家对 df
保持不变,例如
from to weight
7 Trinidad and Tobago USA 2303474559
以下是您要找的吗?
f <- function(x) abs(x[2] - x[1])
aggregate(weight ~ from + to, data = df, FUN = f)
#> from to weight
#> 1 USA Canada NA
#> 2 China Japan 727097659
#> 3 USA Japan 468793856
#> 4 China Rep. of Korea NA
#> 5 Canada USA 203416612
#> 6 China USA NA
#> 7 Trinidad and Tobago USA NA
假设每组最多 2 个值,并且差异的顺序不重要
aggregate(weight~from+to, data=df, FUN=function(x){
abs(ifelse(length(x)==1,x,diff(x)))
})
from to weight
1 USA Canada 2091906420
2 China Japan 727097659
3 USA Japan 468793856
4 China Rep. of Korea 2096033823
5 Canada USA 203416612
6 China USA 2513604035
7 Trinidad and Tobago USA 2303474559