Aggregate R (absolute) difference

I have a data frame like this:

structure(list(from = c("China", "China", "Canada", "Canada", 
"USA", "China", "Trinidad and Tobago", "China", "USA", "USA"), 
    to = c("Japan", "Japan", "USA", "USA", "Japan", "USA", "USA", 
    "Rep. of Korea", "Canada", "Japan"), weight = c(4766781396, 
    4039683737, 3419468319, 3216051707, 2535151299, 2513604035, 
    2303474559, 2096033823, 2091906420, 2066357443)), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"), row.names = c(NA, -10L), groups = structure(list(
    from = c("Canada", "China", "China", "China", "Trinidad and Tobago", 
    "USA", "USA"), to = c("USA", "Japan", "Rep. of Korea", "USA", 
    "USA", "Canada", "Japan"), .rows = structure(list(3:4, 1:2, 
        8L, 6L, 7L, 9L, c(5L, 10L)), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -7L), .drop = TRUE))

I want to compute the absolute value of the difference in the weight column, grouped by from and to.

I am trying to use the function aggregate(), but it seems to work for means and sums, not for differences. For example (df is the name of my data frame):

aggregate(weight~from+to, data = df, FUN=mean)

yields:

                 from            to     weight
1                 USA        Canada 2091906420
2               China         Japan 4403232567
3                 USA         Japan 2300754371
4               China Rep. of Korea 2096033823
5              Canada           USA 3317760013
6               China           USA 2513604035
7 Trinidad and Tobago           USA 2303474559

Edit: the desired result is

                 from            to     weight
1                 USA        Canada 2091906420
2               China         Japan  727097659
3                 USA         Japan  468793856
4               China Rep. of Korea 2096033823
5              Canada           USA  203416612
6               China           USA 2513604035
7 Trinidad and Tobago           USA 2303474559

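As we can see, a country pair that appears twice in the from and to columns is collapsed into a single row whose weight is the difference of the two weights. For example,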

from            to            weight
China           Japan         4766781396
China           Japan         4039683737

becomes

from            to            weight
China           Japan         727097659

because

> 4766781396-4039683737
[1] 727097659

The difference should be positive (that is why I wrote "absolute value of the weight difference").

Country pairs that appear in only one row of the data frame df stay unchanged, for example

                 from            to     weight
7 Trinidad and Tobago           USA 2303474559

Is the following what you are looking for?

f <- function(x) abs(x[2] - x[1])
aggregate(weight ~ from + to, data = df, FUN = f)

#>                  from            to    weight
#> 1                 USA        Canada        NA
#> 2               China         Japan 727097659
#> 3                 USA         Japan 468793856
#> 4               China Rep. of Korea        NA
#> 5              Canada           USA 203416612
#> 6               China           USA        NA
#> 7 Trinidad and Tobago           USA        NA
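
The NA rows in this output come from groups that contain a single weight: indexing x[2] on a length-one vector returns NA instead of raising an error. The following is a minimal sketch of that behavior, together with a hypothetical variant `f2` (a name introduced here, not part of the answer) that keeps the single value, assuming each group has at most two weights.

```r
# Indexing past the end of a vector returns NA rather than an error
x <- c(2091906420)   # a group with a single weight (USA -> Canada)
x[2]                 # NA
abs(x[2] - x[1])     # NA, which is what aggregate() then reports

# Hypothetical variant: keep a lone value, otherwise take the absolute difference
f2 <- function(x) if (length(x) == 1) x else abs(x[2] - x[1])
f2(x)                            # 2091906420
f2(c(4766781396, 4039683737))    # 727097659
```

Passing `f2` as FUN in the aggregate() call above should then leave the single-row pairs unchanged.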

Assuming at most 2 values per group, and that the order of the difference does not matter:

aggregate(weight ~ from + to, data = df, FUN = function(x) {
  # a single value is kept as-is; for two values, take their difference
  abs(if (length(x) == 1) x else diff(x))
})

                 from            to     weight
1                 USA        Canada 2091906420
2               China         Japan  727097659
3                 USA         Japan  468793856
4               China Rep. of Korea 2096033823
5              Canada           USA  203416612
6               China           USA 2513604035
7 Trinidad and Tobago           USA 2303474559
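
If a group could hold more than two weights, diff(x) would return several values and the approach above would no longer give one number per pair. One possible generalization, sketched here under the assumption that "difference" then means the largest weight minus the smallest, uses diff(range(x)). The helper name `spread` and the plain data.frame rebuild of df (same rows, grouping attributes dropped) are introduced only for this illustration.

```r
# Plain data frame with the same rows as the question's df
df <- data.frame(
  from = c("China", "China", "Canada", "Canada", "USA", "China",
           "Trinidad and Tobago", "China", "USA", "USA"),
  to = c("Japan", "Japan", "USA", "USA", "Japan", "USA", "USA",
         "Rep. of Korea", "Canada", "Japan"),
  weight = c(4766781396, 4039683737, 3419468319, 3216051707, 2535151299,
             2513604035, 2303474559, 2096033823, 2091906420, 2066357443)
)

# Hypothetical helper: a lone value passes through; otherwise return max - min,
# which equals the absolute difference when the group has exactly two values
spread <- function(x) if (length(x) == 1) x else diff(range(x))

res <- aggregate(weight ~ from + to, data = df, FUN = spread)
res
```

Because max minus min is never negative, no explicit abs() call is needed in this version.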