Aggregate dataframe by condition in R

I have the following data frame in R:

 Y        ...   Price   Year   Quantity   Country
010190    ...   4781    2021   4          Germany
010190    ...   367     2021   3          Germany
010190    ...   4781    2021   6          France
010190    ...   250     2021   3          France
020190    ...   690     2021   NA         USA
020190    ...   10      2021   6          USA
......    ...   ...     ....   ..         ...
217834    ...   56      2021   3          USA
217834    ...   567     2021   9          USA

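As you can see, the codes in column Y start with 01..., 02..., 21... I want to aggregate such rows from the 6-digit codes down to these 2-digit prefixes, grouping by the categorical columns (e.g. Country and Year) and summing the numeric columns Quantity and Price. Rows containing NA should also be included in the calculation. So, in the end I want output like this: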

 Y    Price   Year   Quantity   Country
01    5148    2021   7          Germany
01    5031    2021   9          France
02    700     2021   6          USA
..    ....    ....   ..         ...
21    623     2021   12         USA
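To experiment with the answers below, the visible sample rows can be typed in as a base R data frame. This is a minimal sketch under two assumptions: Y is stored as character, so codes like 010190 keep their leading zero, and the elided ... column and rows are simply left out. With Y as character, substr() yields exactly the 2-digit groups in the desired output.

```r
# Visible sample rows from the question (the elided "..." column and rows are omitted).
# Assumption: Y is character, so "010190" keeps its leading zero.
df <- data.frame(
  Y        = c("010190", "010190", "010190", "010190",
               "020190", "020190", "217834", "217834"),
  Price    = c(4781, 367, 4781, 250, 690, 10, 56, 567),
  Year     = rep(2021, 8),
  Quantity = c(4, 3, 6, 3, NA, 6, 3, 9),
  Country  = c("Germany", "Germany", "France", "France",
               "USA", "USA", "USA", "USA"),
  stringsAsFactors = FALSE  # the default since R 4.0; explicit for older versions
)

# The 2-digit group of each row is the first two characters of Y
unique(substr(df$Y, 1, 2))
#> [1] "01" "02" "21"
```

If Y was read in as a number instead, the leading zero is already gone (010190 becomes 10190), and substr() would return "10" rather than "01".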

You can use group_by and summarize from dplyr:

library(dplyr)

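df %>% 
  # Relabel each distinct Y as "01", "02", ... in order of first appearance
  mutate(Y = sprintf(as.numeric(factor(Y, unique(Y))), fmt = '%02d')) %>%
  group_by(Y, Year, Country) %>%
  # sum() without na.rm, so any NA in a group yields NA
  summarize(across(where(is.numeric), sum))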
#> # A tibble: 4 x 5
#> # Groups:   Y, Year [3]
#>    Y      Year Country Price Quantity
#>    <chr> <int> <chr>   <int>    <int>
#>  1 01     2021 France   5031        9
#>  2 01     2021 Germany  5148        7
#>  3 02     2021 USA       700       NA
#>  4 03     2021 USA       623       12
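Note that factor(Y, unique(Y)) numbers the distinct Y values by order of first appearance; it does not read their leading digits, so with the sample data the 217834 rows get the label 03 rather than 21. A small base R sketch comparing the two labelings on a few codes from the question:

```r
codes <- c("010190", "010190", "020190", "217834")

# Label each distinct code by its order of first appearance (the expression above)
by_order <- sprintf("%02d", as.integer(factor(codes, levels = unique(codes))))

# Label each code by its first two characters (what the question asks for)
by_prefix <- substr(codes, 1, 2)

by_order
#> [1] "01" "01" "02" "03"
by_prefix
#> [1] "01" "01" "02" "21"
```

The two only coincide when the prefixes happen to be 01, 02, 03, ... in order, which is why taking a substring is the safer choice here.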

Update, as requested:

library(dplyr)
df %>% 
  mutate(Y = substr(Y, 1, 2)) %>%   # keep the first two characters of Y
  group_by(Y, Year, Country) %>% 
  summarise(across(c(Price, Quantity), ~sum(., na.rm = TRUE)))  # skip NAs when summing
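The na.rm = TRUE is what lets rows with NA take part: sum() returns NA as soon as any input is NA, while na.rm = TRUE skips only the missing values, so the 020190 row still contributes its Price. A quick base R check using the two 02/USA rows from the question:

```r
# Price and Quantity of the two "02"/USA rows in the question
price    <- c(690, 10)
quantity <- c(NA, 6)

sum(quantity)                # a single NA makes the whole sum NA
#> [1] NA
sum(quantity, na.rm = TRUE)  # NAs are skipped
#> [1] 6
sum(price)                   # the NA row's Price still counts
#> [1] 700
```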

We can use substr to take the first two characters of Y, then group_by together with summarise() and sum():

library(dplyr)
df %>% 
  mutate(Y = substr(Y, 1, 2)) %>% 
  group_by(Y, Year, Country) %>% 
  summarise(Price = sum(Price, na.rm = TRUE),
            Quantity = sum(Quantity, na.rm = TRUE)
            )
  Y      Year Country Price Quantity
  <chr> <dbl> <chr>   <dbl>    <dbl>
1 01     2021 France   5031        9
2 01     2021 Germany  5148        7
3 02     2021 USA       700        6
4 21     2021 USA       623       12
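For comparison, a base R sketch of the same result with the data frame method of aggregate() (a swapped-in alternative, not part of this answer). Unlike the formula method, it does not drop rows that have NA values; they reach sum(), where na.rm = TRUE skips them. The sample rows are retyped with Y as character (an assumption), so the leading zero survives.

```r
# Visible sample rows from the question; assumption: Y is stored as character
df <- data.frame(
  Y        = c("010190", "010190", "010190", "010190",
               "020190", "020190", "217834", "217834"),
  Price    = c(4781, 367, 4781, 250, 690, 10, 56, 567),
  Year     = rep(2021, 8),
  Quantity = c(4, 3, 6, 3, NA, 6, 3, 9),
  Country  = c("Germany", "Germany", "France", "France",
               "USA", "USA", "USA", "USA"),
  stringsAsFactors = FALSE
)

# Data frame method: NA values in Price/Quantity are passed on to FUN,
# and na.rm = TRUE is forwarded to sum() through "..."
res <- aggregate(df[c("Price", "Quantity")],
                 by = list(Y       = substr(df$Y, 1, 2),
                           Year    = df$Year,
                           Country = df$Country),
                 FUN = sum, na.rm = TRUE)
res
```

Here the grouping columns are passed as a named list, so they come back under the names Y, Year and Country, followed by the summed columns.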

Using aggregate with a substring of Y:

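# na.pass keeps NA rows; sprintf() restores the leading zero lost in integer Y
aggregate(cbind(Quantity, Price) ~ Y + Year + Country,
          transform(dat, Y = substr(sprintf("%06d", as.integer(Y)), 1, 2)),
          FUN = sum, na.rm = TRUE, na.action = na.pass)
#    Y Year Country Quantity Price
# 1 01 2021  France        9  5031
# 2 01 2021 Germany        7  5148
# 3 02 2021     USA        7   700
# 4 21 2021     USA       12   623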
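The formula method of aggregate() uses na.action = na.omit by default, which drops every row that has an NA in any variable before summing, so a row with a missing Quantity also loses its Price. A minimal base R sketch on the two 02 rows from the question (typed in by hand, as an assumption), comparing the default with na.action = na.pass plus na.rm = TRUE:

```r
# The two "02" rows from the question; the first has a missing Quantity
d02 <- data.frame(Y = c("02", "02"), Price = c(690, 10), Quantity = c(NA, 6))

# Default na.action = na.omit: the NA row is dropped entirely, Price included
dropped <- aggregate(cbind(Quantity, Price) ~ Y, data = d02, FUN = sum)

# na.action = na.pass keeps the row; na.rm = TRUE is forwarded to sum()
kept <- aggregate(cbind(Quantity, Price) ~ Y, data = d02, FUN = sum,
                  na.rm = TRUE, na.action = na.pass)

dropped$Price  # 10: the 690 from the NA row is lost
kept$Price     # 700
```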

Data:

dat <- structure(list(Y = c(10190L, 10190L, 10190L, 10190L, 20190L, 
20190L, 217834L, 217834L), foo = c("...", "...", "...", "...", 
"...", "...", "...", "..."), Price = c(4781L, 367L, 4781L, 250L, 
690L, 10L, 56L, 567L), Year = c(2021L, 2021L, 2021L, 2021L, 2021L, 
2021L, 2021L, 2021L), model = c(NA, NA, NA, NA, NA, NA, "Tesla", 
"Tesla"), Quantity = c(4L, 3L, 6L, 3L, 1L, 6L, 3L, 9L), Country = c("Germany", 
"Germany", "France", "France", "USA", "USA", "USA", "USA")), class = "data.frame", row.names = c(NA, 
-8L))