Data wrangling - calculate the average of certain rows in a variable based on another column's information

I want to calculate the average of certain rows in a column based on another variable. dfin is the original df, and I want to create a df like dfout:
dfin <- data.frame(c1 = c("a1","a1","a1","a1","a1","a1","a2","a2","a2","a2","a2","a2","a3","a3","a3","a3","a3","a3"),
                 c2 = c("b1","b1","b2","b2","b3","b3","b4","b4","b5","b5","b6","b6","b7","b7","b8","b8","b9","b9"),
                 c3 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18))

dfout <- data.frame(c1 = c("a1","a1","a1","a2","a2","a2","a3","a3","a3"),
                 c2 = c("b1","b2","b3","b4","b5","b6","b7","b8","b9"),
                 c3 = c(1.5,3.5,5.5,7.5,9.5,11.5,13.5,15.5,17.5))

I want to calculate the average of the rows in c3 based on the information in c2. dfin has three columns: c1, c2, and c3.

c1 has a1, a2, a3; c2 has b1, b2, b3, up to b9; c3 contains the values.

As seen in dfout, I want to create a new df in which the averages of c3 have been calculated per c2 group, while keeping the c1 information.

Any help would be appreciated.

Is this what you want?

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

dfin <- data.frame(c1 = c("a1","a1","a1","a1","a1","a1","a2","a2","a2","a2","a2","a2","a3","a3","a3","a3","a3","a3"),
                   c2 = c("b1","b1","b2","b2","b3","b3","b4","b4","b5","b5","b6","b6","b7","b7","b8","b8","b9","b9"),
                   c3 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18))

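# Group by both columns so that c1 is kept in the result,
# then collapse each (c1, c2) group to the mean of c3
dfin %>% 
  group_by(c1, c2) %>% 
  summarise(c3 = mean(c3), .groups = 'drop')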
#> # A tibble: 9 x 3
#>   c1    c2       c3
#>   <chr> <chr> <dbl>
#> 1 a1    b1      1.5
#> 2 a1    b2      3.5
#> 3 a1    b3      5.5
#> 4 a2    b4      7.5
#> 5 a2    b5      9.5
#> 6 a2    b6     11.5
#> 7 a3    b7     13.5
#> 8 a3    b8     15.5
#> 9 a3    b9     17.5
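
If you would rather not load dplyr, the same grouped mean can probably be had in base R with aggregate(). This is only a sketch of an alternative, not the approach used above; the name res is made up for illustration, and the data frame is rebuilt with rep() and paste0() purely to keep the example short (it should hold the same values as dfin in the question).

```r
# Same data as dfin in the question, built more compactly
dfin <- data.frame(
  c1 = rep(c("a1", "a2", "a3"), each = 6),
  c2 = rep(paste0("b", 1:9), each = 2),
  c3 = 1:18
)

# Mean of c3 for every (c1, c2) combination;
# listing c1 on the right-hand side keeps it as a column in the result
res <- aggregate(c3 ~ c1 + c2, data = dfin, FUN = mean)

# Sort explicitly instead of relying on the row order aggregate() returns
res <- res[order(res$c1, res$c2), ]
rownames(res) <- NULL
res
```

Grouping by c1 as well as c2 does not change the means here, because every c2 value belongs to exactly one c1 value; it only carries c1 through to the output.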

Created on 2022-01-19 by the reprex package (v2.0.1)