Data wrangling - calculate average of certain rows in a variable based on other column's information
I want to calculate the average of certain rows in a column based on another variable. dfin is the original df, and I want to create a df like dfout:
dfin <- data.frame(c1 = c("a1","a1","a1","a1","a1","a1","a2","a2","a2","a2","a2","a2","a3","a3","a3","a3","a3","a3"),
                   c2 = c("b1","b1","b2","b2","b3","b3","b4","b4","b5","b5","b6","b6","b7","b7","b8","b8","b9","b9"),
                   c3 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18))
dfout <- data.frame(c1 = c("a1","a1","a1","a2","a2","a2","a3","a3","a3"),
                    c2 = c("b1","b2","b3","b4","b5","b6","b7","b8","b9"),
                    c3 = c(1.5,3.5,5.5,7.5,9.5,11.5,13.5,15.5,17.5))
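I want to calculate the mean of the c3 rows based on the information in c2. dfin has three columns, c1, c2 and c3: c1 has a1, a2, a3; c2 has b1, b2, b3, up to b9 (each c2 value covers two rows); and c3 contains the values.
As can be seen in dfout, I want to create a new df in which c3 is averaged within each c2 group, while keeping the c1 information.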
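For reference, the target arithmetic can be checked in base R with no extra packages. Every c2 group holds exactly two consecutive rows, so its mean is their midpoint, e.g. (1 + 2) / 2 = 1.5 for b1. The minimal sketch below uses tapply() and reproduces the c3 column of dfout, but it returns only a vector named by c2 and drops c1, which is why the result has to be grouped on both columns to keep c1. The rep()/seq() calls are just a shorthand for the same data as above, and the name means is illustrative.

```r
# Recreate the question's input and desired output (same values, written with rep()/seq())
dfin <- data.frame(
  c1 = rep(c("a1", "a2", "a3"), each = 6),
  c2 = rep(paste0("b", 1:9), each = 2),
  c3 = 1:18
)
dfout <- data.frame(
  c1 = rep(c("a1", "a2", "a3"), each = 3),
  c2 = paste0("b", 1:9),
  c3 = seq(1.5, 17.5, by = 2)
)

# Mean of c3 within each c2 group; the result is a 1-D array named b1..b9
means <- tapply(dfin$c3, dfin$c2, mean)
print(means)

# The values match dfout$c3, but c1 is not carried along
stopifnot(isTRUE(all.equal(as.vector(means), dfout$c3)))
```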
Any help would be greatly appreciated.
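Is this what you want? Grouping by both c1 and c2 keeps c1 in the result; summarise() then collapses each group to the mean of c3, and .groups = 'drop' returns an ungrouped tibble: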
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
dfin <- data.frame(c1 = c("a1","a1","a1","a1","a1","a1","a2","a2","a2","a2","a2","a2","a3","a3","a3","a3","a3","a3"),
                   c2 = c("b1","b1","b2","b2","b3","b3","b4","b4","b5","b5","b6","b6","b7","b7","b8","b8","b9","b9"),
                   c3 = c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18))
dfin %>%
  group_by(c1, c2) %>%
  summarise(c3 = mean(c3), .groups = 'drop')
#> # A tibble: 9 x 3
#> c1 c2 c3
#> <chr> <chr> <dbl>
#> 1 a1 b1 1.5
#> 2 a1 b2 3.5
#> 3 a1 b3 5.5
#> 4 a2 b4 7.5
#> 5 a2 b5 9.5
#> 6 a2 b6 11.5
#> 7 a3 b7 13.5
#> 8 a3 b8 15.5
#> 9 a3 b9 17.5
Created on 2022-01-19 by the reprex package (v2.0.1)
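If you would rather avoid the dplyr dependency, base R's aggregate() with a formula performs the same grouping. This is a sketch of an alternative, not part of the answer above. The rows are reordered by c1 and c2 explicitly so the result lines up with dfout regardless of aggregate()'s default row order; the name res is illustrative.

```r
# Same data as the question, written with rep() for brevity
dfin <- data.frame(
  c1 = rep(c("a1", "a2", "a3"), each = 6),
  c2 = rep(paste0("b", 1:9), each = 2),
  c3 = 1:18
)

# Mean of c3 for every (c1, c2) combination present in the data;
# grouping on both columns is what keeps c1 in the output
res <- aggregate(c3 ~ c1 + c2, data = dfin, FUN = mean)

# Sort rows to match the layout of dfout and reset the row names
res <- res[order(res$c1, res$c2), ]
rownames(res) <- NULL
print(res)
```

aggregate() only returns combinations that actually occur in the data, so the output has one row per c2 group, nine in total.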