Determine the weighted mean of different columns in a data frame

Question

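I have a data frame Mesure, and for each row I want to compute a weighted mean like this:

weighted_mean = ((mean_Mesure * nbr_Mesure) + (mean_DL * nbr_DL)) / (nbr_Mesure + nbr_DL)

I know there is a weighted.mean function, but I have not managed to get a new column "weighted_mean" out of it.

Also, is it a problem if a row does not have all 4 values the formula needs (for example row 6 in Mesure)?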

> head(Mesure)
         Row.names         mean_Mesure nbr_Mesure  mean_DL    nbr_DL
2    Aquatic_moss.BE-7     123            4         542        12
3   Aquatic_moss.CO-57     100            7         117        14         
4   Aquatic_moss.CO-58     120            5         145        12           
5   Aquatic_moss.CO-60     140            5         153        12 
6  Aquatic_moss.CS-134                              146        15

Answer 1

In your case, you can compute the weighted mean for each row by writing the formula out directly, for example:

with(Mesure, ((mean_Mesure * nbr_Mesure) + (mean_DL * nbr_DL)) / (nbr_Mesure + nbr_DL))
#[1] 437.2500 111.3333 137.6471 149.1765       NA
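
The question asks for the result as a new column. A minimal sketch of that step, assuming the example data is rebuilt with data.frame() so the snippet runs on its own (the column name weighted_mean is taken from the question):

```r
# Example data from the question, rebuilt so the snippet is self-contained
Mesure <- data.frame(
  Row.names   = c("Aquatic_moss.BE-7", "Aquatic_moss.CO-57", "Aquatic_moss.CO-58",
                  "Aquatic_moss.CO-60", "Aquatic_moss.CS-134"),
  mean_Mesure = c(123, 100, 120, 140, NA),
  nbr_Mesure  = c(4, 7, 5, 5, NA),
  mean_DL     = c(542, 117, 145, 153, 146),
  nbr_DL      = c(12, 14, 12, 12, 15)
)

# The arithmetic is vectorized, so one assignment fills the whole column
Mesure$weighted_mean <- with(
  Mesure,
  (mean_Mesure * nbr_Mesure + mean_DL * nbr_DL) / (nbr_Mesure + nbr_DL)
)
```

For the first row this is (123 * 4 + 542 * 12) / (4 + 12) = 6996 / 16 = 437.25.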

If any of the values in a row is missing, the result for that row is NA. If a missing value should count as 0, you can set the NAs to 0:

Mesure[is.na(Mesure)] <- 0

Rerunning the calculation then gives:

#[1] 437.2500 111.3333 137.6471 149.1765 146.0000
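
Note that Mesure[is.na(Mesure)] <- 0 overwrites the missing entries in the data itself. As an alternative sketch, base R's weighted.mean() (the function mentioned in the question) can be applied per row with na.rm = TRUE, which drops a missing value together with its weight and leaves the original columns untouched. The mapply() wrapper and the argument names m1, n1, m2, n2 are illustrative choices, not part of the original answer.

```r
# Example data from the question, rebuilt so the snippet is self-contained
Mesure <- data.frame(
  Row.names   = c("Aquatic_moss.BE-7", "Aquatic_moss.CO-57", "Aquatic_moss.CO-58",
                  "Aquatic_moss.CO-60", "Aquatic_moss.CS-134"),
  mean_Mesure = c(123, 100, 120, 140, NA),
  nbr_Mesure  = c(4, 7, 5, 5, NA),
  mean_DL     = c(542, 117, 145, 153, 146),
  nbr_DL      = c(12, 14, 12, 12, 15)
)

# For each row, average the two means using the two counts as weights.
# na.rm = TRUE removes a missing mean and its corresponding weight.
Mesure$weighted_mean <- mapply(
  function(m1, n1, m2, n2) weighted.mean(c(m1, m2), c(n1, n2), na.rm = TRUE),
  Mesure$mean_Mesure, Mesure$nbr_Mesure, Mesure$mean_DL, Mesure$nbr_DL
)
```

One caveat: na.rm = TRUE only looks at the values, so a row with a mean present but its count missing would still come out as NA.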

Answer 2

You can use the rowwise() function in recent versions of dplyr:

library(dplyr) # 1.0.0

Mesure %>%
        rowwise() %>%
        mutate(weighted.mean = ((mean_Mesure * nbr_Mesure) + (mean_DL * nbr_DL)) / (nbr_Mesure + nbr_DL))

# A tibble: 5 x 6
# Rowwise: 
  Row.names           mean_Mesure nbr_Mesure mean_DL nbr_DL weighted.mean
  <chr>                     <dbl>      <dbl>   <dbl>  <dbl>         <dbl>
1 Aquatic_moss.BE-7           123          4     542     12          437.
2 Aquatic_moss.CO-57          100          7     117     14          111.
3 Aquatic_moss.CO-58          120          5     145     12          138.
4 Aquatic_moss.CO-60          140          5     153     12          149.
5 Aquatic_moss.CS-134          NA         NA     146     15           NA
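
Because the formula only uses vectorized arithmetic, rowwise() is optional here. A sketch of the same mutate() call without it, assuming the example data is rebuilt as a plain data frame (the name result is illustrative):

```r
library(dplyr)

# Example data from the question, rebuilt so the snippet is self-contained
Mesure <- data.frame(
  Row.names   = c("Aquatic_moss.BE-7", "Aquatic_moss.CO-57", "Aquatic_moss.CO-58",
                  "Aquatic_moss.CO-60", "Aquatic_moss.CS-134"),
  mean_Mesure = c(123, 100, 120, 140, NA),
  nbr_Mesure  = c(4, 7, 5, 5, NA),
  mean_DL     = c(542, 117, 145, 153, 146),
  nbr_DL      = c(12, 14, 12, 12, 15)
)

# *, + and / operate on whole columns, so no rowwise() grouping is needed
result <- Mesure %>%
  mutate(weighted.mean = (mean_Mesure * nbr_Mesure + mean_DL * nbr_DL) /
                         (nbr_Mesure + nbr_DL))
```

rowwise() becomes necessary mainly when the expression calls a function that is not vectorized over rows.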

Edit

If we want to replace the NAs with 0, we can use the replace_na() function from tidyr:

library(dplyr)
library(tidyr) # 1.1.0

Mesure %>%
        replace_na(list(mean_Mesure = 0,
                        nbr_Mesure = 0,
                        mean_DL = 0,
                        nbr_DL = 0)) %>%
        rowwise() %>%
        mutate(weighted.mean = ((mean_Mesure * nbr_Mesure) + (mean_DL * nbr_DL)) / (nbr_Mesure + nbr_DL))

# A tibble: 5 x 6
# Rowwise: 
  Row.names           mean_Mesure nbr_Mesure mean_DL nbr_DL weighted.mean
  <chr>                     <dbl>      <dbl>   <dbl>  <dbl>         <dbl>
1 Aquatic_moss.BE-7           123          4     542     12          437.
2 Aquatic_moss.CO-57          100          7     117     14          111.
3 Aquatic_moss.CO-58          120          5     145     12          138.
4 Aquatic_moss.CO-60          140          5     153     12          149.
5 Aquatic_moss.CS-134           0          0     146     15          146
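
If the stored NAs should stay in the data, the substitution can instead happen only inside the calculation. The following is a sketch using dplyr's coalesce(), which returns the first non-missing value; it is a swapped-in alternative to replace_na(), not what the answer itself uses, and the name result is illustrative.

```r
library(dplyr)

# Example data from the question, rebuilt so the snippet is self-contained
Mesure <- data.frame(
  Row.names   = c("Aquatic_moss.BE-7", "Aquatic_moss.CO-57", "Aquatic_moss.CO-58",
                  "Aquatic_moss.CO-60", "Aquatic_moss.CS-134"),
  mean_Mesure = c(123, 100, 120, 140, NA),
  nbr_Mesure  = c(4, 7, 5, 5, NA),
  mean_DL     = c(542, 117, 145, 153, 146),
  nbr_DL      = c(12, 14, 12, 12, 15)
)

# Treat missing means and counts as 0 only while computing the new column
result <- Mesure %>%
  mutate(weighted.mean =
           (coalesce(mean_Mesure, 0) * coalesce(nbr_Mesure, 0) +
            coalesce(mean_DL, 0) * coalesce(nbr_DL, 0)) /
           (coalesce(nbr_Mesure, 0) + coalesce(nbr_DL, 0)))
```

The input columns keep their NA entries, while the last row still evaluates to 146.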

Data

Mesure <- structure(list(Row.names = c("Aquatic_moss.BE-7", "Aquatic_moss.CO-57", 
"Aquatic_moss.CO-58", "Aquatic_moss.CO-60", "Aquatic_moss.CS-134"
), mean_Mesure = c(123, 100, 120, 140, NA), nbr_Mesure = c(4, 
7, 5, 5, NA), mean_DL = c(542, 117, 145, 153, 146), nbr_DL = c(12, 
14, 12, 12, 15)), row.names = c(NA, -5L), class = c("tbl_df", 
"tbl", "data.frame"))

Answer 3

You can also use mapply. That way you can write a generic function and pass any columns to it:

df <- read.table(text = "
                         Row.names         mean_Mesure nbr_Mesure  mean_DL    nbr_DL
2 Aquatic_moss.BE-7     123            4         542        12
3 Aquatic_moss.CO-57     100            7         117        14         
4 Aquatic_moss.CO-58     120            5         145        12           
5 Aquatic_moss.CO-60     140            5         153        12 
6 Aquatic_moss.CS-134   NA            NA         146        15 ")


df$mean_Mesure[is.na(df$mean_Mesure)] <- 0
df$nbr_Mesure[is.na(df$nbr_Mesure)] <- 0

df$weighted.mean <- mapply(function(x1,x2,x3,x4) (x1*x2 + x3*x4)/(x2+x4), df$mean_Mesure, df$nbr_Mesure,  df$mean_DL, df$nbr_DL)

Output

Row.names mean_Mesure nbr_Mesure mean_DL nbr_DL weighted.mean
2   Aquatic_moss.BE-7         123          4     542     12      437.2500
3  Aquatic_moss.CO-57         100          7     117     14      111.3333
4  Aquatic_moss.CO-58         120          5     145     12      137.6471
5  Aquatic_moss.CO-60         140          5     153     12      149.1765
6 Aquatic_moss.CS-134           0          0     146     15      146.0000
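
The same idea extends to any number of mean/count pairs. Below is a sketch of a reusable helper; the function name row_weighted_mean and its arguments value_cols and weight_cols are hypothetical, introduced only for illustration. It assumes a pair should be ignored whenever its value or its weight is missing.

```r
# Hypothetical helper: row-wise weighted mean over matching value/weight columns
row_weighted_mean <- function(df, value_cols, weight_cols) {
  stopifnot(length(value_cols) == length(weight_cols))
  vals <- as.matrix(df[value_cols])
  wts  <- as.matrix(df[weight_cols])
  # A pair contributes only when both its value and its weight are present
  ok <- !is.na(vals) & !is.na(wts)
  vals[!ok] <- 0
  wts[!ok]  <- 0
  rowSums(vals * wts) / rowSums(wts)
}

# Example data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  Row.names   = c("Aquatic_moss.BE-7", "Aquatic_moss.CO-57", "Aquatic_moss.CO-58",
                  "Aquatic_moss.CO-60", "Aquatic_moss.CS-134"),
  mean_Mesure = c(123, 100, 120, 140, NA),
  nbr_Mesure  = c(4, 7, 5, 5, NA),
  mean_DL     = c(542, 117, 145, 153, 146),
  nbr_DL      = c(12, 14, 12, 12, 15)
)

df$weighted.mean <- row_weighted_mean(
  df,
  value_cols  = c("mean_Mesure", "mean_DL"),
  weight_cols = c("nbr_Mesure", "nbr_DL")
)
```

A row with no usable pair at all would give 0 / 0, that is NaN, so such rows may need separate handling.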

Tags: r, weighted-average