Calculate mean and median for a frequency table per column (length class per group)

Question

I have a frequency table of fish length classes (LK) for each location:

LK  Loc1  Loc2  Loc3
1     13    22     0
2     20    18     4
3     12    21     2
4      2     0     1
5      1     2     0

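I would like to calculate the mean and the median for each column (location) separately. For example, for Loc1: mean = ((13 × 1) + (20 × 2) + (12 × 3) + (2 × 4) + (1 × 5)) / (13 + 20 + 12 + 2 + 1) = 102 / 48 ≈ 2.1 LK for location 1.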
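Written in general form, with L_i the length class in row i and n_ij the number of fish of class i at location j (notation introduced here only for illustration), the mean being asked for is a frequency-weighted mean:

```latex
\bar{L}_j = \frac{\sum_i L_i \, n_{ij}}{\sum_i n_{ij}},
\qquad
\bar{L}_{\mathrm{Loc1}}
  = \frac{13 \cdot 1 + 20 \cdot 2 + 12 \cdot 3 + 2 \cdot 4 + 1 \cdot 5}{13 + 20 + 12 + 2 + 1}
  = \frac{102}{48} = 2.125
```

The median is likewise taken over the list of individual fish: Loc1 has 48 fish, so the median is the average of the 24th and 25th smallest length classes. Since 13 fish are in class 1 and 33 are in classes 1 to 2, both of those positions fall in class 2, giving a median of 2.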

I'm really stuck on this and don't know where to start. Is there a way to calculate this automatically for each column? Thanks in advance.

Answer 1

Assuming your data is a data.frame called df, for the mean:

sapply(subset(df,select=-c(LK)),function(x){mean(x*df$LK)})

For both the mean and the median:

sapply(subset(df,select=-c(LK)),function(x){c(mean(x*df$LK),median(x*df$LK))})

But perhaps you are looking for the weighted mean of LK, with each column holding the weights (the counts). In that case:

sapply(subset(df,select=-c(LK)),function(x){weighted.mean(df$LK,x)})
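To see why the last variant is the one that reproduces the roughly 2.1 LK from the question, here is a small sketch for Loc1 alone (the vectors are typed in from the question's table): `mean(x * LK)` divides the summed products by the number of length classes (5), while `weighted.mean(LK, x)` divides them by the number of fish (48).

```r
# Length classes and Loc1 counts, copied from the question's table
LK   <- c(1, 2, 3, 4, 5)
loc1 <- c(13, 20, 12, 2, 1)

# Mean of the products: sum(LK * loc1) / 5 length classes = 102 / 5
mean_of_products <- mean(LK * loc1)

# Frequency-weighted mean: sum(LK * loc1) / sum(loc1) = 102 / 48
weighted <- weighted.mean(LK, loc1)

print(mean_of_products)
print(weighted)
```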

Answer 2

Here is a tidyverse solution.

library(dplyr)
library(tidyr)

df1 %>%
  pivot_longer(-LK, names_to = "Loc") %>%
  group_by(Loc) %>%
  summarise(mean = mean(LK*value, na.rm = TRUE),
            median = median(LK*value, na.rm = TRUE),
            .groups = "drop")
## A tibble: 3 x 3
#  Loc    mean median
#  <chr> <dbl>  <int>
#1 Loc1   20.4     13
#2 Loc2   26.2     22
#3 Loc3    3.6      4

Data

df1 <- read.table(text = "
LK   Loc1  Loc2  Loc3    
1     13   22     0          
2     20   18     4          
3     12   21     2          
4     2     0     1
5     1     2     0
", header = TRUE)

Answer 3

Multiply the first column by all the remaining columns, then use colMeans:

colMeans(df1$LK * df1[ -1 ])
# Loc1 Loc2 Loc3 
# 20.4 26.2  3.6
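`colMeans()` divides each column of products by the number of rows (5 length classes), which is why these results are around 20 rather than 2. If the goal is the mean length class per fish, as in the question, a minimal variant of the same vectorized idea divides by the column totals instead; it should reproduce the weighted.mean values shown in Answer 4.

```r
df1 <- data.frame(LK   = 1:5,
                  Loc1 = c(13, 20, 12, 2, 1),
                  Loc2 = c(22, 18, 21, 0, 2),
                  Loc3 = c(0, 4, 2, 1, 0))

# Sum of LK * count per column, divided by the number of fish per column
wmeans <- colSums(df1$LK * df1[-1]) / colSums(df1[-1])
round(wmeans, 6)
```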

Answer 4

You can use weighted.mean to get the mean:

sapply(x[-1], weighted.mean, x=x[,1])
#    Loc1     Loc2     Loc3 
#2.125000 2.079365 2.571429

or use proportions:

colSums(proportions(as.matrix(x[-1]), 2) * x[,1])
#    Loc1     Loc2     Loc3 
#2.125000 2.079365 2.571429
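`proportions()` was added in R 4.0.0 as a new name for `prop.table()`, so on older R versions the same computation can be written with `prop.table()`. A minimal sketch, reusing the data from the question:

```r
x <- read.table(header = TRUE, text = "LK Loc1 Loc2 Loc3
1 13 22 0
2 20 18 4
3 12 21 2
4 2 0 1
5 1 2 0")

# Column-wise proportions (each column sums to 1), then weight LK by them;
# the LK vector is recycled down each column of the matrix
p <- prop.table(as.matrix(x[-1]), 2)
wmeans <- colSums(p * x[, 1])
wmeans
```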

and rep for the median:

sapply(x[-1], function(y) median(rep(x[,1], y)))
#Loc1 Loc2 Loc3 
#   2    2    2
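`rep()` builds one element per fish, which is fine for these counts but can get large with big survey totals. The hypothetical helper `freq_median()` below (not part of base R) sketches how the same median can be read off the cumulative counts without expanding them. It follows `median()`'s convention of averaging the two middle values when the total count is even.

```r
# Hypothetical helper: median of `values` where each value occurs `counts` times
freq_median <- function(values, counts) {
  o <- order(values)
  values <- values[o]
  counts <- counts[o]
  n   <- sum(counts)
  cum <- cumsum(counts)
  # k-th smallest element of the (never materialized) expanded vector
  kth <- function(k) values[which(cum >= k)[1]]
  if (n %% 2 == 1) kth((n + 1) / 2) else (kth(n / 2) + kth(n / 2 + 1)) / 2
}

x <- data.frame(LK   = 1:5,
                Loc1 = c(13, 20, 12, 2, 1),
                Loc2 = c(22, 18, 21, 0, 2),
                Loc3 = c(0, 4, 2, 1, 0))

meds <- sapply(x[-1], function(y) freq_median(x$LK, y))
meds
```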

Data:

x <- read.table(header=TRUE, text="LK   Loc1  Loc2  Loc3    
1     13   22     0          
2     20   18     4          
3     12   21     2          
4     2     0     1          
5     1     2     0")


Tags: r, mean, median, frequency-distribution