How to count columns in a range with nonzero values?

Question

I am actually trying to replicate Excel's COUNTIF function. I have a data frame called filtered.data that looks like this:

  Experiment_ID t_20_n_6 t_20_n_5 t_20_n_4 t_20_n_3 t_20_n_2 t_20_n_1
1  SG100520_social_01        0        0        0        0        2        1
2  K8012921_social_03        0        0        0        0        0        1
3  K8020521_social_01        0        0        0        1        1        1
4  K8020521_social_02        0        0        1        0        0        1
5  K8020521_social_03        0        0        0        0        2        3
6  K8020521_social_04        0        0        0        1        1        2
7  K8020521_social_05        0        0        0        1        1        3
8  K8021221_social_01        1        0        0        0        0        1
9  K8021221_social_03        0        0        0        0        0        2
10 K8021221_social_04        0        0        0        2        0        1

I need to compute a kind of average over t_20_n_6:t_20_n_1. I have the numerator worked out using:

x <- filtered.data %>% mutate(t_20_mean = ((6*t_20_n_6) + (5*t_20_n_5) + (4*t_20_n_4) + (3*t_20_n_3) + (2*t_20_n_2) + (1*t_20_n_1)) / ~~~~)

but I need to replace ~~~~ with the number of nonzero columns among t_20_n_6:t_20_n_1 in each row.

I tried sum(x$t_10_n_6 != 0 | x$t_20_n_5 != 0 | x$t_20_n_4 != 0 | x$t_20_n_3 != 0 | x$t_20_n_2 != 0 | x$t_20_n_1 != 0), but the numbers don't make sense.
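A minimal illustration of why that attempt does not give a per-row count, using two of the question's columns (first three rows only) as plain vectors rather than data-frame columns. Chaining the comparisons with | collapses them into one TRUE/FALSE per row ("is any column nonzero?"), and sum() then adds those up over all rows, returning a single number. (Separately, the first term names t_10_n_6, while the data only has t_20_n_6.)

```r
# Two of the question's columns, first three rows only
t_20_n_3 <- c(0, 0, 1)
t_20_n_2 <- c(2, 0, 1)

# `|` yields ONE logical per row: "is at least one of these columns nonzero?"
any_nonzero <- t_20_n_3 != 0 | t_20_n_2 != 0

# sum() adds those logicals over ALL rows, so the result is a single number
total <- sum(any_nonzero)

# What is wanted instead: for each row, how many columns are nonzero
# (adding logicals counts the TRUEs element-wise)
per_row <- (t_20_n_3 != 0) + (t_20_n_2 != 0)
```

Adding the logical comparisons with + instead of combining them with | keeps one count per row, which is what the denominator needs.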

The result should be:

        Experiment_ID t_20_n_6 t_20_n_5 t_20_n_4 t_20_n_3 t_20_n_2 t_20_n_1 t_20_mean
1  SG100520_social_01        0        0        0        0        2        1         2.5
2  K8012921_social_03        0        0        0        0        0        1         1
3  K8020521_social_01        0        0        0        1        1        1         2
4  K8020521_social_02        0        0        1        0        0        1         2.5
5  K8020521_social_03        0        0        0        0        2        3         3.5
6  K8020521_social_04        0        0        0        1        1        2         2.33
7  K8020521_social_05        0        0        0        1        1        3         2.67
8  K8021221_social_01        1        0        0        0        0        1         3.5
9  K8021221_social_03        0        0        0        0        0        2         2
10 K8021221_social_04        0        0        0        2        0        1         3.5
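For reference, a base-R sketch that reproduces the expected t_20_mean column above. The numerator is the matrix product of the six count columns with the weights 6:1, and the denominator (the ~~~~ part) is rowSums() over a logical "is nonzero" matrix. It assumes the columns are exactly t_20_n_6 down to t_20_n_1, in that order; the data frame is typed in by hand from the question.

```r
# The question's filtered.data, typed in by hand for a self-contained example
filtered.data <- data.frame(
  Experiment_ID = c("SG100520_social_01", "K8012921_social_03", "K8020521_social_01",
                    "K8020521_social_02", "K8020521_social_03", "K8020521_social_04",
                    "K8020521_social_05", "K8021221_social_01", "K8021221_social_03",
                    "K8021221_social_04"),
  t_20_n_6 = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0),
  t_20_n_5 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  t_20_n_4 = c(0, 0, 0, 1, 0, 0, 0, 0, 0, 0),
  t_20_n_3 = c(0, 0, 1, 0, 0, 1, 1, 0, 0, 2),
  t_20_n_2 = c(2, 0, 1, 0, 2, 1, 1, 0, 0, 0),
  t_20_n_1 = c(1, 1, 1, 1, 3, 2, 3, 1, 2, 1)
)

# Columns in the same order as the weights 6, 5, ..., 1
cols <- paste0("t_20_n_", 6:1)
counts <- as.matrix(filtered.data[cols])

# Numerator: 6*t_20_n_6 + 5*t_20_n_5 + ... + 1*t_20_n_1, one value per row
numerator <- drop(counts %*% (6:1))

# Denominator: number of nonzero columns in each row
n_nonzero <- rowSums(counts != 0)

filtered.data$t_20_mean <- numerator / n_nonzero
# Rounded to 2 decimals, this should match the expected t_20_mean column
round(filtered.data$t_20_mean, 2)
```

Inside the question's own mutate() call, the same denominator could be written as rowSums(pick(t_20_n_6:t_20_n_1) != 0); pick() requires dplyr 1.1.0 or later. A row in which all six columns are 0 would give 0/0, i.e. NaN.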

Answer 1

If you are interested in using the numbers embedded in the column names (1 to 6) as weights, you can also try this approach.

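Use pivot_longer to put the data into long format. Then, for each Experiment_ID, sum the values weighted by the number extracted from the column name, and divide by the number of values greater than zero.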

library(tidyverse)

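filtered.data %>%
  # Long format: one row per Experiment_ID and column; the digit captured
  # from the column name goes into `name`, the count into `value`
  pivot_longer(cols = -Experiment_ID,
               names_pattern = "t_20_n_(\\d+)",
               names_transform = list(name = as.integer)) %>%
  group_by(Experiment_ID) %>%
  # Weighted sum divided by the number of nonzero values
  summarise(t_20_mean = sum(name * value) / sum(value > 0))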
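A sketch of the same name-derived weighting without reshaping, assuming the count columns are exactly those whose names match ^t_20_n_\d+$. Unlike summarise(), which returns one row per group sorted by Experiment_ID (as in the output below), this keeps the original row order and appends t_20_mean as a new column, like the expected result in the question. Only the first four rows of the question's data are typed in here.

```r
# Question's data, first four rows only, for a self-contained example
filtered.data <- data.frame(
  Experiment_ID = c("SG100520_social_01", "K8012921_social_03",
                    "K8020521_social_01", "K8020521_social_02"),
  t_20_n_6 = c(0, 0, 0, 0), t_20_n_5 = c(0, 0, 0, 0),
  t_20_n_4 = c(0, 0, 0, 1), t_20_n_3 = c(0, 0, 1, 0),
  t_20_n_2 = c(2, 0, 1, 0), t_20_n_1 = c(1, 1, 1, 1)
)

# Select the count columns and pull the weight out of each column name
cols <- grep("^t_20_n_\\d+$", names(filtered.data), value = TRUE)
w <- as.integer(sub("^t_20_n_(\\d+)$", "\\1", cols))

counts <- as.matrix(filtered.data[cols])
# Weighted sum per row divided by the number of positive values per row
filtered.data$t_20_mean <- drop(counts %*% w) / rowSums(counts > 0)
```

Because the weights come from the names rather than from their position, the column order does not matter here.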

Output

   Experiment_ID      t_20_mean
   <chr>                  <dbl>
 1 K8012921_social_03      1   
 2 K8020521_social_01      2   
 3 K8020521_social_02      2.5 
 4 K8020521_social_03      3.5 
 5 K8020521_social_04      2.33
 6 K8020521_social_05      2.67
 7 K8021221_social_01      3.5 
 8 K8021221_social_03      2   
 9 K8021221_social_04      3.5 
10 SG100520_social_01      2.5
