Calculate the number of times a 0 changes to a 1 in a single row across multiple columns

I have presence-absence data across multiple sites and years that looks like this:

library(dplyr)  # provides tibble() and %>%

df <- tibble(Site = c("A","B","C","D","E"),
             "1999" = c(0,NA,1,NA,1),
             "2000" = c(1,NA,NA,0,1),
             "2001" = c(NA,0,1,NA,0),
             "2002" = c(NA,1,NA,1,0),
             "2003" = c(0,NA,0,1,NA)
             )

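I want to figure out how to count the number of times a 0 changes to a 1, and vice versa, and put those counts in columns at the end of the data frame. I would also like to count the number of times a 1 could have changed to a 0 but did not, and vice versa, and put those totals in separate columns of the data frame.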

I understand how to add a column at the end of the data frame and get a summary statistic across the rows. For example,

## Group input by rows
rowwise(df)

## Add column called "0t1" (to contain the number of times a 0 changed to a 1) and sum across all columns starting with the "19" column, ignoring NAs
df %>%  mutate("0t1" = sum(across(starts_with("19")),na.rm=T))

However, this of course only gives me a sum of values, not a count of changes:

  Site  `1999` `2000` `2001` `2002` `2003` `0t1`
  <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>
1 A          0      1     NA     NA      0     2
2 B         NA     NA      0      1     NA     2
3 C          1     NA      1     NA      0     2
4 D         NA      0     NA      1      1     2
5 E          1      1      0      0     NA     2

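What I can't seem to figure out is how to compute a count that is conditional on the value in the previous non-NA cell, which would produce something like this: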

  Site  `1999` `2000` `2001` `2002` `2003` `0t1`  `1t0`
  <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <dbl>  <dbl>
1 A          0      1     NA     NA      0     1    1
2 B         NA     NA      0      1     NA     1    0
3 C          1     NA      1     NA      0     0    1
4 D         NA      0     NA      1      1     1    0
5 E          1      1      0      0     NA     0    1

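Here, we can use lead to build the condition after removing the NA elements in each row (na.omit). The condition checks whether the current value is 0 and the next one is 1, and we take the sum of the result.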

 apply(df[,-1], 1, function(x) {x1 <- na.omit(x); sum(x1 == 0 & lead(x1) == 1, na.rm = TRUE)})
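
To see what that condition does, it may help to walk through a single row by hand. The sketch below takes row A from the question and uses base R's head()/tail() as a stand-in for dplyr's lead(), a substitution made here only so the snippet runs without any packages. The variable names are illustrative.

```r
# Row A from the question (years 1999 to 2003)
x <- c(0, 1, NA, NA, 0)

# Drop the NAs so each value is compared with the next observed value
x1 <- as.numeric(na.omit(x))

# Pair each value with its successor (base-R stand-in for lead())
current   <- head(x1, -1)
following <- tail(x1, -1)

# TRUE wherever a 0 is immediately followed by a 1
is_0_to_1 <- current == 0 & following == 1

# Number of 0 -> 1 changes in this row
n_0_to_1 <- sum(is_0_to_1)
print(n_0_to_1)
```

Because the NAs are removed first, the 1 in 2000 is compared directly with the 0 in 2003, which is the "previous non-NA cell" behaviour the question asks for.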

Or the same with dplyr:

library(dplyr)
df %>%
     rowwise %>%
     mutate(t1 = {x1 <- na.omit(c_across(where(is.numeric)))
                  sum(x1 == 0 & lead(x1) ==1, na.rm = TRUE)
              }) %>%
     ungroup

-output

# A tibble: 5 x 7
#  Site  `1999` `2000` `2001` `2002` `2003`    t1
#  <chr>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl> <int>
#1 A          0      1     NA     NA      0     1
#2 B         NA     NA      0      1     NA     1
#3 C          1     NA      1     NA      0     0
#4 D         NA      0     NA      1      1     1
#5 E          1      1      0      0     NA     0
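
The question also asks for the reverse direction and for the cases where a value could have changed but did not. One possible way to extend the same idea is to tabulate every pair of consecutive non-NA values. This is a sketch in base R, not part of the answer above: count_transitions() is a hypothetical helper, the column names `0t0` and `1t1` are made up for illustration, and it assumes that "could have changed but didn't" means a value followed by the same value.

```r
# Same data as in the question, as a base data.frame
# (check.names = FALSE keeps the year column names unchanged)
df <- data.frame(
  Site   = c("A", "B", "C", "D", "E"),
  `1999` = c(0, NA, 1, NA, 1),
  `2000` = c(1, NA, NA, 0, 1),
  `2001` = c(NA, 0, 1, NA, 0),
  `2002` = c(NA, 1, NA, 1, 0),
  `2003` = c(0, NA, 0, 1, NA),
  check.names = FALSE
)

# Hypothetical helper: counts each kind of consecutive pair in one row
count_transitions <- function(x) {
  x1   <- as.numeric(na.omit(x))  # keep observed values only
  from <- head(x1, -1)            # every value except the last
  to   <- tail(x1, -1)            # every value except the first
  c("0t1" = sum(from == 0 & to == 1),   # 0 changed to 1
    "1t0" = sum(from == 1 & to == 0),   # 1 changed to 0
    "0t0" = sum(from == 0 & to == 0),   # 0 stayed 0 (assumed meaning)
    "1t1" = sum(from == 1 & to == 1))   # 1 stayed 1 (assumed meaning)
}

# Apply the helper to the year columns of every row and bind the counts on
year_cols <- setdiff(names(df), "Site")
counts    <- t(apply(df[, year_cols], 1, count_transitions))
result    <- cbind(df, counts)
print(result)
```

With the sample data, the `0t1` and `1t0` columns of result agree with the desired table in the question. Whether the "stayed the same" columns match what the asker intended depends on the assumption stated above.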