Calculate the number of times a 0 changes to a 1 in a single row across multiple columns
I have presence-absence data across multiple sites and years that looks like this:
df <- tibble(Site = c("A","B","C","D","E"),
"1999"=c(0,NA,1,NA,1),
"2000"=c(1,NA,NA,0,1),
"2001"=c(NA,0,1,NA,0),
"2002"=c(NA,1,NA,1,0),
"2003"=c(0,NA,0,1,NA)
)
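I would like to figure out how to count the number of times a 0 changed to a 1, and vice versa, and put those counts in columns at the end of the data frame. I would also like to count the number of times a 1 could have changed to a 0 but did not, and vice versa, and put those totals in separate columns of the data frame.
I understand how to add a column at the end of a data frame and get summary statistics across rows. For example,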
## Group input by rows
rowwise(df)
## Add column called "0t1" (to contain the number of times a 0 changed to a 1) and sum across all columns starting with the "19" column, ignoring NAs
df %>% mutate("0t1" = sum(across(starts_with("19")),na.rm=T))
However, this of course just gives me the sum of the values in each row.
Site `1999` `2000` `2001` `2002` `2003` `0t1`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 0 1 NA NA 0 2
2 B NA NA 0 1 NA 2
3 C 1 NA 1 NA 0 2
4 D NA 0 NA 1 1 2
5 E 1 1 0 0 NA 2
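What I can't seem to figure out is how to count values conditional on the value in the previous non-NA cell, which would produce something like this: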
Site `1999` `2000` `2001` `2002` `2003` `0t1` `1t0`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A 0 1 NA NA 0 1 1
2 B NA NA 0 1 NA 1 0
3 C 1 NA 1 NA 0 0 1
4 D NA 0 NA 1 1 1 0
5 E 1 1 0 0 NA 0 1
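Here, we can use lead to create the condition after removing the NA elements in each row (na.omit). The condition checks that the current value is 0 and the next one is 1; taking the sum of that condition gives the count.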
# lead() comes from dplyr; it is namespaced here because dplyr is not loaded yet
apply(df[,-1], 1, function(x) {x1 <- na.omit(x); sum(x1 == 0 & dplyr::lead(x1) == 1, na.rm = TRUE)})
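To make the condition concrete, here is a small worked sketch for a single row (Site A from the example data). The variable names `nxt`, `is_0_to_1` and `n_0_to_1` are illustrative only and are not part of the original answer.

```r
library(dplyr)

# Values for Site A, years 1999 to 2003
x <- c(0, 1, NA, NA, 0)

# Drop the NAs so that each value is compared with the next observed value
x1 <- as.numeric(na.omit(x))   # 0 1 0

# lead() shifts the vector one position to the left and pads with NA
nxt <- lead(x1)                # 1 0 NA

# TRUE where a 0 is immediately followed by a 1; the last position is NA
is_0_to_1 <- x1 == 0 & nxt == 1

# na.rm = TRUE discards the trailing NA produced by lead()
n_0_to_1 <- sum(is_0_to_1, na.rm = TRUE)
n_0_to_1
```

For this row the only 0-to-1 change is between 1999 and 2000, so the count is 1.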
Or with dplyr:
library(dplyr)
df %>%
  rowwise() %>%
  mutate(t1 = {
    # non-NA values of the numeric (year) columns in this row
    x1 <- na.omit(c_across(where(is.numeric)))
    sum(x1 == 0 & lead(x1) == 1, na.rm = TRUE)
  }) %>%
  ungroup()
Output:
# A tibble: 5 x 7
# Site `1999` `2000` `2001` `2002` `2003` t1
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
#1 A 0 1 NA NA 0 1
#2 B NA NA 0 1 NA 1
#3 C 1 NA 1 NA 0 0
#4 D NA 0 NA 1 1 1
#5 E 1 1 0 0 NA 0
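The code above only counts 0-to-1 changes. The question also asks for 1-to-0 changes and for the cases where a change could have happened but did not. One possible extension, sketched in base R, is shown below. The helper `count_transitions` and the column names `0t0` and `1t1` are hypothetical, and the sketch assumes that "could have changed but did not" means two consecutive non-NA values that are equal.

```r
# Hypothetical helper (not from the original answer): counts each kind of
# transition between consecutive non-NA values in one row.
count_transitions <- function(x) {
  x1 <- x[!is.na(x)]      # keep only observed values
  prev <- head(x1, -1)    # every value except the last
  nxt <- tail(x1, -1)     # every value except the first
  c(`0t1` = sum(prev == 0 & nxt == 1),   # 0 changed to 1
    `1t0` = sum(prev == 1 & nxt == 0),   # 1 changed to 0
    `0t0` = sum(prev == 0 & nxt == 0),   # 0 stayed 0 (assumed meaning)
    `1t1` = sum(prev == 1 & nxt == 1))   # 1 stayed 1 (assumed meaning)
}

# Same data as in the question, built as a plain data frame
df <- data.frame(Site = c("A", "B", "C", "D", "E"),
                 `1999` = c(0, NA, 1, NA, 1),
                 `2000` = c(1, NA, NA, 0, 1),
                 `2001` = c(NA, 0, 1, NA, 0),
                 `2002` = c(NA, 1, NA, 1, 0),
                 `2003` = c(0, NA, 0, 1, NA),
                 check.names = FALSE)

# apply() returns one column per row of df, so transpose before binding
counts <- t(apply(df[, -1], 1, count_transitions))
result <- cbind(df, counts)
result
```

With the example data, the `0t1` and `1t0` columns match the desired output shown in the question (1, 1, 0, 1, 0 and 1, 0, 1, 0, 1).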