Rowwise Column Count in Dataframe

Question

Suppose I have the following dataframe:

library(dplyr)  # provides tibble() and mutate()
country_df <- tibble(
  population = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

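I just need a concise way, inside a mutate() call, to count how many of the columns population through population3 are greater than the value of the pop column in each row.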

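So the result I need is the following (specifically the GreaterTotal column). Note: I can get the answer by looping over the columns one at a time, but with more columns that gets slow.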

  population population2 population3   pop GreaterThan0 GreaterThan1 GreaterThan2 GreaterTotal
       <dbl>       <dbl>       <dbl> <dbl> <lgl>        <lgl>        <lgl>               <int>
1        328         133          89    45 TRUE         TRUE         TRUE                    3
2         38          12          39    45 FALSE        FALSE        FALSE                   0
3         30          99          33    45 FALSE        TRUE         FALSE                   1
4         56          83          56    45 TRUE         TRUE         TRUE                    3
5       1393        1033         193    45 TRUE         TRUE         TRUE                    3
6        126         101         126    45 TRUE         TRUE         TRUE                    3
7         57          33          58    45 TRUE         FALSE        TRUE                    2
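For comparison, here is a minimal sketch of the column-by-column approach the question mentions, written in base R with a plain data.frame so it needs no packages. The GreaterThan0 to GreaterThan2 names follow the example output; generating them from the 0-based column position is an assumption of this sketch.

```r
# Sketch of the per-column loop described in the question (base R only).
country_df <- data.frame(
  population  = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

pop_cols <- grep("population", names(country_df), value = TRUE)

# One logical flag column per population column (names are an assumption)
for (i in seq_along(pop_cols)) {
  flag_name <- paste0("GreaterThan", i - 1)
  country_df[[flag_name]] <- country_df[[pop_cols[i]]] > country_df$pop
}

# Count the TRUE flags in each row
flag_cols <- paste0("GreaterThan", seq_along(pop_cols) - 1)
country_df$GreaterTotal <- as.integer(rowSums(country_df[flag_cols]))
print(country_df)
```

This works, but it adds one helper column per input column, which is what the vectorized answers below avoid.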

I tried using apply with the row index, but couldn't get it to work. Can someone point me in the right direction?
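Since the question mentions apply(), here is one way it can be made to work, as a base R sketch. apply() iterates over the rows of a matrix, so pop is bound in as the last column and each row's population values are compared against it. This calls an R function once per row, which is why the vectorized rowSums() approach is usually preferable on larger data.

```r
# Sketch: an apply()-based row count (base R only).
country_df <- data.frame(
  population  = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

pop_cols <- grep("population", names(country_df), value = TRUE)

# Each matrix row holds the population values followed by pop as the last element
m <- as.matrix(country_df[c(pop_cols, "pop")])
country_df$GreaterTotal <- apply(m, 1, function(r) sum(r[-length(r)] > r[length(r)]))
print(country_df)
```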

Answer 1

You can select the population columns, compare them with pop, and then use rowSums to count how many columns in each row are greater.

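# Indices of population, population2 and population3 ("pop" does not match)
cols <- grep('population', names(country_df))
# Compare each of those columns with pop, then count the TRUEs per row
country_df$GreaterTotal <- rowSums(country_df[cols] > country_df$pop)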

#  population population2 population3   pop GreaterTotal
#       <dbl>       <dbl>       <dbl> <dbl>        <dbl>
#1        328         133          89    45            3
#2         38          12          39    45            0
#3         30          99          33    45            1
#4         56          83          56    45            3
#5       1393        1033         193    45            3
#6        126         101         126    45            3
#7         57          33          58    45            2
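To see why this works, the sketch below (base R, plain data.frame) prints the intermediate logical matrix. Comparing a data frame with a vector whose length equals the number of rows recycles that vector down each column, so every cell is compared with its own row's pop; rowSums() then counts TRUE as 1 and FALSE as 0.

```r
# Sketch: inspect the logical matrix that rowSums() adds up (base R only).
country_df <- data.frame(
  population  = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

cols <- grep("population", names(country_df))

# data frame > vector gives a logical matrix (one column per population column)
gt <- country_df[cols] > country_df$pop
print(gt)

# Summing each row of the logical matrix counts its TRUE cells
country_df$GreaterTotal <- rowSums(gt)
print(country_df)
```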

In dplyr 1.0.0 and later, you can use rowwise and c_across:

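library(dplyr)
# rowwise() lets c_across() operate on one row at a time; ungroup() removes it
country_df %>%
  rowwise() %>%
  mutate(GreaterTotal = sum(c_across(population:population3) > pop)) %>%
  ungroup()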
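rowwise() evaluates the expression once per row, which can be slow on large data. A vectorized alternative, sketched here assuming dplyr 1.0.0 or later (for across()), keeps the rowSums() idea inside mutate(): across() returns a tibble of logical columns, and rowSums() counts the TRUEs in each row.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for across()

country_df <- tibble(
  population = c(328, 38, 30, 56, 1393, 126, 57),
  population2 = c(133, 12, 99, 83, 1033, 101, 33),
  population3 = c(89, 39, 33, 56, 193, 126, 58),
  pop = 45
)

# Each selected column becomes a logical column (value > pop); rowSums counts them
result <- country_df %>%
  mutate(GreaterTotal = rowSums(across(population:population3, ~ .x > pop)))
print(result)
```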

Answer 2

With the tidyverse, we can do:

library(dplyr)
country_df %>%
  mutate(GreaterTotal = rowSums(select(., starts_with('population')) > .$pop))

-output

# A tibble: 7 x 5
#  population population2 population3   pop GreaterTotal
#       <dbl>       <dbl>       <dbl> <dbl>        <dbl>
#1        328         133          89    45            3
#2         38          12          39    45            0
#3         30          99          33    45            1
#4         56          83          56    45            3
#5       1393        1033         193    45            3
#6        126         101         126    45            3
#7         57          33          58    45            2


Tags: r, dplyr, rowwise