如何将多行中的值求和到 R 中的新列?

How to sum values in multiple rows to a new column in R?

我的数据框:

structure(list(Observation = c("Apple", "Blueberry", "Cirtus", 
"Dates", "Eggplant"), Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2, 
0.1)), class = "data.frame", row.names = c(NA, -5L))

  Observation Topic Gamma
1       Apple     1   0.1
2   Blueberry     2   0.1
3      Cirtus     3   0.2
4       Dates     4   0.2
5    Eggplant     5   0.1

如何告诉 R 将 1、3 和 5 以及 2 和 4 的值相加,然后将其保存在新列中?例如:

Observation Topic Gamma new variable
Apple 1 .10 .40
Blueberry 2 .10 .30
Cirtus 3 .20 .40
Dates 4 .20 .30
Eggplant 5 .10 .40

基本上,我希望每个观察结果都有一个新值,用于总结主题 1、3 和 5 以及主题 2 和 4 的伽玛分数。

更新:澄清: 我不想添加偶数主题编号或奇数主题编号。有时它会是两者的混合体。以这个新的 table 为例:

Observation Topic Gamma new variable
Apple 1 .10 .10
Blueberry 2 .10 .70
Cirtus 3 .20 .40
Dates 4 .20 .40
Eggplant 5 .10 .70
Fruits 6 .50 .70

在这个例子中,我保留了主题 1,添加了主题 2、5 和 6,并添加了主题 3 和 4。

更新:澄清:

Observation Topic Gamma new variable
Apple 1 .10 .10
Apple 2 .10 .70
Apple 3 .20 .40
Apple 4 .20 .40
Apple 5 .10 .70
Apple 6 .50 .70
Blueberry 1 .20 .20
Blueberry 2 .10 .60
Blueberry 3 .30 .80
Blueberry 4 .50 .80
Blueberry 5 .40 .60
Blueberry 6 .10 .60

在这个例子中,每个水果(观察)对每个主题都有自己的一组值,我对每个水果的上面列出的相同主题(2、5、6、3 和 4)求和。

根据新请求更新 II

library(dplyr)

df %>% 
  group_by(Observation, grp = case_when(Topic %in% 1 ~ 1,
                           Topic %in% c(2,5,6) ~ 2,
                           Topic %in% c(3,4) ~ 3)) %>% 
  mutate(new_variable = sum(Gamma)) %>% 
  ungroup %>% 
  select(-grp)
  Observation Topic Gamma new_variable
   <chr>       <int> <dbl>        <dbl>
 1 Apple           1   0.1          0.1
 2 Apple           2   0.1          0.7
 3 Apple           3   0.2          0.4
 4 Apple           4   0.2          0.4
 5 Apple           5   0.1          0.7
 6 Apple           6   0.5          0.7
 7 Blueberry       1   0.2          0.2
 8 Blueberry       2   0.1          0.6
 9 Blueberry       3   0.3          0.8
10 Blueberry       4   0.5          0.8
11 Blueberry       5   0.4          0.6
12 Blueberry       6   0.1          0.6

更新: 根据 OP 的新请求。该解决方案的灵感完全来自 PaulS 解决方案(归功于他):

library(dplyr)

df %>% 
  group_by(grp = case_when(Topic %in% 1 ~ 1,
                           Topic %in% c(2,5,6) ~ 2,
                           Topic %in% c(3,4) ~ 3)) %>% 
  mutate(new_variable = sum(Gamma)) %>% 
  ungroup %>% 
  select(-grp)
  Observation Topic Gamma new_variable
  <chr>       <int> <dbl>        <dbl>
1 Apple           1   0.1          0.1
2 Blueberry       2   0.1          0.7
3 Cirtus          3   0.2          0.4
4 Dates           4   0.2          0.4
5 Eggplant        5   0.1          0.7
6 Fruits          6   0.5          0.7

第一个回答: 在 ifelse 语句中识别奇数行和偶数行后,我们可以对 Gamma 求和: 在这种情况下 row_number 可以替换为 Topic

library(dplyr)

df %>% 
  mutate(new_variable = ifelse(row_number() %% 2 == 1, 
                               sum(Gamma[row_number() %% 2 == 1]), # odd 1,3,5
                               sum(Gamma[row_number() %% 2 == 0])) # even 2,4
         )
  Observation Topic Gamma new_variable
1       Apple     1   0.1          0.4
2   Blueberry     2   0.1          0.3
3      Cirtus     3   0.2          0.4
4       Dates     4   0.2          0.3
5    Eggplant     5   0.1          0.4

数据:

structure(list(Observation = c("Apple", "Blueberry", "Cirtus", 
"Dates", "Eggplant"), Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2, 
0.1)), class = "data.frame", row.names = c(NA, -5L))

Microbenchmark:AndrewGB 的基础 R 最快

这应该可以做到。

dat <- structure(list(Observation = c("Apple", "Blueberry", "Cirtus", 
                                 "Dates", "Eggplant"), 
                 Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1)), 
            row.names = c(NA, 5L), class = "data.frame")
library(tidyverse)
dat %>% 
  mutate(even = as.numeric(Topic %% 2 == 0)) %>% 
  group_by(even) %>% 
  mutate(new_variable = sum(Gamma))
#> # A tibble: 5 × 5
#> # Groups:   even [2]
#>   Observation Topic Gamma  even new_variable
#>   <chr>       <int> <dbl> <dbl>        <dbl>
#> 1 Apple           1   0.1     0          0.4
#> 2 Blueberry       2   0.1     1          0.3
#> 3 Cirtus          3   0.2     0          0.4
#> 4 Dates           4   0.2     1          0.3
#> 5 Eggplant        5   0.1     0          0.4

reprex package (v2.0.1)

创建于 2022-05-13

另一个可能的解决方案:

library(dplyr)

df %>% 
  group_by(grp = if_else(Topic %in% c(1, 3, 5), 1, 2)) %>% 
  mutate(new_variable = sum(Gamma)) %>% 
  ungroup %>% 
  select(-grp)

#> # A tibble: 5 × 4
#>   Observation Topic Gamma new_variable
#>   <chr>       <int> <dbl>        <dbl>
#> 1 Apple           1   0.1          0.4
#> 2 Blueberry       2   0.1          0.3
#> 3 Cirtus          3   0.2          0.4
#> 4 Dates           4   0.2          0.3
#> 5 Eggplant        5   0.1          0.4

更新 II(但也适用于第一次更新)

有了base R,我们可以先创建一个新的分组列,我们复制Topic列作为因子,然后我们可以根据你想要分组的行来改变水平来求和。然后,我们可以通过 Topic 和行组得到 Gamma 列的总和。然后,删除 grp 列。

df$grp <- factor(df$Topic)

levels(df$grp) <- list(
  "1" = 1,
  "2" = c(2,5,6),
  "3" = c(3,4)
)

df$new_variable <- ave(df$Gamma, df[,c(1,4)], FUN = sum)

df <- df[,-4]

输出

   Observation Topic Gamma new_variable
1        Apple     1   0.1          0.1
2        Apple     2   0.1          0.7
3        Apple     3   0.2          0.4
4        Apple     4   0.2          0.4
5        Apple     5   0.1          0.7
6        Apple     6   0.5          0.7
7    Blueberry     1   0.2          0.2
8    Blueberry     2   0.1          0.6
9    Blueberry     3   0.3          0.8
10   Blueberry     4   0.5          0.8
11   Blueberry     5   0.4          0.6
12   Blueberry     6   0.1          0.6

数据

df <- structure(list(Observation = c("Apple", "Apple", "Apple", "Apple", 
"Apple", "Apple", "Blueberry", "Blueberry", "Blueberry", "Blueberry", 
"Blueberry", "Blueberry"), Topic = c(1L, 2L, 3L, 4L, 5L, 6L, 
1L, 2L, 3L, 4L, 5L, 6L), Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1, 0.5, 
0.2, 0.1, 0.3, 0.5, 0.4, 0.1)), class = "data.frame", row.names = c(NA, 
-12L))

第一个回答

有了基数 R,我们可以用 ave 得到每组的总和。在这里,我使用逻辑创建组,因为我们只有 2 个组。

df$new_variable <- ave(df$Gamma, row.names(df) %in% c(1, 3, 5), FUN=sum)

输出

  Observation Topic Gamma new_variable
1       Apple     1   0.1          0.4
2   Blueberry     2   0.1          0.3
3      Cirtus     3   0.2          0.4
4       Dates     4   0.2          0.3
5    Eggplant     5   0.1          0.4

或者我们可以获取每组行的总和并按索引分配给新列。

df$new_variable[c(1, 3, 5)] <- sum(df$Gamma[c(1, 3, 5)], na.rm = T)
df$new_variable[c(2, 4)] <- sum(df$Gamma[c(2, 4)], na.rm = T)