如何将多行中的值求和到 R 中的新列？

Question

我的数据框：

structure(list(Observation = c("Apple", "Blueberry", "Cirtus", 
"Dates", "Eggplant"), Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2, 
0.1)), class = "data.frame", row.names = c(NA, -5L))

  Observation Topic Gamma
1       Apple     1   0.1
2   Blueberry     2   0.1
3      Cirtus     3   0.2
4       Dates     4   0.2
5    Eggplant     5   0.1

如何告诉 R 将 1、3 和 5 以及 2 和 4 的值相加，然后将其保存在新列中？例如：

Observation	Topic	Gamma	new variable
Apple	1	.10	.40
Blueberry	2	.10	.30
Cirtus	3	.20	.40
Dates	4	.20	.30
Eggplant	5	.10	.40

基本上，我希望每个观察结果都有一个新值，用于总结主题 1、3 和 5 以及主题 2 和 4 的伽玛分数。

更新：澄清： 我不想添加偶数主题编号或奇数主题编号。有时它会是两者的混合体。以这个新的 table 为例：

Observation	Topic	Gamma	new variable
Apple	1	.10	.10
Blueberry	2	.10	.70
Cirtus	3	.20	.40
Dates	4	.20	.40
Eggplant	5	.10	.70
Fruits	6	.50	.70

在这个例子中，我保留了主题 1，添加了主题 2、5 和 6，并添加了主题 3 和 4。

更新：澄清：

Observation	Topic	Gamma	new variable
Apple	1	.10	.10
Apple	2	.10	.70
Apple	3	.20	.40
Apple	4	.20	.40
Apple	5	.10	.70
Apple	6	.50	.70
Blueberry	1	.20	.20
Blueberry	2	.10	.60
Blueberry	3	.30	.80
Blueberry	4	.50	.80
Blueberry	5	.40	.60
Blueberry	6	.10	.60

在这个例子中，每个水果（观察）对每个主题都有自己的一组值，我对每个水果的上面列出的相同主题（2、5、6、3 和 4）求和。

Answer 1

根据新请求更新 II：

library(dplyr)

df %>% 
  group_by(Observation, grp = case_when(Topic %in% 1 ~ 1,
                           Topic %in% c(2,5,6) ~ 2,
                           Topic %in% c(3,4) ~ 3)) %>% 
  mutate(new_variable = sum(Gamma)) %>% 
  ungroup %>% 
  select(-grp)

  Observation Topic Gamma new_variable
   <chr>       <int> <dbl>        <dbl>
 1 Apple           1   0.1          0.1
 2 Apple           2   0.1          0.7
 3 Apple           3   0.2          0.4
 4 Apple           4   0.2          0.4
 5 Apple           5   0.1          0.7
 6 Apple           6   0.5          0.7
 7 Blueberry       1   0.2          0.2
 8 Blueberry       2   0.1          0.6
 9 Blueberry       3   0.3          0.8
10 Blueberry       4   0.5          0.8
11 Blueberry       5   0.4          0.6
12 Blueberry       6   0.1          0.6

更新： 根据 OP 的新请求。该解决方案的灵感完全来自 PaulS 解决方案（归功于他）：

library(dplyr)

df %>% 
  group_by(grp = case_when(Topic %in% 1 ~ 1,
                           Topic %in% c(2,5,6) ~ 2,
                           Topic %in% c(3,4) ~ 3)) %>% 
  mutate(new_variable = sum(Gamma)) %>% 
  ungroup %>% 
  select(-grp)

  Observation Topic Gamma new_variable
  <chr>       <int> <dbl>        <dbl>
1 Apple           1   0.1          0.1
2 Blueberry       2   0.1          0.7
3 Cirtus          3   0.2          0.4
4 Dates           4   0.2          0.4
5 Eggplant        5   0.1          0.7
6 Fruits          6   0.5          0.7

第一个回答： 在 ifelse 语句中识别奇数行和偶数行后，我们可以对 Gamma 求和：在这种情况下 row_number 可以替换为 Topic

library(dplyr)

df %>% 
  mutate(new_variable = ifelse(row_number() %% 2 == 1, 
                               sum(Gamma[row_number() %% 2 == 1]), # odd 1,3,5
                               sum(Gamma[row_number() %% 2 == 0])) # even 2,4
         )

  Observation Topic Gamma new_variable
1       Apple     1   0.1          0.4
2   Blueberry     2   0.1          0.3
3      Cirtus     3   0.2          0.4
4       Dates     4   0.2          0.3
5    Eggplant     5   0.1          0.4

数据：

structure(list(Observation = c("Apple", "Blueberry", "Cirtus", 
"Dates", "Eggplant"), Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2, 
0.1)), class = "data.frame", row.names = c(NA, -5L))

Microbenchmark：AndrewGB 的基础 R 最快

Answer 2

这应该可以做到。

dat <- structure(list(Observation = c("Apple", "Blueberry", "Cirtus", 
                                 "Dates", "Eggplant"), 
                 Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1)), 
            row.names = c(NA, 5L), class = "data.frame")
library(tidyverse)
dat %>% 
  mutate(even = as.numeric(Topic %% 2 == 0)) %>% 
  group_by(even) %>% 
  mutate(new_variable = sum(Gamma))
#> # A tibble: 5 × 5
#> # Groups:   even [2]
#>   Observation Topic Gamma  even new_variable
#>   <chr>       <int> <dbl> <dbl>        <dbl>
#> 1 Apple           1   0.1     0          0.4
#> 2 Blueberry       2   0.1     1          0.3
#> 3 Cirtus          3   0.2     0          0.4
#> 4 Dates           4   0.2     1          0.3
#> 5 Eggplant        5   0.1     0          0.4

^{由 reprex package (v2.0.1)}

创建于 2022-05-13

Answer 3

另一个可能的解决方案：

library(dplyr)

df %>% 
  group_by(grp = if_else(Topic %in% c(1, 3, 5), 1, 2)) %>% 
  mutate(new_variable = sum(Gamma)) %>% 
  ungroup %>% 
  select(-grp)

#> # A tibble: 5 × 4
#>   Observation Topic Gamma new_variable
#>   <chr>       <int> <dbl>        <dbl>
#> 1 Apple           1   0.1          0.4
#> 2 Blueberry       2   0.1          0.3
#> 3 Cirtus          3   0.2          0.4
#> 4 Dates           4   0.2          0.3
#> 5 Eggplant        5   0.1          0.4

Answer 4

更新 II（但也适用于第一次更新）

有了base R，我们可以先创建一个新的分组列，我们复制Topic列作为因子，然后我们可以根据你想要分组的行来改变水平来求和。然后，我们可以通过 Topic 和行组得到 Gamma 列的总和。然后，删除 grp 列。

df$grp <- factor(df$Topic)

levels(df$grp) <- list(
  "1" = 1,
  "2" = c(2,5,6),
  "3" = c(3,4)
)

df$new_variable <- ave(df$Gamma, df[,c(1,4)], FUN = sum)

df <- df[,-4]

输出

   Observation Topic Gamma new_variable
1        Apple     1   0.1          0.1
2        Apple     2   0.1          0.7
3        Apple     3   0.2          0.4
4        Apple     4   0.2          0.4
5        Apple     5   0.1          0.7
6        Apple     6   0.5          0.7
7    Blueberry     1   0.2          0.2
8    Blueberry     2   0.1          0.6
9    Blueberry     3   0.3          0.8
10   Blueberry     4   0.5          0.8
11   Blueberry     5   0.4          0.6
12   Blueberry     6   0.1          0.6

数据

df <- structure(list(Observation = c("Apple", "Apple", "Apple", "Apple", 
"Apple", "Apple", "Blueberry", "Blueberry", "Blueberry", "Blueberry", 
"Blueberry", "Blueberry"), Topic = c(1L, 2L, 3L, 4L, 5L, 6L, 
1L, 2L, 3L, 4L, 5L, 6L), Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1, 0.5, 
0.2, 0.1, 0.3, 0.5, 0.4, 0.1)), class = "data.frame", row.names = c(NA, 
-12L))

第一个回答

有了基数 R，我们可以用 ave 得到每组的总和。在这里，我使用逻辑创建组，因为我们只有 2 个组。

df$new_variable <- ave(df$Gamma, row.names(df) %in% c(1, 3, 5), FUN=sum)

输出

  Observation Topic Gamma new_variable
1       Apple     1   0.1          0.4
2   Blueberry     2   0.1          0.3
3      Cirtus     3   0.2          0.4
4       Dates     4   0.2          0.3
5    Eggplant     5   0.1          0.4

或者我们可以获取每组行的总和并按索引分配给新列。

df$new_variable[c(1, 3, 5)] <- sum(df$Gamma[c(1, 3, 5)], na.rm = T)
df$new_variable[c(2, 4)] <- sum(df$Gamma[c(2, 4)], na.rm = T)

如何将多行中的值求和到 R 中的新列？

How to sum values in multiple rows to a new column in R?

r

calculated-columns