How to sum values in multiple rows to a new column in R?
My data frame:
structure(list(Observation = c("Apple", "Blueberry", "Cirtus",
"Dates", "Eggplant"), Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2,
0.1)), class = "data.frame", row.names = c(NA, -5L))
Observation Topic Gamma
1 Apple 1 0.1
2 Blueberry 2 0.1
3 Cirtus 3 0.2
4 Dates 4 0.2
5 Eggplant 5 0.1
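How can I tell R to add up the values for rows 1, 3 and 5, and separately for rows 2 and 4, and then save the result in a new column? For example: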
Observation | Topic | Gamma | new variable |
---|---|---|---|
Apple | 1 | .10 | .40 |
Blueberry | 2 | .10 | .30 |
Cirtus | 3 | .20 | .40 |
Dates | 4 | .20 | .30 |
Eggplant | 5 | .10 | .40 |
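Basically, I want each observation to get a new value that sums the Gamma scores of topics 1, 3 and 5, or of topics 2 and 4, depending on which group its topic belongs to.
Update: clarification:
I don't want to sum only the even or only the odd topic numbers. Sometimes a group will be a mix of both. Take this new table as an example: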
Observation | Topic | Gamma | new variable |
---|---|---|---|
Apple | 1 | .10 | .10 |
Blueberry | 2 | .10 | .70 |
Cirtus | 3 | .20 | .40 |
Dates | 4 | .20 | .40 |
Eggplant | 5 | .10 | .70 |
Fruits | 6 | .50 | .70 |
In this example, I kept topic 1 as it is, summed topics 2, 5 and 6, and summed topics 3 and 4.
Update: clarification:
Observation | Topic | Gamma | new variable |
---|---|---|---|
Apple | 1 | .10 | .10 |
Apple | 2 | .10 | .70 |
Apple | 3 | .20 | .40 |
Apple | 4 | .20 | .40 |
Apple | 5 | .10 | .70 |
Apple | 6 | .50 | .70 |
Blueberry | 1 | .20 | .20 |
Blueberry | 2 | .10 | .60 |
Blueberry | 3 | .30 | .80 |
Blueberry | 4 | .50 | .80 |
Blueberry | 5 | .40 | .60 |
Blueberry | 6 | .10 | .60 |
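In this example, each fruit (observation) has its own set of values for every topic, and within each fruit I sum the same topic groups listed above (2, 5 and 6; 3 and 4).
Update II, based on the new request: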
library(dplyr)
df %>%
group_by(Observation, grp = case_when(Topic %in% 1 ~ 1,
Topic %in% c(2,5,6) ~ 2,
Topic %in% c(3,4) ~ 3)) %>%
mutate(new_variable = sum(Gamma)) %>%
ungroup %>%
select(-grp)
Observation Topic Gamma new_variable
<chr> <int> <dbl> <dbl>
1 Apple 1 0.1 0.1
2 Apple 2 0.1 0.7
3 Apple 3 0.2 0.4
4 Apple 4 0.2 0.4
5 Apple 5 0.1 0.7
6 Apple 6 0.5 0.7
7 Blueberry 1 0.2 0.2
8 Blueberry 2 0.1 0.6
9 Blueberry 3 0.3 0.8
10 Blueberry 4 0.5 0.8
11 Blueberry 5 0.4 0.6
12 Blueberry 6 0.1 0.6
Update: based on the OP's new request. This solution is inspired entirely by PaulS's solution (credit to him):
library(dplyr)
df %>%
group_by(grp = case_when(Topic %in% 1 ~ 1,
Topic %in% c(2,5,6) ~ 2,
Topic %in% c(3,4) ~ 3)) %>%
mutate(new_variable = sum(Gamma)) %>%
ungroup %>%
select(-grp)
Observation Topic Gamma new_variable
<chr> <int> <dbl> <dbl>
1 Apple 1 0.1 0.1
2 Blueberry 2 0.1 0.7
3 Cirtus 3 0.2 0.4
4 Dates 4 0.2 0.4
5 Eggplant 5 0.1 0.7
6 Fruits 6 0.5 0.7
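If the topic groups change often, it may be easier to keep them in one list than to edit the `case_when()` branches. The following is only a sketch in base R, not part of the answer above; the name `topic_groups` is made up for illustration, and the data is the six-row table from the question's first update.

```r
# Six-row example from the question's first update
df <- data.frame(
  Observation = c("Apple", "Blueberry", "Cirtus", "Dates", "Eggplant", "Fruits"),
  Topic = 1:6,
  Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1, 0.5)
)

# Hypothetical specification: each list element is one set of topics to sum
topic_groups <- list(1, c(2, 5, 6), c(3, 4))

# Turn the list into a group label per row:
# labels[i] is the group of the i-th topic in unlist(topic_groups)
labels <- rep(seq_along(topic_groups), lengths(topic_groups))
grp <- labels[match(df$Topic, unlist(topic_groups))]

# Sum Gamma within each group and repeat the sum on every row of that group
df$new_variable <- ave(df$Gamma, grp, FUN = sum)
df
```

A topic that is missing from `topic_groups` would get `NA` as its group label here, so the list should cover every topic in the data.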
First answer:
After identifying the odd and even rows inside an ifelse statement, we can sum Gamma for each set.
In this case row_number() can be replaced with Topic.
library(dplyr)
df %>%
mutate(new_variable = ifelse(row_number() %% 2 == 1,
sum(Gamma[row_number() %% 2 == 1]), # odd 1,3,5
sum(Gamma[row_number() %% 2 == 0])) # even 2,4
)
Observation Topic Gamma new_variable
1 Apple 1 0.1 0.4
2 Blueberry 2 0.1 0.3
3 Cirtus 3 0.2 0.4
4 Dates 4 0.2 0.3
5 Eggplant 5 0.1 0.4
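One reason to prefer `Topic` over the row position is that the result then no longer depends on row order. A small base R sketch of that point, assuming the five-row data from the question (the reordering and the name `shuffled` are only for illustration):

```r
df <- data.frame(
  Observation = c("Apple", "Blueberry", "Cirtus", "Dates", "Eggplant"),
  Topic = 1:5,
  Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1)
)

# Reorder the rows so that row position and Topic no longer agree
shuffled <- df[c(2, 1, 3, 5, 4), ]

# Group on the Topic value instead of the row number
is_odd <- shuffled$Topic %% 2 == 1
shuffled$new_variable <- ifelse(is_odd,
                                sum(shuffled$Gamma[is_odd]),   # topics 1, 3, 5
                                sum(shuffled$Gamma[!is_odd]))  # topics 2, 4
shuffled
```

Each row still receives the sum for its own topic group, whatever order the rows are in.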
Data:
structure(list(Observation = c("Apple", "Blueberry", "Cirtus",
"Dates", "Eggplant"), Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2,
0.1)), class = "data.frame", row.names = c(NA, -5L))
Microbenchmark: AndrewGB's base R solution is the fastest.
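The benchmark itself is not reproduced here. To check timings on your own machine, a rough sketch using only base R's `system.time()` could look like the following. The helper names `by_ave` and `by_ifelse` and the repetition count are assumptions for illustration, and timings on a five-row data frame will be noisy.

```r
df <- data.frame(
  Observation = c("Apple", "Blueberry", "Cirtus", "Dates", "Eggplant"),
  Topic = 1:5,
  Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1)
)

# Two of the approaches from this thread, wrapped as functions (names are hypothetical)
by_ave <- function(d) ave(d$Gamma, d$Topic %in% c(1, 3, 5), FUN = sum)
by_ifelse <- function(d) {
  odd <- d$Topic %in% c(1, 3, 5)
  ifelse(odd, sum(d$Gamma[odd]), sum(d$Gamma[!odd]))
}

# Make sure both give the same answer before timing them
stopifnot(isTRUE(all.equal(by_ave(df), by_ifelse(df))))

# Repeat each call many times so the elapsed time is measurable
n_reps <- 1000
t_ave <- system.time(for (i in seq_len(n_reps)) by_ave(df))
t_ifelse <- system.time(for (i in seq_len(n_reps)) by_ifelse(df))
print(c(ave = t_ave[["elapsed"]], ifelse = t_ifelse[["elapsed"]]))
```

For a more careful comparison, the microbenchmark package mentioned above reports a distribution of timings rather than a single elapsed value.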
This should do it.
dat <- structure(list(Observation = c("Apple", "Blueberry", "Cirtus",
"Dates", "Eggplant"),
Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1)),
row.names = c(NA, 5L), class = "data.frame")
library(tidyverse)
dat %>%
mutate(even = as.numeric(Topic %% 2 == 0)) %>%
group_by(even) %>%
mutate(new_variable = sum(Gamma))
#> # A tibble: 5 × 5
#> # Groups: even [2]
#> Observation Topic Gamma even new_variable
#> <chr> <int> <dbl> <dbl> <dbl>
#> 1 Apple 1 0.1 0 0.4
#> 2 Blueberry 2 0.1 1 0.3
#> 3 Cirtus 3 0.2 0 0.4
#> 4 Dates 4 0.2 1 0.3
#> 5 Eggplant 5 0.1 0 0.4
Created on 2022-05-13 by the reprex package (v2.0.1)
Another possible solution:
library(dplyr)
df %>%
group_by(grp = if_else(Topic %in% c(1, 3, 5), 1, 2)) %>%
mutate(new_variable = sum(Gamma)) %>%
ungroup %>%
select(-grp)
#> # A tibble: 5 × 4
#> Observation Topic Gamma new_variable
#> <chr> <int> <dbl> <dbl>
#> 1 Apple 1 0.1 0.4
#> 2 Blueberry 2 0.1 0.3
#> 3 Cirtus 3 0.2 0.4
#> 4 Dates 4 0.2 0.3
#> 5 Eggplant 5 0.1 0.4
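Update II (but this also works for the first update)
With base R, we can first create a new grouping column by copying the Topic column as a factor, and then change its levels so that the topics you want to sum together share one level. Then we can get the sum of the Gamma column by Observation and by this new group (columns 1 and 4 of df). Finally, we remove the grp column.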
df$grp <- factor(df$Topic)
levels(df$grp) <- list(
"1" = 1,
"2" = c(2,5,6),
"3" = c(3,4)
)
df$new_variable <- ave(df$Gamma, df[,c(1,4)], FUN = sum)
df <- df[,-4]
Output
Observation Topic Gamma new_variable
1 Apple 1 0.1 0.1
2 Apple 2 0.1 0.7
3 Apple 3 0.2 0.4
4 Apple 4 0.2 0.4
5 Apple 5 0.1 0.7
6 Apple 6 0.5 0.7
7 Blueberry 1 0.2 0.2
8 Blueberry 2 0.1 0.6
9 Blueberry 3 0.3 0.8
10 Blueberry 4 0.5 0.8
11 Blueberry 5 0.4 0.6
12 Blueberry 6 0.1 0.6
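The step that does the grouping is the assignment of a list to `levels()`. As a minimal sketch on a bare vector of topics (the variable names are only illustrative), each list name becomes a new level and each element lists the old levels it absorbs:

```r
topic <- 1:6
grp <- factor(topic)
levels(grp)          # one level per topic before merging

# Merge levels: name = new level, value = old levels folded into it
levels(grp) <- list(
  "1" = 1,
  "2" = c(2, 5, 6),
  "3" = c(3, 4)
)

as.character(grp)    # group label for topics 1 to 6
levels(grp)          # only the merged levels remain
```

After this, `grp` can be passed to `ave()` as a grouping variable just like any other factor.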
Data
df <- structure(list(Observation = c("Apple", "Apple", "Apple", "Apple",
"Apple", "Apple", "Blueberry", "Blueberry", "Blueberry", "Blueberry",
"Blueberry", "Blueberry"), Topic = c(1L, 2L, 3L, 4L, 5L, 6L,
1L, 2L, 3L, 4L, 5L, 6L), Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1, 0.5,
0.2, 0.1, 0.3, 0.5, 0.4, 0.1)), class = "data.frame", row.names = c(NA,
-12L))
First answer
With base R, we can use ave to get the sum for each group. Here, I create the groups with a logical vector, since we only have 2 groups.
df$new_variable <- ave(df$Gamma, row.names(df) %in% c(1, 3, 5), FUN=sum)
Output
Observation Topic Gamma new_variable
1 Apple 1 0.1 0.4
2 Blueberry 2 0.1 0.3
3 Cirtus 3 0.2 0.4
4 Dates 4 0.2 0.3
5 Eggplant 5 0.1 0.4
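To see why `ave()` fits here, it can help to compare it with `tapply()`, which returns one value per group instead of one per row. A small sketch on plain vectors, using the Gamma values from the question (the names `gamma` and `in_first_group` are illustrative):

```r
gamma <- c(0.1, 0.1, 0.2, 0.2, 0.1)
in_first_group <- c(TRUE, FALSE, TRUE, FALSE, TRUE)  # rows 1, 3, 5 vs rows 2, 4

# tapply: one sum per group (FALSE group first, then TRUE)
per_group <- tapply(gamma, in_first_group, sum)

# ave: the group sum repeated for every row, same length as the input
per_row <- ave(gamma, in_first_group, FUN = sum)

per_group
per_row
```

Because `per_row` has the same length as the input, it can be assigned directly as a new column.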
Or we can compute the sum for each set of rows and assign it to the new column by index.
df$new_variable[c(1, 3, 5)] <- sum(df$Gamma[c(1, 3, 5)], na.rm = TRUE)
df$new_variable[c(2, 4)] <- sum(df$Gamma[c(2, 4)], na.rm = TRUE)
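With more than two sets of rows, repeating that line for each set gets tedious. One possible variation, sketched here with a hypothetical `row_sets` list, loops over the index sets instead:

```r
df <- data.frame(
  Observation = c("Apple", "Blueberry", "Cirtus", "Dates", "Eggplant"),
  Topic = 1:5,
  Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1)
)

# Hypothetical list of row-index sets; each set is summed separately
row_sets <- list(c(1, 3, 5), c(2, 4))

df$new_variable <- NA_real_  # start empty so uncovered rows stay NA
for (idx in row_sets) {
  df$new_variable[idx] <- sum(df$Gamma[idx], na.rm = TRUE)
}
df
```

Like the two-line version, this relies on row positions, so it assumes the rows are in the order shown in the question.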