How to merge rows of a data frame under conditions
I need your help merging some rows of a data frame based on conditions.
Please find an example of my data frame below:
ID=c("A1", "A2", "A3", "A4", "A5", "A6", "A1", "A3", "A6", "A1", "A2", "A5", "A6")
G1=c(1,0,0,0,0,0,2,0,0,0,0,0,0)
G2=c(0,1,0,0,0,0,0,0,0,0,0,0,0)
G3=c(0,0,1,0,0,0,0,0,0,0,0,0,0)
G4=c(0,0,0,1,0,0,0,0,0,0,0,0,0)
G5=c(0,0,0,0,1,0,0,0,0,0,0,0,0)
G6=c(0,0,0,0,0,0,0,2,0,0,0,3,0)
G7=c(0,0,0,0,0,0,0,0,0,3,0,0,0)
G8=c(0,0,0,0,0,0,0,0,0,3,3,0,0)
G9=c(0,0,0,0,0,1,0,0,1,0,0,3,2)
data <- data.frame(ID, G1, G2, G3, G4, G5, G6, G7, G8, G9)
data
ID G1 G2 G3 G4 G5 G6 G7 G8 G9
1 A1 1 0 0 0 0 0 0 0 0
2 A2 0 1 0 0 0 0 0 0 0
3 A3 0 0 1 0 0 0 0 0 0
4 A4 0 0 0 1 0 0 0 0 0
5 A5 0 0 0 0 1 0 0 0 0
6 A6 0 0 0 0 0 0 0 0 1
7 A1 2 0 0 0 0 0 0 0 0
8 A3 0 0 0 0 0 2 0 0 0
9 A6 0 0 0 0 0 0 0 0 1
10 A1 0 0 0 0 0 0 3 3 0
11 A2 0 0 0 0 0 0 0 3 0
12 A5 0 0 0 0 0 3 0 0 3
13 A6 0 0 0 0 0 0 0 0 2
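I would like to merge the rows of the data frame that share the same ID, under the following condition:
If the same ID has several non-zero values in the same column, I want to keep the smallest of them in the merged row (this is the case for A1 and A6).
Below is the expected output for my data frame following these rules: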
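The merging rule boils down to a per-column reduction: among one ID's entries in a column, take the smallest non-zero value, and fall back to 0 if there is none. A minimal sketch of that reduction, assuming that 0 means "no value" and that real values are positive; min_nonzero is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: smallest non-zero value of a vector, or 0 if every entry is 0.
# Assumes 0 means "no value" and that all real values are positive.
min_nonzero <- function(x) {
  pos <- x[x > 0]                       # keep only the entries that carry a value
  if (length(pos) > 0) min(pos) else 0  # fall back to 0 when nothing is left
}

# Column G1 for the three A1 rows (values 1, 2 and 0) collapses to 1
min_nonzero(c(1, 2, 0))  # [1] 1
# Column G9 for the three A6 rows (values 1, 1 and 2) collapses to 1
min_nonzero(c(1, 1, 2))  # [1] 1
# A column in which the ID never has a value stays 0
min_nonzero(c(0, 0))     # [1] 0
```

Applying this function to every G column within each ID group gives the expected output; both answers do exactly that, differing only in how they group.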
ID G1 G2 G3 G4 G5 G6 G7 G8 G9
1 A1 1 0 0 0 0 0 3 3 0
2 A2 0 1 0 0 0 0 0 3 0
3 A3 0 0 1 0 0 2 0 0 0
4 A4 0 0 0 1 0 0 0 0 0
5 A5 0 0 0 0 1 3 0 0 3
6 A6 0 0 0 0 0 0 0 0 1
Perhaps we can use aggregate like this:
> aggregate(. ~ ID, data, function(x) if (any(x > 0)) min(x[x > 0]) else 0)
ID G1 G2 G3 G4 G5 G6 G7 G8 G9
1 A1 1 0 0 0 0 0 3 3 0
2 A2 0 1 0 0 0 0 0 3 0
3 A3 0 0 1 0 0 2 0 0 0
4 A4 0 0 0 1 0 0 0 0 0
5 A5 0 0 0 0 1 3 0 0 3
6 A6 0 0 0 0 0 0 0 0 1
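A base-R variant of the same idea, sketched under the assumption that all real values are positive: temporarily replace 0 with Inf so that a plain min() skips it, aggregate, then turn any remaining Inf (an ID with no value in that column) back into 0. This is an alternative trick, not the approach used in the answer above:

```r
# Rebuild the example data frame from the question
data <- data.frame(
  ID = c("A1", "A2", "A3", "A4", "A5", "A6", "A1", "A3", "A6", "A1", "A2", "A5", "A6"),
  G1 = c(1,0,0,0,0,0,2,0,0,0,0,0,0),
  G2 = c(0,1,0,0,0,0,0,0,0,0,0,0,0),
  G3 = c(0,0,1,0,0,0,0,0,0,0,0,0,0),
  G4 = c(0,0,0,1,0,0,0,0,0,0,0,0,0),
  G5 = c(0,0,0,0,1,0,0,0,0,0,0,0,0),
  G6 = c(0,0,0,0,0,0,0,2,0,0,0,3,0),
  G7 = c(0,0,0,0,0,0,0,0,0,3,0,0,0),
  G8 = c(0,0,0,0,0,0,0,0,0,3,3,0,0),
  G9 = c(0,0,0,0,0,1,0,0,1,0,0,3,2)
)

# Treat 0 as "no value" by replacing it with Inf (all G columns, ID excluded)
tmp <- data
tmp[-1][tmp[-1] == 0] <- Inf

# Plain min() per ID and column now ignores the former zeros
res <- aggregate(. ~ ID, data = tmp, FUN = min)

# An Inf left in the result means the ID had no value in that column: restore 0
res[-1][res[-1] == Inf] <- 0
res
```

The sentinel keeps the aggregation function trivial, at the cost of two extra passes over the data; the if/else version above avoids the temporary copy.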
An option with min_ (from hablar):
library(hablar)
library(dplyr)
library(tidyr)
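data %>%
  group_by(ID) %>%
  # For each ID and column, take the smallest positive value; min_() returns NA
  # when there is none, and replace_na() turns that NA into 0
  summarise(across(everything(), ~ replace_na(min_(.[. > 0]), 0)))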
-output
# A tibble: 6 x 10
ID G1 G2 G3 G4 G5 G6 G7 G8 G9
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A1 1 0 0 0 0 0 3 3 0
2 A2 0 1 0 0 0 0 0 3 0
3 A3 0 0 1 0 0 2 0 0 0
4 A4 0 0 0 1 0 0 0 0 0
5 A5 0 0 0 0 1 3 0 0 3
6 A6 0 0 0 0 0 0 0 0 1
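If you would rather not depend on hablar, the same summarise() call can be written with dplyr alone by making the empty case explicit. A sketch, assuming dplyr >= 1.0 (for across()) and that real values are positive:

```r
library(dplyr)

# Rebuild the example data frame from the question
data <- data.frame(
  ID = c("A1", "A2", "A3", "A4", "A5", "A6", "A1", "A3", "A6", "A1", "A2", "A5", "A6"),
  G1 = c(1,0,0,0,0,0,2,0,0,0,0,0,0),
  G2 = c(0,1,0,0,0,0,0,0,0,0,0,0,0),
  G3 = c(0,0,1,0,0,0,0,0,0,0,0,0,0),
  G4 = c(0,0,0,1,0,0,0,0,0,0,0,0,0),
  G5 = c(0,0,0,0,1,0,0,0,0,0,0,0,0),
  G6 = c(0,0,0,0,0,0,0,2,0,0,0,3,0),
  G7 = c(0,0,0,0,0,0,0,0,0,3,0,0,0),
  G8 = c(0,0,0,0,0,0,0,0,0,3,3,0,0),
  G9 = c(0,0,0,0,0,1,0,0,1,0,0,3,2)
)

result <- data %>%
  group_by(ID) %>%
  # across(everything()) skips the grouping column ID inside summarise();
  # each G column collapses to its smallest positive value, or 0 if it has none
  summarise(across(everything(), ~ if (any(.x > 0)) min(.x[.x > 0]) else 0))

result
```

Writing the fallback inline removes the NA round trip (min_() then replace_na()), so tidyr is not needed either.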