How to remove pairs of rows corresponding to same value in R dataframe?
For each unique ID pair, if val is 0 in both of the corresponding rows, I need to remove them. In this case, rows 5 and 6 should be removed, but not rows 7 and 8.
tmt.pair <- c("A","A","A","A","B","B","B","B")
tmt <- c("1000 C","4000 C","1000 C","4000 C","1000 C","4000 C","1000 C","4000 C")
year <- c("2021","2021","2021","2021","2021","2021","2020","2020")
month <- c("A","A","A","A","J","J","O","O")
level <- c("Low","Low","Up","Up","Low","Low","Low","Low")
site <- c(1,1,2,2,1,1,1,1)
val <- c(100,2,10,9,0,0,1,0)
df <- data.frame(tmt.pair, year,month, level,tmt,val)
df$ID <- cumsum(!duplicated(df[1:4]))
tmt.pair year month level tmt val ID
1 A 2021 A Low 1000 C 100 1
2 A 2021 A Low 4000 C 2 1
3 A 2021 A Up 1000 C 10 2
4 A 2021 A Up 4000 C 9 2
5 B 2021 J Low 1000 C 0 3
6 B 2021 J Low 4000 C 0 3
7 B 2020 O Low 1000 C 1 4
8 B 2020 O Low 4000 C 0 4
df[as.logical(with(df, ave(val, ID, FUN = \(x) !all(x == 0)))), ]
tmt.pair year month level tmt val ID
1 A 2021 A Low 1000 C 100 1
2 A 2021 A Low 4000 C 2 1
3 A 2021 A Up 1000 C 10 2
4 A 2021 A Up 4000 C 9 2
7 B 2020 O Low 1000 C 1 4
8 B 2020 O Low 4000 C 0 4
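To make the one-liner above easier to follow, here is a rough step-by-step sketch of what it appears to do. It uses a cut-down data frame with only the ID and val columns from the question, and the intermediate names flag, keep and result are only illustrative.

```r
# Cut-down copy of the question's data: only the ID and val columns
df <- data.frame(
  ID  = c(1, 1, 2, 2, 3, 3, 4, 4),
  val = c(100, 2, 10, 9, 0, 0, 1, 0)
)

# ave() applies FUN within each ID group and recycles the single result
# to every row of that group. Because val is numeric, the logical result
# is stored as 1 (keep) or 0 (drop).
flag <- with(df, ave(val, ID, FUN = function(x) !all(x == 0)))

# as.logical() converts 1/0 back to TRUE/FALSE so it can index rows
keep <- as.logical(flag)
result <- df[keep, ]
print(result)
```

The anonymous function is written as function(x) here instead of \(x) so that the sketch also runs on R versions older than 4.1.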
You can use the following base R option:
df[df$ID %in% df$ID[df$val!=0], ]
Output:
tmt.pair year month level tmt val ID
1 A 2021 A Low 1000 C 100 1
2 A 2021 A Low 4000 C 2 1
3 A 2021 A Up 1000 C 10 2
4 A 2021 A Up 4000 C 9 2
7 B 2020 O Low 1000 C 1 4
8 B 2020 O Low 4000 C 0 4
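The %in% approach can be unpacked into two steps, sketched below on a cut-down data frame holding only the ID and val columns from the question. The names nonzero_ids, keep and result are only illustrative, and the sketch assumes val has no missing values.

```r
# Cut-down copy of the question's data: only the ID and val columns
df <- data.frame(
  ID  = c(1, 1, 2, 2, 3, 3, 4, 4),
  val = c(100, 2, 10, 9, 0, 0, 1, 0)
)

# Step 1: collect the IDs that have at least one non-zero val
nonzero_ids <- unique(df$ID[df$val != 0])

# Step 2: keep every row whose ID is in that set, including zero rows
# such as row 8, whose partner row 7 is non-zero
keep <- df$ID %in% nonzero_ids
result <- df[keep, ]
print(result)
```

ID 3 never shows up in nonzero_ids because both of its rows are 0, so rows 5 and 6 are dropped together.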
With dplyr, we can first group_by the ID column, then use filter to drop any group in which all val are 0.
library(dplyr)
df %>% group_by(ID) %>% filter(!all(val == 0)) %>% ungroup()
# A tibble: 6 × 7
tmt.pair year month level tmt val ID
<chr> <chr> <chr> <chr> <chr> <dbl> <int>
1 A 2021 A Low 1000 C 100 1
2 A 2021 A Low 4000 C 2 1
3 A 2021 A Up 1000 C 10 2
4 A 2021 A Up 4000 C 9 2
5 B 2020 O Low 1000 C 1 4
6 B 2020 O Low 4000 C 0 4