Unique rows with two columns in R

Question

I have a data frame like this in R:

id  year othercolumns
1   2017 ...
2   2017 ...
1   2018 ...
2   2018 ...
3   2018 ...
4   2018 ...
1   2019 ...
2   2019 ...
3   2019 ...
4   2019 ...
5   2019 ...

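I need to select the unique values of id, keeping only the record from the first year in which each id appears. The result I need looks like this: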

id year othercolumns
1  2017 ...
2  2017 ...
3  2018 ...
4  2018 ...
5  2019 ...

My data can have any start year, but the end year is always 2020.

Answer 1

Using dplyr:

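library(dplyr)

# Example data: one row per id and year
df <- data.frame(
  id = c(1, 2, 1, 2, 3, 4, 1, 2, 3, 4, 5),
  year = c(2017, 2017, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019)
)

# For each id, keep the first year listed (this relies on the rows being sorted by year)
df %>%
  group_by(id) %>%
  summarise(year = first(year))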
#> # A tibble: 5 × 2
#>      id  year
#>   <dbl> <dbl>
#> 1     1  2017
#> 2     2  2017
#> 3     3  2018
#> 4     4  2018
#> 5     5  2019

Created on 2022-05-10 by the reprex package (v2.0.1)
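
The summarise() call above returns only id and year, and first() depends on the row order. If the other columns need to be kept, or the rows are not guaranteed to be sorted by year, one possible variant is to take the row with the smallest year per id using slice_min(). This is only a sketch: the value column is a hypothetical stand-in for the question's "othercolumns", and first_rows is an illustrative name.

```r
library(dplyr)

# Hypothetical data: `value` stands in for the question's "othercolumns"
df <- data.frame(
  id    = c(1, 2, 1, 2, 3, 4, 1, 2, 3, 4, 5),
  year  = c(2017, 2017, 2018, 2018, 2018, 2018, 2019, 2019, 2019, 2019, 2019),
  value = letters[1:11]
)

# For each id, keep the whole row with the earliest year.
# with_ties = FALSE returns a single row even if an id has
# several rows sharing that earliest year.
first_rows <- df %>%
  group_by(id) %>%
  slice_min(year, n = 1, with_ties = FALSE) %>%
  ungroup()

first_rows
```

Because slice_min() orders by year itself, the result does not depend on how the input rows are sorted.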

r

tidyverse