How to remove rows that have repeated elements?

I have a data frame that looks like this (but covering every county in the US):

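county         neighbor_county neighbor_state
Baldwin_County Clarke_County   NA
Baldwin_County Escambia_County FL
Baldwin_County Mobile_County   NA
Baldwin_County Monroe_County   NA
Barbour_County Dale_County     NA
Barbour_County Henry_County    NA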

I am only interested in the states that neighbor each county, so I want to remove the duplicated rows to get this (step 1):

county         neighbor_state
Baldwin_County NA
Baldwin_County FL
Barbour_County NA

and then reshape the data frame like this (step 2):

county         neighbor_state_1 neighbor_state_2 neighbor_state_3
Baldwin_County FL               NA               NA
Baldwin_County NA               NA               NA

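For step 1, I removed the "neighbor_county" column; however, I did not manage to remove the duplicates in the "neighbor_state" column for each distinct county. I tried the unique function, but I cannot get it to remove duplicates only within each county.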

For the first step, you can drop the neighbor_county column and use unique():

df$neighbor_county <- NULL
unique(df)

returns

          county state neighbor_state
1 Baldwin_County    AL             NA
2 Baldwin_County    AL             FL
5 Barbour_County    AL             NA
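
As a rough illustration of why this already works per county: unique() compares whole rows, and the county is part of each row, so a neighbor_state value is only dropped when it repeats within the same county. The sketch below makes that explicit with duplicated(). The small data frame is a stand-in built from the values in the Data section, and the names keep and result are only illustrative.

```r
# Stand-in for the question's data, with neighbor_county already dropped
# (values copied from the Data section; the quotes inside the strings are in the data).
df <- data.frame(
  county = c("Baldwin_County", "Baldwin_County", "Baldwin_County",
             "Baldwin_County", "Barbour_County", "Barbour_County"),
  state = "AL",
  neighbor_state = c("'NA'", "'FL'", "'NA'", "'NA'", "'NA'", "'NA'"),
  stringsAsFactors = FALSE
)

# duplicated() flags a row only when an earlier row matches in every selected
# column, so including county keeps the comparison within each county.
keep <- !duplicated(df[, c("county", "neighbor_state")])
result <- df[keep, ]
print(result)
```

Here the state column is constant, so this keeps the same rows as unique(df).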

An alternative using dplyr:

library(dplyr)

# drop the column, then keep one copy of each remaining row
df %>% 
  select(-neighbor_county) %>% 
  distinct()

For your second step, here is a suggestion:

library(tidyr)
library(dplyr)

df %>% 
  group_by(county) %>% 
  select(-neighbor_county) %>% 
  mutate(n = row_number()) %>% 
  pivot_wider(names_from=n, names_prefix="neighbor_state_", values_from=neighbor_state) %>% 
  ungroup()

returns

# A tibble: 2 x 6
  county         state neighbor_state_1 neighbor_state_2 neighbor_state_3 neighbor_state_4
  <chr>          <chr> <chr>            <chr>            <chr>            <chr>           
1 Baldwin_County AL    'NA'             'FL'             'NA'             'NA'            
2 Barbour_County AL    'NA'             'NA'             NA               NA     

But I am not sure whether this is what you are looking for.

To remove the duplicated NA values, add distinct() before numbering the rows:

df %>% 
  group_by(county) %>% 
  select(-neighbor_county) %>% 
  distinct() %>% 
  mutate(n = row_number()) %>% 
  pivot_wider(names_from=n, names_prefix="neighbor_state_", values_from=neighbor_state) %>% 
  ungroup()
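
If dplyr and tidyr are not available, roughly the same two steps can be sketched in base R with unique(), ave() and reshape(). This is an alternative technique, not the pipeline above, and the names dedup, n and wide are only illustrative. The data frame is again a stand-in built from the values in the Data section.

```r
# Stand-in for the question's data (values copied from the Data section).
df <- data.frame(
  county = c("Baldwin_County", "Baldwin_County", "Baldwin_County",
             "Baldwin_County", "Barbour_County", "Barbour_County"),
  state = "AL",
  neighbor_county = c("Clarke_County", "Escambia_County", "Mobile_County",
                      "Monroe_County", "Dale_County", "Henry_County"),
  neighbor_state = c("'NA'", "'FL'", "'NA'", "'NA'", "'NA'", "'NA'"),
  stringsAsFactors = FALSE
)

# Step 1: drop neighbor_county and keep one copy of each remaining row.
dedup <- unique(df[, c("county", "state", "neighbor_state")])

# Number the remaining rows within each county (1, 2, ...).
dedup$n <- ave(seq_along(dedup$county), dedup$county, FUN = seq_along)

# Step 2: spread neighbor_state into one column per row number.
# With sep = "_" the new columns are named neighbor_state_1, neighbor_state_2, ...
wide <- reshape(dedup, idvar = c("county", "state"), timevar = "n",
                direction = "wide", sep = "_")
print(wide)
```

Counties with fewer distinct neighbor states than the widest county get NA in the remaining columns, as in the pivot_wider() result.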

Data

df <- structure(list(county = c("Baldwin_County", "Baldwin_County", 
"Baldwin_County", "Baldwin_County", "Barbour_County", "Barbour_County"
), state = c("AL", "AL", "AL", "AL", "AL", "AL"), neighbor_county = c("Clarke_County", 
"Escambia_County", "Mobile_County", "Monroe_County", "Dale_County", 
"Henry_County"), neighbor_state = c("'NA'", "'FL'", "'NA'", "'NA'", 
"'NA'", "'NA'")), problems = structure(list(row = 6L, col = "neighbor_state", 
    expected = "", actual = "embedded null", file = "literal data"), row.names = c(NA, 
-1L), class = c("tbl_df", "tbl", "data.frame")), class = "data.frame", row.names = c(NA, 
-6L), spec = structure(list(cols = list(county = structure(list(), class = c("collector_character", 
"collector")), state = structure(list(), class = c("collector_character", 
"collector")), neighbor_county = structure(list(), class = c("collector_character", 
"collector")), neighbor_state = structure(list(), class = c("collector_character", 
"collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1L), class = "col_spec"))