How can I group by name, rank names by number of appearances, and add a count column, while dropping names that are not in the top 2 within each state (in descending order)?

For example, I have a dataset that looks like this:

    name      state
    Smith     NY
    Anthony   CA
    James     MA
    Henry     CA
    Andrews   NY
    Helen     CA
    Smith     NY
    Smith     NY
    Anthony   CA
    Andrews   NY
    Richard   MA
    Richard   MA
    Richard   MA
    Anthony   CA
    Smith     MA
    Jeffries  CA
    Conrad    NY
    Hanes     NY
    James     MA
    Conrad    NY
    Conrad    NY
    Helen     CA

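In the end, I want something like this. Note that the states are in alphabetical order, and that within each state the name with the most appearances is shown first, followed by the name with the second-most appearances. I only select the top two names in each group (state), and then I create columns giving each name's rank and its count of row appearances.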

    name      state  Rank  Count
    Anthony   CA     1     3
    Anthony   CA     1     3
    Anthony   CA     1     3
    Helen     CA     2     2
    Helen     CA     2     2
    Richard   MA     1     3
    Richard   MA     1     3
    Richard   MA     1     3
    James     MA     2     2
    James     MA     2     2
    Smith     NY     1     3
    Smith     NY     1     3
    Smith     NY     1     3
    Conrad    NY     1     3
    Conrad    NY     1     3
    Conrad    NY     1     3

Perhaps this helps:

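library(dplyr)
df1 %>%
   add_count(name, state) %>%          # n = number of rows for each name within a state
   group_by(state) %>%
   mutate(Rank = dense_rank(-n)) %>%   # rank by n within each state, highest count first
   arrange(state, Rank) %>% 
   filter(Rank %in% 1:2)               # keep only the top two ranks per state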
# A tibble: 18 x 4
# Groups:   state [3]
   name    state     n  Rank
   <chr>   <chr> <int> <int>
 1 Anthony CA        3     1
 2 Anthony CA        3     1
 3 Anthony CA        3     1
 4 Helen   CA        2     2
 5 Helen   CA        2     2
 6 Richard MA        3     1
 7 Richard MA        3     1
 8 Richard MA        3     1
 9 James   MA        2     2
10 James   MA        2     2
11 Smith   NY        3     1
12 Smith   NY        3     1
13 Smith   NY        3     1
14 Conrad  NY        3     1
15 Conrad  NY        3     1
16 Conrad  NY        3     1
17 Andrews NY        2     2
18 Andrews NY        2     2
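Compared with the desired table in the question, this output also keeps Andrews for NY. That follows from dense_rank(), which leaves no gap after a tie: Smith and Conrad both appear 3 times and share rank 1, so Andrews (2 appearances) gets rank 2 and passes the filter. A quick illustration of how dense_rank() and dplyr's min_rank() treat that tie, using the NY per-name counts from the example data typed in by hand:

```r
library(dplyr)

# Appearances per name in NY from the example data:
# Smith, Conrad, Andrews, Hanes
ny_counts <- c(3, 3, 2, 1)

# desc() makes the largest count rank first
dense_ranks <- dense_rank(desc(ny_counts))  # 1 1 2 3: no gap after the tie
min_ranks   <- min_rank(desc(ny_counts))    # 1 1 3 4: the tie uses up rank 2

print(dense_ranks)
print(min_ranks)
```

With min_rank(), a filter on ranks 1 and 2 would keep only Smith and Conrad in NY, which is what the desired table shows. With dense_rank(), Andrews stays.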

Data

df1 <- structure(list(name = c("Smith", "Anthony", "James", "Henry", 
"Andrews", "Helen", "Smith", "Smith", "Anthony", "Andrews", "Richard", 
"Richard", "Richard", "Anthony", "Smith", "Jeffries", "Conrad", 
"Hanes", "James", "Conrad", "Conrad", "Helen"), state = c("NY", 
"CA", "MA", "CA", "NY", "CA", "NY", "NY", "CA", "NY", "MA", "MA", 
"MA", "CA", "MA", "CA", "NY", "NY", "MA", "NY", "NY", "CA")),
class = "data.frame", row.names = c(NA, 
-22L))
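The desired table in the question keeps at most two names per state, so NY shows Smith and Conrad (tied, both rank 1) and drops Andrews. One possible variant that reproduces that table is offered here as a sketch rather than as part of the answer above. It counts the rows for each name, ranks the names with min_rank(), keeps ranks 1 and 2, and joins the result back to the rows. It assumes a recent dplyr (1.0 or later) and rebuilds df1 inline; Rank and Count are the column names from the question.

```r
library(dplyr)

# Example data from the question, rebuilt inline so the sketch runs on its own
df1 <- data.frame(
  name = c("Smith", "Anthony", "James", "Henry", "Andrews", "Helen",
           "Smith", "Smith", "Anthony", "Andrews", "Richard", "Richard",
           "Richard", "Anthony", "Smith", "Jeffries", "Conrad", "Hanes",
           "James", "Conrad", "Conrad", "Helen"),
  state = c("NY", "CA", "MA", "CA", "NY", "CA", "NY", "NY", "CA", "NY",
            "MA", "MA", "MA", "CA", "MA", "CA", "NY", "NY", "MA", "NY",
            "NY", "CA"),
  stringsAsFactors = FALSE
)

# One row per (state, name) with its number of appearances, then rank
# the names within each state, most frequent first. min_rank() lets tied
# names share a rank and skips the next one (1, 1, 3).
top_names <- df1 %>%
  count(state, name, name = "Count") %>%
  group_by(state) %>%
  mutate(Rank = min_rank(desc(Count))) %>%
  ungroup() %>%
  filter(Rank <= 2)

# Join back to the original rows so every appearance is listed, then sort
# by state and rank and put the columns in the requested order.
result <- df1 %>%
  inner_join(top_names, by = c("state", "name")) %>%
  arrange(state, Rank) %>%
  select(name, state, Rank, Count)

print(result)
```

Ranking the counted names, rather than the rows, is what makes min_rank() usable here. Applied to the row-level n from add_count(), Anthony's three rows would share rank 1 but use up positions 1 to 3, so Helen's rows would get rank 4 and be filtered out. Replacing min_rank() with dense_rank() in this sketch keeps Andrews and gives the same 18 rows as the answer's output.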