How to group by name, rank by number of appearances, and add a count, while dropping names that are not in the top 2 within each state (descending)?
For example, I have a dataset that looks like this:
name | state
Smith NY
Anthony CA
James MA
Henry CA
Andrews NY
Helen CA
Smith NY
Smith NY
Anthony CA
Andrews NY
Richard MA
Richard MA
Richard MA
Anthony CA
Smith MA
Jeffries CA
Conrad NY
Hanes NY
James MA
Conrad NY
Conrad NY
Helen CA
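In the end I want something like this. Note that the states are in alphabetical order. Within each state, the name that appears most often is at the top, followed by the name that appears second most often. I keep only the top two names in each group (state), then add columns giving each name's rank and its count of rows.
name | state | Rank | Count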
Anthony CA 1 3
Anthony CA 1 3
Anthony CA 1 3
Helen CA 2 2
Helen CA 2 2
Richard MA 1 3
Richard MA 1 3
Richard MA 1 3
James MA 2 2
James MA 2 2
Smith NY 1 3
Smith NY 1 3
Smith NY 1 3
Conrad NY 1 3
Conrad NY 1 3
Conrad NY 1 3
Maybe this helps:
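library(dplyr)
df1 %>%
  add_count(name, state) %>%            # n = number of rows per name/state pair
  group_by(state) %>%
  mutate(Rank = dense_rank(-n)) %>%     # rank counts within each state, ties share a rank
  arrange(state, Rank) %>%
  filter(Rank %in% 1:2)                 # keep the two highest count levels per state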
# A tibble: 18 x 4
# Groups: state [3]
name state n Rank
<chr> <chr> <int> <int>
1 Anthony CA 3 1
2 Anthony CA 3 1
3 Anthony CA 3 1
4 Helen CA 2 2
5 Helen CA 2 2
6 Richard MA 3 1
7 Richard MA 3 1
8 Richard MA 3 1
9 James MA 2 2
10 James MA 2 2
11 Smith NY 3 1
12 Smith NY 3 1
13 Smith NY 3 1
14 Conrad NY 3 1
15 Conrad NY 3 1
16 Conrad NY 3 1
17 Andrews NY 2 2
18 Andrews NY 2 2
Data
df1 <- structure(list(name = c("Smith", "Anthony", "James", "Henry",
"Andrews", "Helen", "Smith", "Smith", "Anthony", "Andrews", "Richard",
"Richard", "Richard", "Anthony", "Smith", "Jeffries", "Conrad",
"Hanes", "James", "Conrad", "Conrad", "Helen"), state = c("NY",
"CA", "MA", "CA", "NY", "CA", "NY", "NY", "CA", "NY", "MA", "MA",
"MA", "CA", "MA", "CA", "NY", "NY", "MA", "NY", "NY", "CA")),
class = "data.frame", row.names = c(NA,
-22L))
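One thing to watch: `dense_rank()` leaves no gaps after ties, so in NY both Smith and Conrad get rank 1 and Andrews still gets rank 2, which is why the output above has 18 rows while the desired table has 16. If the intent is that tied names use up rank positions (so Andrews would be third and get dropped), a possible variation is to rank the per-name counts with `min_rank()` and join back to the original rows. This is only a sketch under that assumption about the desired tie handling; the names `top2` and `result` are made up for illustration.

```r
library(dplyr)

# Same data as df1 above, rebuilt so the snippet runs on its own
df1 <- data.frame(
  name = c("Smith", "Anthony", "James", "Henry", "Andrews", "Helen", "Smith",
           "Smith", "Anthony", "Andrews", "Richard", "Richard", "Richard",
           "Anthony", "Smith", "Jeffries", "Conrad", "Hanes", "James",
           "Conrad", "Conrad", "Helen"),
  state = c("NY", "CA", "MA", "CA", "NY", "CA", "NY", "NY", "CA", "NY", "MA",
            "MA", "MA", "CA", "MA", "CA", "NY", "NY", "MA", "NY", "NY", "CA")
)

# One row per state/name pair, ranked within each state.
# min_rank() leaves gaps after ties: counts 3, 3, 2 become ranks 1, 1, 3.
top2 <- df1 %>%
  count(state, name) %>%
  rename(Count = n) %>%
  group_by(state) %>%
  mutate(Rank = min_rank(desc(Count))) %>%
  ungroup() %>%
  filter(Rank <= 2)

# Join back to the original rows so each appearance is kept
result <- df1 %>%
  inner_join(top2, by = c("state", "name")) %>%
  arrange(state, Rank) %>%
  select(name, state, Rank, Count)

result
```

Ranking is done on the summarised counts rather than on the raw rows because `min_rank()` applied row by row would count every repeated row as a separate tie, which would push Helen and James past rank 2.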