Issues with "group_by" and "mutate"
I have the following data frame:
df = structure(list(Day = c(19L, 20L, 20L, 20L, 20L, 21L, 21L, 21L,
21L, 21L), Month = c(9, 9, 9, 9, 9, 9, 9, 9, 9, 9), Year = c(2004,
2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004), Date = c("2004-09-19",
"2004-09-20", "2004-09-20", "2004-09-20", "2004-09-20", "2004-09-21",
"2004-09-21", "2004-09-21", "2004-09-21", "2004-09-21"), Outlet = c("Le Monde",
"Financial Times", "Corriere della Sera", "Frankfurter Allgemeine Zeitung",
"El Pais", "Financial Times", "La Tribune", "Financial Times",
"Borsen-Zeitung", "Borsen-Zeitung"), Country = c("France", "International",
"Italy", "Germany", "Spain", "Germany", "France", "Germany",
"Germany", "Germany")), row.names = c("text1", "text2", "text3",
"text4", "text5", "text6", "text7", "text8", "text9", "text10"
), class = "data.frame")
       Day Month Year Date       Outlet                         Country
text1   19     9 2004 2004-09-19 Le Monde                       France
text2   20     9 2004 2004-09-20 Financial Times                International
text3   20     9 2004 2004-09-20 Corriere della Sera            Italy
text4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung Germany
text5   20     9 2004 2004-09-20 El Pais                        Spain
text6   21     9 2004 2004-09-21 Financial Times                Germany
text7   21     9 2004 2004-09-21 La Tribune                     France
text8   21     9 2004 2004-09-21 Financial Times                Germany
text9   21     9 2004 2004-09-21 Borsen-Zeitung                 Germany
text10  21     9 2004 2004-09-21 Borsen-Zeitung                 Germany
I want to create an index that is shared by all rows with the same date, like this:
       Day Month Year Date       Outlet                         Country       ID
text1   19     9 2004 2004-09-19 Le Monde                       France         1
text2   20     9 2004 2004-09-20 Financial Times                International  2
text3   20     9 2004 2004-09-20 Corriere della Sera            Italy          2
text4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung Germany        2
text5   20     9 2004 2004-09-20 El Pais                        Spain          2
text6   21     9 2004 2004-09-21 Financial Times                Germany        3
text7   21     9 2004 2004-09-21 La Tribune                     France         3
text8   21     9 2004 2004-09-21 Financial Times                Germany        3
text9   21     9 2004 2004-09-21 Borsen-Zeitung                 Germany        3
text10  21     9 2004 2004-09-21 Borsen-Zeitung                 Germany        3
To do this, I ran:
library(dplyr)
df %>% group_by(Date) %>% mutate(id = row_number())
   Day Month Year Date       Outlet                         Country       id
1   19     9 2004 2004-09-19 Le Monde                       France         1
2   20     9 2004 2004-09-20 Financial Times                International  1
3   20     9 2004 2004-09-20 Corriere della Sera            Italy          2
4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung Germany        3
...
However, this does not give the result I want, and I don't understand why.
Can anyone help me with this?
Thanks a lot!
After grouping by 'Date', we can use cur_group_id():
library(dplyr)
df %>%
  group_by(Date) %>%
  mutate(ID = cur_group_id()) %>%
  ungroup()
Another option is match(), which needs no grouping:
df %>%
  mutate(ID = match(Date, unique(Date)))
Or, using base R:
df$ID <- with(df, match(Date, unique(Date)))
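One thing worth knowing about the match() approach: it numbers dates in order of first appearance, not in sorted order. The question's data is already sorted by date, so both orderings agree there. The sketch below uses a small hypothetical vector of unsorted dates (not taken from the question) and, as an assumption for comparison, uses as.integer(factor()) as a base R way to number by sorted order.

```r
# Hypothetical unsorted dates, for illustration only
dates <- c("2004-09-21", "2004-09-19", "2004-09-21", "2004-09-20")

# match() against unique() numbers each date by first appearance
id_by_appearance <- match(dates, unique(dates))
print(id_by_appearance)  # 1 2 1 3

# factor() sorts its levels, so the integer codes follow sorted order
id_by_sorted <- as.integer(factor(dates))
print(id_by_sorted)      # 3 1 3 2
```

If the IDs should follow calendar order regardless of row order, sorting the data frame by Date first makes both variants give the same result.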
Using rleid from data.table:
df %>%
  mutate(id = data.table::rleid(Date))
   Day Month Year Date       Outlet                         Country       id
1   19     9 2004 2004-09-19 Le Monde                       France         1
2   20     9 2004 2004-09-20 Financial Times                International  2
3   20     9 2004 2004-09-20 Corriere della Sera            Italy          2
4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung Germany        2
5   20     9 2004 2004-09-20 El Pais                        Spain          2
6   21     9 2004 2004-09-21 Financial Times                Germany        3
7   21     9 2004 2004-09-21 La Tribune                     France         3
8   21     9 2004 2004-09-21 Financial Times                Germany        3
9   21     9 2004 2004-09-21 Borsen-Zeitung                 Germany        3
10  21     9 2004 2004-09-21 Borsen-Zeitung                 Germany        3
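A caveat with run-length IDs: they number consecutive runs of equal values, so a date that reappears later in the data gets a new ID. That is fine for the question's data, where equal dates are adjacent. The sketch below illustrates the difference on a hypothetical vector, using base R's rle() as a stand-in for data.table::rleid (an assumption made here to avoid the extra dependency).

```r
# Hypothetical dates where "2004-09-20" reappears after another date
dates <- c("2004-09-20", "2004-09-20", "2004-09-21", "2004-09-20")

# Run-length ID: one number per consecutive run of equal values
runs <- rle(dates)
run_id <- rep(seq_along(runs$lengths), runs$lengths)
print(run_id)    # 1 1 2 3

# Value-based ID: the same date always gets the same number
value_id <- match(dates, unique(dates))
print(value_id)  # 1 1 2 1
```

If equal dates are not guaranteed to be adjacent, sorting by Date before computing the run-length ID, or using a value-based ID, avoids the split.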
I don't think you need group_by at all. Just use dense_rank:
> mutate(df, ID = dense_rank(Date))
   Day Month Year Date       Outlet                         Country       ID
1   19     9 2004 2004-09-19 Le Monde                       France         1
2   20     9 2004 2004-09-20 Financial Times                International  2
3   20     9 2004 2004-09-20 Corriere della Sera            Italy          2
4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung Germany        2
5   20     9 2004 2004-09-20 El Pais                        Spain          2
6   21     9 2004 2004-09-21 Financial Times                Germany        3
7   21     9 2004 2004-09-21 La Tribune                     France         3
8   21     9 2004 2004-09-21 Financial Times                Germany        3
9   21     9 2004 2004-09-21 Borsen-Zeitung                 Germany        3
10  21     9 2004 2004-09-21 Borsen-Zeitung                 Germany        3
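row_number() numbers the rows within each group, so it restarts at 1 for every date, whereas dense_rank() gives all rows with the same date the same index.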
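To make the difference concrete, here is a minimal sketch that puts both functions side by side. The data frame toy and the column names within_group and per_date are hypothetical, chosen only for this illustration.

```r
library(dplyr)

# Hypothetical toy data, much smaller than the question's data frame
toy <- data.frame(
  Date = c("2004-09-19", "2004-09-20", "2004-09-20", "2004-09-21")
)

res <- toy %>%
  group_by(Date) %>%
  mutate(within_group = row_number()) %>%  # restarts at 1 for each Date
  ungroup() %>%
  mutate(per_date = dense_rank(Date))      # equal dates share one value

print(res$within_group)  # 1 1 2 1
print(res$per_date)      # 1 2 2 3
```

The per_date column is the kind of ID the question asks for; within_group is what the original attempt produced.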