Issues with "group_by" and "mutate"

Question

I have the following data frame:

df <- structure(list(Day = c(19L, 20L, 20L, 20L, 20L, 21L, 21L, 21L, 
21L, 21L), Month = c(9, 9, 9, 9, 9, 9, 9, 9, 9, 9), Year = c(2004, 
2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004, 2004), Date = c("2004-09-19", 
"2004-09-20", "2004-09-20", "2004-09-20", "2004-09-20", "2004-09-21", 
"2004-09-21", "2004-09-21", "2004-09-21", "2004-09-21"), Outlet = c("Le Monde", 
"Financial Times", "Corriere della Sera", "Frankfurter Allgemeine Zeitung", 
"El Pais", "Financial Times", "La Tribune", "Financial Times", 
"Borsen-Zeitung", "Borsen-Zeitung"), Country = c("France", "International", 
"Italy", "Germany", "Spain", "Germany", "France", "Germany", 
"Germany", "Germany")), row.names = c("text1", "text2", "text3", 
"text4", "text5", "text6", "text7", "text8", "text9", "text10"
), class = "data.frame")

       Day Month Year       Date                         Outlet       Country
text1   19     9 2004 2004-09-19                       Le Monde        France
text2   20     9 2004 2004-09-20                Financial Times International
text3   20     9 2004 2004-09-20            Corriere della Sera         Italy
text4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung       Germany
text5   20     9 2004 2004-09-20                        El Pais         Spain
text6   21     9 2004 2004-09-21                Financial Times       Germany
text7   21     9 2004 2004-09-21                     La Tribune        France
text8   21     9 2004 2004-09-21                Financial Times       Germany
text9   21     9 2004 2004-09-21                 Borsen-Zeitung       Germany
text10  21     9 2004 2004-09-21                 Borsen-Zeitung       Germany

I would like to create an index that is shared by all rows with the same date, like this:


       Day Month Year       Date                         Outlet       Country   ID
text1   19     9 2004 2004-09-19                       Le Monde        France    1
text2   20     9 2004 2004-09-20                Financial Times International    2
text3   20     9 2004 2004-09-20            Corriere della Sera         Italy    2
text4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung       Germany    2
text5   20     9 2004 2004-09-20                        El Pais         Spain    2
text6   21     9 2004 2004-09-21                Financial Times       Germany    3
text7   21     9 2004 2004-09-21                     La Tribune        France    3
text8   21     9 2004 2004-09-21                Financial Times       Germany    3
text9   21     9 2004 2004-09-21                 Borsen-Zeitung       Germany    3
text10  21     9 2004 2004-09-21                 Borsen-Zeitung       Germany    3

To do that, I tried:

library(dplyr)

df %>% group_by(Date) %>% mutate(id = row_number())

      Day Month  Year Date       Outlet                         Country          id
    <int> <dbl> <dbl> <chr>      <chr>                          <chr>         <int>
 1    19     9  2004 2004-09-19 Le Monde                       France            1
 2    20     9  2004 2004-09-20 Financial Times                International     1
 3    20     9  2004 2004-09-20 Corriere della Sera            Italy             2
 4    20     9  2004 2004-09-20 Frankfurter Allgemeine Zeitung Germany           3
...

However, this does not give the result I want, and I don't understand why.

Can anyone help me with this?

Thanks a lot!

Answer 1

After grouping by 'Date', we can use cur_group_id():

library(dplyr)
df %>%
    group_by(Date) %>% 
    mutate(ID = cur_group_id()) %>%
    ungroup()

Another option is match(), which needs no grouping:

df %>%
   mutate(ID = match(Date, unique(Date)))

Or in base R:

df$ID <- with(df, match(Date, unique(Date)))
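
To see why the match() idiom works, here is a minimal base R sketch on a toy vector. The vector `dates` and the name `first_seen` are made up for illustration and stand in for df$Date.

```r
# Toy vector standing in for df$Date (values are illustrative only).
dates <- c("2004-09-19", "2004-09-20", "2004-09-20", "2004-09-21", "2004-09-21")

# unique() keeps one copy of each value, in order of first appearance.
first_seen <- unique(dates)

# match() returns, for every element of dates, its position in first_seen,
# so rows with the same date receive the same integer.
id <- match(dates, first_seen)
print(id)
```

Because the numbering follows order of first appearance, the IDs only follow chronological order if the rows are already sorted by date, as they are in the question's data.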

Answer 2

Use rleid from data.table:

df %>% 
  mutate(id = data.table::rleid(Date))

   Day Month Year       Date                         Outlet       Country id
1   19     9 2004 2004-09-19                       Le Monde        France  1
2   20     9 2004 2004-09-20                Financial Times International  2
3   20     9 2004 2004-09-20            Corriere della Sera         Italy  2
4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung       Germany  2
5   20     9 2004 2004-09-20                        El Pais         Spain  2
6   21     9 2004 2004-09-21                Financial Times       Germany  3
7   21     9 2004 2004-09-21                     La Tribune        France  3
8   21     9 2004 2004-09-21                Financial Times       Germany  3
9   21     9 2004 2004-09-21                 Borsen-Zeitung       Germany  3
10  21     9 2004 2004-09-21                 Borsen-Zeitung       Germany  3
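
One caveat worth checking before relying on rleid: it numbers consecutive runs, not distinct values. The sketch below assumes the data.table package is installed and uses a made-up vector `x` in which one value reappears after a gap.

```r
# Toy vector (illustrative): "a" appears again after "b".
x <- c("a", "a", "b", "a")

# rleid() starts a new ID every time the value changes from the previous row.
run_id <- data.table::rleid(x)

# match() against unique() gives one ID per distinct value instead.
value_id <- match(x, unique(x))

print(run_id)
print(value_id)
```

In the question's data the rows are already ordered by Date, so both approaches give the same IDs. On unsorted data they can differ, as the last element of `x` shows.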

Answer 3

I don't think there is any need for group_by here. Just use dense_rank:

mutate(df, ID = dense_rank(Date))

> mutate(df, ID = dense_rank(Date))
   Day Month Year       Date                         Outlet       Country ID
1   19     9 2004 2004-09-19                       Le Monde        France  1
2   20     9 2004 2004-09-20                Financial Times International  2
3   20     9 2004 2004-09-20            Corriere della Sera         Italy  2
4   20     9 2004 2004-09-20 Frankfurter Allgemeine Zeitung       Germany  2
5   20     9 2004 2004-09-20                        El Pais         Spain  2
6   21     9 2004 2004-09-21                Financial Times       Germany  3
7   21     9 2004 2004-09-21                     La Tribune        France  3
8   21     9 2004 2004-09-21                Financial Times       Germany  3
9   21     9 2004 2004-09-21                 Borsen-Zeitung       Germany  3
10  21     9 2004 2004-09-21                 Borsen-Zeitung       Germany  3

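row_number() numbers the rows within each group, so it restarts at 1 for every date, whereas dense_rank() ranks the values without gaps, so all rows with the same date share one index.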
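
The difference can be seen side by side in a small sketch. The data frame `toy` and the column names `within_group`, `by_rank`, and `by_first` are hypothetical. The rows are deliberately unsorted, to show that dense_rank() follows the sorted order of the dates while match() follows order of first appearance.

```r
library(dplyr)

# Toy data (illustrative), deliberately not sorted by Date.
toy <- data.frame(Date = c("2004-09-21", "2004-09-19", "2004-09-21", "2004-09-20"))

res <- toy %>%
  group_by(Date) %>%
  mutate(within_group = row_number()) %>%  # counter restarts inside each Date
  ungroup() %>%
  mutate(
    by_rank  = dense_rank(Date),           # rank among sorted distinct dates, no gaps
    by_first = match(Date, unique(Date))   # numbered by first appearance
  )

print(res)
```

Here `within_group` reproduces the behaviour seen in the question, while `by_rank` and `by_first` both give one index per date and differ only in how the dates are numbered.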
