How to create a column based on time ranges in another column, irrespective of calendar month, in R

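I asked a similar question before but realized I wasn't specific enough. I'm currently using R to analyze data extracted from Twitter. The tweets were written by different users over different time periods (one year of data was collected per user). I want to plot the data using a dictionary, but to do that I need to put the data on a common time scale.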

For simplicity, I created two data frames to explain what I'm looking for. This is what my data frame currently looks like (just with more data):

Author <- rep(c("Person1"), times = 7)
Text <- c("A","B","C", "D", "E", "F", "G")
Date <- as.Date(c('2015-01-15','2015-01-23','2015-02-14','2015-02-20', '2015-02-25', '2015-03-04', '2015-04-20'))
Pers1 <- data.frame(Author,Text,Date)

Author <- rep(c("Person2"), times = 7)
Text <- c("H","I","J", "K", "L", "M", "N")
Date <- as.Date(c('2020-08-10','2020-08-15','2020-09-05','2020-09-20', '2020-09-30', '2020-10-15','2020-10-25'))
Pers2 <- data.frame(Author,Text,Date)

library(dplyr)  # provides bind_rows()
DF <- bind_rows(Pers1, Pers2)

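For example, I'm looking at Person1's tweets from January 15, 2015 to January 15, 2016. The first month of observation (January 15 to February 15) should be called the first month, and so on (up to the twelfth month).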

Person2's observations start on August 10 (first month through September 10, second month from September 10 to October 10, ...).

In the end I'd like the data frame to look like this:

> DF
    Author Text       Date       Period
1  Person1    A 2015-01-15  First Month
2  Person1    B 2015-01-23  First Month
3  Person1    C 2015-02-14  First Month
4  Person1    D 2015-02-20 Second Month
5  Person1    E 2015-02-25 Second Month
6  Person1    F 2015-03-04 Second Month
7  Person1    G 2015-04-20  Third Month
8  Person2    H 2020-08-10  First Month
9  Person2    I 2020-08-15  First Month
10 Person2    J 2020-09-05  First Month
11 Person2    K 2020-09-20 Second Month
12 Person2    L 2020-09-30 Second Month
13 Person2    M 2020-10-15  Third Month
14 Person2    N 2020-10-25  Third Month

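Maybe I have to prepare each data frame first and then combine them into one big data frame, but I don't know how to do that. Thanks in advance for any suggestions.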

You can do it like this:

library(dplyr)

# Approximate month index since start_date, treating a month as 4.33 weeks
months_since_start <- function(dates, start_date) {
  floor(as.numeric(difftime(dates, start_date, units = "weeks")) / 4.33) + 1
}

DF %>% 
  group_by(Author) %>% 
  mutate(month = months_since_start(Date, first(Date)))

#> # A tibble: 14 x 4
#> # Groups:   Author [2]
#>    Author  Text  Date       month
#>    <chr>   <chr> <date>     <dbl>
#>  1 Person1 A     2015-01-15     1
#>  2 Person1 B     2015-01-23     1
#>  3 Person1 C     2015-02-14     1
#>  4 Person1 D     2015-02-20     2
#>  5 Person1 E     2015-02-25     2
#>  6 Person1 F     2015-03-04     2
#>  7 Person1 G     2015-04-20     4
#>  8 Person2 H     2020-08-10     1
#>  9 Person2 I     2020-08-15     1
#> 10 Person2 J     2020-09-05     1
#> 11 Person2 K     2020-09-20     2
#> 12 Person2 L     2020-09-30     2
#> 13 Person2 M     2020-10-15     3
#> 14 Person2 N     2020-10-25     3
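Since 4.33 weeks is only an approximation of a month, here is a minimal base R sketch that builds the boundaries with `seq(..., by = "month")` instead, so each period runs from the 15th to the 15th (or the 10th to the 10th). The helper name `period_from_start` is made up for illustration, and the sketch assumes at most 12 periods per author. Start dates late in the month (29th to 31st) may give boundaries that roll over into the following month, so check those separately.

```r
# Rebuild the sample data from the question with base R only
DF <- data.frame(
  Author = rep(c("Person1", "Person2"), each = 7),
  Text   = c("A", "B", "C", "D", "E", "F", "G",
             "H", "I", "J", "K", "L", "M", "N"),
  Date   = as.Date(c("2015-01-15", "2015-01-23", "2015-02-14", "2015-02-20",
                     "2015-02-25", "2015-03-04", "2015-04-20",
                     "2020-08-10", "2020-08-15", "2020-09-05", "2020-09-20",
                     "2020-09-30", "2020-10-15", "2020-10-25"))
)

# Hypothetical helper: 13 monthly boundaries from the first date cover 12 periods
period_from_start <- function(dates) {
  breaks <- seq(min(dates), by = "month", length.out = 13)
  findInterval(as.numeric(dates), as.numeric(breaks))
}

# Apply per author; ave() works on the numeric form of the dates
DF$Period <- ave(
  as.numeric(DF$Date), DF$Author,
  FUN = function(d) period_from_start(as.Date(d, origin = "1970-01-01"))
)
DF
```

On the sample data this gives the same month numbers as the output above, including 4 for the tweet on 2015-04-20.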

Using MESS::cumsumbinning:

library(dplyr)

DF %>% 
  group_by(Author) %>% 
  mutate(Month = MESS::cumsumbinning(c(0, diff(Date - first(Date))), 30, cutwhenpassed = FALSE))

   Author  Text  Date       Month
   <chr>   <chr> <date>     <int>
 1 Person1 A     2015-01-15     1
 2 Person1 B     2015-01-23     1
 3 Person1 C     2015-02-14     1
 4 Person1 D     2015-02-20     2
 5 Person1 E     2015-02-25     2
 6 Person1 F     2015-03-04     2
 7 Person1 G     2015-04-20     3
 8 Person2 H     2020-08-10     1
 9 Person2 I     2020-08-15     1
10 Person2 J     2020-09-05     1
11 Person2 K     2020-09-20     2
12 Person2 L     2020-09-30     2
13 Person2 M     2020-10-15     3
14 Person2 N     2020-10-25     3
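To see why the tweet on 2015-04-20 lands in bin 3 here, it may help to walk through the binning by hand. The sketch below is not the MESS implementation; `bin_gaps` is a hypothetical helper that starts a new bin whenever adding the next gap would push the running total past the threshold, which reproduces the output above for Person1.

```r
# Person1's dates from the question
dates <- as.Date(c("2015-01-15", "2015-01-23", "2015-02-14", "2015-02-20",
                   "2015-02-25", "2015-03-04", "2015-04-20"))

# Gaps in days between consecutive tweets (the first tweet has gap 0)
gaps <- c(0, as.numeric(diff(dates)))

# Hypothetical helper, written only to illustrate the idea
bin_gaps <- function(x, threshold) {
  bins <- integer(length(x))
  running <- 0
  current <- 1L
  for (i in seq_along(x)) {
    # Open a new bin if this gap would take the running total past the threshold
    if (running + x[i] > threshold) {
      current <- current + 1L
      running <- 0
    }
    running <- running + x[i]
    bins[i] <- current
  }
  bins
}

bins <- bin_gaps(gaps, 30)
bins
```

Bins are numbered consecutively and depend on the gaps between tweets rather than on fixed windows from the start date, so the 47-day gap before the last tweet advances the bin by only one.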

To get the expected output, you can use english::ordinal:

library(english)
library(tidyverse)
library(MESS)
DF %>% 
  group_by(Author) %>% 
  mutate(Month = MESS::cumsumbinning(c(0, diff(Date - first(Date))), 30, cutwhenpassed = FALSE) %>% 
                    ordinal() %>% 
                    paste(., "Month") %>% 
                    stringr::str_to_title()
         )

   Author  Text  Date       Month       
   <chr>   <chr> <date>     <chr>       
 1 Person1 A     2015-01-15 First Month 
 2 Person1 B     2015-01-23 First Month 
 3 Person1 C     2015-02-14 First Month 
 4 Person1 D     2015-02-20 Second Month
 5 Person1 E     2015-02-25 Second Month
 6 Person1 F     2015-03-04 Second Month
 7 Person1 G     2015-04-20 Third Month 
 8 Person2 H     2020-08-10 First Month 
 9 Person2 I     2020-08-15 First Month 
10 Person2 J     2020-09-05 First Month 
11 Person2 K     2020-09-20 Second Month
12 Person2 L     2020-09-30 Second Month
13 Person2 M     2020-10-15 Third Month 
14 Person2 N     2020-10-25 Third Month 

Code

library(dplyr)      # group_by(), mutate(), first(), %>%
library(lubridate)  # interval(), months()

DF %>%
  group_by(Author) %>%
  mutate(Period = 1 + (interval(first(Date), Date) %/% months(1)))

Result

   Author  Text  Date       Period
   <fct>   <fct> <date>      <dbl>
 1 Person1 A     2015-01-15      1
 2 Person1 B     2015-01-23      1
 3 Person1 C     2015-02-14      1
 4 Person1 D     2015-02-20      2
 5 Person1 E     2015-02-25      2
 6 Person1 F     2015-03-04      2
 7 Person1 G     2015-04-20      4
 8 Person2 H     2020-08-10      1
 9 Person2 I     2020-08-15      1
10 Person2 J     2020-09-05      1
11 Person2 K     2020-09-20      2
12 Person2 L     2020-09-30      2
13 Person2 M     2020-10-15      3
14 Person2 N     2020-10-25      3
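If you want the "First Month" style labels from the question rather than numbers, one option that needs no extra package is a lookup vector. This is a minimal sketch; `ordinals` and `labels` are names chosen here, and it assumes the period never exceeds 12, as in the question.

```r
# Numeric periods for Person1, as in the result above
period <- c(1, 1, 1, 2, 2, 2, 4)

# Lookup vector for the twelve months of observation
ordinals <- c("First", "Second", "Third", "Fourth", "Fifth", "Sixth",
              "Seventh", "Eighth", "Ninth", "Tenth", "Eleventh", "Twelfth")

# Index by the numeric period and append "Month"
labels <- paste(ordinals[period], "Month")
labels
```

Inside the pipeline the same idea would be `mutate(Period = paste(ordinals[Period], "Month"))` after the numeric `Period` column has been computed.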