How to create a column based on time ranges in another column, irrespective of calendar month, in R
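I asked a similar question before, but realized I was not specific enough. I am currently using R to analyze data extracted from Twitter. The tweets were written by different users over different time periods (one year of data was collected per user). I want to plot the data using a dictionary, and for that I need to unify the time frame of the data.
For simplicity, I created two data frames to explain what I am looking for. This is what my data frame currently looks like (only with more data):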
library(dplyr)  # needed for bind_rows()
Author <- rep(c("Person1"), times = 7)
Text <- c("A","B","C", "D", "E", "F", "G")
Date <- as.Date(c('2015-01-15','2015-01-23','2015-02-14','2015-02-20', '2015-02-25', '2015-03-04', '2015-04-20'))
Pers1 <- data.frame(Author,Text,Date)
Author <- rep(c("Person2"), times = 7)
Text <- c("H","I","J", "K", "L", "M", "N")
Date <- as.Date(c('2020-08-10','2020-08-15','2020-09-05','2020-09-20', '2020-09-30', '2020-10-15','2020-10-25'))
Pers2 <- data.frame(Author,Text,Date)
DF <- bind_rows(Pers1, Pers2)
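For example, I am looking at Person1's tweets from January 15, 2015 to January 15, 2016. The first month of observation (January 15 to February 15) should be called the first month, and so on (up to the twelfth month).
Person2's observation starts on August 10 (the first month runs to September 10, the second month from September 10 to October 10, ...).
In the end, I would like the data frame to look like this: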
> DF
Author Text Date Period
1 Person1 A 2015-01-15 First Month
2 Person1 B 2015-01-23 First Month
3 Person1 C 2015-02-14 First Month
4 Person1 D 2015-02-20 Second Month
5 Person1 E 2015-02-25 Second Month
6 Person1 F 2015-03-04 Second Month
7 Person1 G 2015-04-20 Third Month
8 Person2 H 2020-08-10 First Month
9 Person2 I 2020-08-15 First Month
10 Person2 J 2020-09-05 First Month
11 Person2 K 2020-09-20 Second Month
12 Person2 L 2020-09-30 Second Month
13 Person2 M 2020-10-15 Third Month
14 Person2 N 2020-10-25 Third Month
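Maybe I have to prepare each data frame first and then combine them into one big data frame, but I do not know how to do that. Thanks in advance for any suggestions.
You could do this: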
library(dplyr)
# Approximate month index: weeks elapsed since start_date, divided by 4.33 weeks per month
months_since_start <- function(dates, start_date) {
  floor(as.numeric(difftime(dates, start_date, units = "weeks")) / 4.33) + 1
}
DF %>%
group_by(Author) %>%
mutate(month = months_since_start(Date, first(Date)))
#> # A tibble: 14 x 4
#> # Groups: Author [2]
#> Author Text Date month
#> <chr> <chr> <date> <dbl>
#> 1 Person1 A 2015-01-15 1
#> 2 Person1 B 2015-01-23 1
#> 3 Person1 C 2015-02-14 1
#> 4 Person1 D 2015-02-20 2
#> 5 Person1 E 2015-02-25 2
#> 6 Person1 F 2015-03-04 2
#> 7 Person1 G 2015-04-20 4
#> 8 Person2 H 2020-08-10 1
#> 9 Person2 I 2020-08-15 1
#> 10 Person2 J 2020-09-05 1
#> 11 Person2 K 2020-09-20 2
#> 12 Person2 L 2020-09-30 2
#> 13 Person2 M 2020-10-15 3
#> 14 Person2 N 2020-10-25 3
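To see where these numbers come from, the same arithmetic can be reproduced in base R for Person1's dates. This is only a sketch of the calculation above; the variable names `days_elapsed`, `weeks_elapsed` and `month_index` are made up for illustration.

```r
# Base R only: reproduce the week-based month index for Person1's dates.
dates <- as.Date(c("2015-01-15", "2015-01-23", "2015-02-14", "2015-02-20",
                   "2015-02-25", "2015-03-04", "2015-04-20"))
start <- dates[1]

days_elapsed  <- as.numeric(dates - start)       # 0 8 30 36 41 48 95
weeks_elapsed <- days_elapsed / 7
month_index   <- floor(weeks_elapsed / 4.33) + 1  # 4.33 weeks per "month"

print(data.frame(dates, days_elapsed, month_index))
```

One "month" here is 4.33 weeks, roughly 30.3 days, so the boundaries are fixed-length rather than tied to the 15th of each calendar month. Row G is 95 days after the start, which is why it gets 4 rather than 3.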
Using MESS::cumsumbinning:
library(dplyr)
DF %>%
group_by(Author) %>%
mutate(Month = MESS::cumsumbinning(c(0,diff(Date - first(Date))), 30, cutwhenpassed = FALSE))
Author Text Date Month
<chr> <chr> <date> <int>
1 Person1 A 2015-01-15 1
2 Person1 B 2015-01-23 1
3 Person1 C 2015-02-14 1
4 Person1 D 2015-02-20 2
5 Person1 E 2015-02-25 2
6 Person1 F 2015-03-04 2
7 Person1 G 2015-04-20 3
8 Person2 H 2020-08-10 1
9 Person2 I 2020-08-15 1
10 Person2 J 2020-09-05 1
11 Person2 K 2020-09-20 2
12 Person2 L 2020-09-30 2
13 Person2 M 2020-10-15 3
14 Person2 N 2020-10-25 3
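Note that this approach bins the gaps between consecutive tweets rather than the time since the first tweet, so the bins are always numbered consecutively. The loop below is a hand-written stand-in that mimics that behaviour for illustration. `bin_by_cumsum` is a hypothetical helper, not the MESS implementation, and it assumes a new bin starts whenever adding the next gap would push the running sum above the threshold.

```r
# Hypothetical stand-in for cumulative-sum binning (not the MESS source).
bin_by_cumsum <- function(x, threshold) {
  bins <- integer(length(x))
  bin <- 1L
  running <- 0
  for (i in seq_along(x)) {
    # Start a new bin if this element would push the running sum past the threshold.
    if (running + x[i] > threshold) {
      bin <- bin + 1L
      running <- 0
    }
    running <- running + x[i]
    bins[i] <- bin
  }
  bins
}

dates <- as.Date(c("2015-01-15", "2015-01-23", "2015-02-14", "2015-02-20",
                   "2015-02-25", "2015-03-04", "2015-04-20"))
gaps <- c(0, as.numeric(diff(dates)))  # days between consecutive tweets
bins <- bin_by_cumsum(gaps, 30)
print(bins)
```

For Person1 the gaps are 0, 8, 22, 6, 5, 7 and 47 days. The 47-day gap before row G simply opens the next bin, so G is labelled 3 even though more than three months have passed since the first tweet.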
To get the expected result, you can use english::ordinal:
library(english)
library(tidyverse)
library(MESS)
DF %>%
group_by(Author) %>%
mutate(Month = MESS::cumsumbinning(c(0,diff(Date - first(Date))), 30, cutwhenpassed = FALSE) %>%
ordinal() %>%
paste(., "Month") %>%
stringr::str_to_title()
)
Author Text Date Month
<chr> <chr> <date> <chr>
1 Person1 A 2015-01-15 First Month
2 Person1 B 2015-01-23 First Month
3 Person1 C 2015-02-14 First Month
4 Person1 D 2015-02-20 Second Month
5 Person1 E 2015-02-25 Second Month
6 Person1 F 2015-03-04 Second Month
7 Person1 G 2015-04-20 Third Month
8 Person2 H 2020-08-10 First Month
9 Person2 I 2020-08-15 First Month
10 Person2 J 2020-09-05 First Month
11 Person2 K 2020-09-20 Second Month
12 Person2 L 2020-09-30 Second Month
13 Person2 M 2020-10-15 Third Month
14 Person2 N 2020-10-25 Third Month
Code
library(dplyr)      # group_by(), mutate(), first(), %>%
library(lubridate)  # interval(), months()
DF %>%
group_by(Author) %>%
mutate(Period = 1 + (interval(first(Date), Date) %/% months(1)))
Result
Author Text Date Period
<fct> <fct> <date> <dbl>
1 Person1 A 2015-01-15 1
2 Person1 B 2015-01-23 1
3 Person1 C 2015-02-14 1
4 Person1 D 2015-02-20 2
5 Person1 E 2015-02-25 2
6 Person1 F 2015-03-04 2
7 Person1 G 2015-04-20 4
8 Person2 H 2020-08-10 1
9 Person2 I 2020-08-15 1
10 Person2 J 2020-09-05 1
11 Person2 K 2020-09-20 2
12 Person2 L 2020-09-30 2
13 Person2 M 2020-10-15 3
14 Person2 N 2020-10-25 3
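For comparison, the same anniversary-based counting can be sketched in base R, together with the text labels from the question. This is an illustration under stated assumptions, not a drop-in replacement for lubridate: `whole_months_since` and `ordinal_words` are hypothetical names, the first row of each author is assumed to be that author's start date, and month-end start dates (for example the 31st) are not given any special handling.

```r
# Count whole months between start and each date, based on day-of-month anniversaries.
whole_months_since <- function(dates, start) {
  d <- as.POSIXlt(dates)
  s <- as.POSIXlt(start)
  n_months <- (d$year - s$year) * 12 + (d$mon - s$mon)
  # Subtract one if this month's anniversary day has not been reached yet.
  n_months - (d$mday < s$mday)
}

DF <- data.frame(
  Author = rep(c("Person1", "Person2"), each = 7),
  Text   = LETTERS[1:14],
  Date   = as.Date(c("2015-01-15", "2015-01-23", "2015-02-14", "2015-02-20",
                     "2015-02-25", "2015-03-04", "2015-04-20",
                     "2020-08-10", "2020-08-15", "2020-09-05", "2020-09-20",
                     "2020-09-30", "2020-10-15", "2020-10-25"))
)

# Start date per row: the Date of the first row belonging to the same Author.
start <- DF$Date[match(DF$Author, DF$Author)]
period <- whole_months_since(DF$Date, start) + 1

# Labels for one year of observation (months 1 to 12).
ordinal_words <- c("First", "Second", "Third", "Fourth", "Fifth", "Sixth",
                   "Seventh", "Eighth", "Ninth", "Tenth", "Eleventh", "Twelfth")
DF$Period <- paste(ordinal_words[period], "Month")
print(DF)
```

With this definition row G (2015-04-20) falls in the period running from April 15 to May 15, so it is labelled "Fourth Month", in line with the numeric result above.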