Number of months between two dates for each year in R

I am working with data that looks like this:

name       start_date        end_date
 A         1993-06-25       1993-11-04
 B         2003-12-12       2004-07-20
 C         1997-06-11       2000-11-27
 D         1990-06-29       1992-07-02

I want to count the number of months in each year that a name spans.

So the result would look like this:

name  year number_months
A     1993    5
B     2003    1
B     2004    7
C     1997    6
C     1998   12
C     1999   12
C     2000   11
D     1990    6
D     1991   12
D     1992    7

Here is a reproducible example:

name <- c("A", "B", "C", "D")
start_date <- as.Date(c("1993-06-25", "2003-12-12", "1997-06-11", "1990-06-29"))
end_date <- as.Date(c("1993-11-04", "2004-07-20", "2000-11-27", "1992-07-02"))

df <- data.frame(name, start_date, end_date)

An option using the tidyverse:

library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
library(lubridate)
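df %>% 
   # for each row, build a monthly sequence from start_date to end_date
   transmute(name,  out = map2(start_date, end_date,
     ~ seq(.x, .y, by = 'months') %>% 
           year %>%      # extract the year of each date in the sequence
           table %>%     # count how many dates fall in each year
           enframe(name = 'year', value = 'number_months'))) %>% 
   unnest(c(out))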
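
To make the counting step above concrete, here is a minimal base R sketch for row B alone. The variable names `dates` and `counts` are illustrative and not part of the original answer.

```r
# Row B from the example data: 2003-12-12 to 2004-07-20
dates <- seq(as.Date("2003-12-12"), as.Date("2004-07-20"), by = "month")

# One date per month step, starting at the start date:
# 2003-12-12, 2004-01-12, ..., 2004-07-12
counts <- table(format(dates, "%Y"))
print(counts)
# 2003 gets 1 date and 2004 gets 7
```

Because the sequence steps from the start date rather than from calendar month boundaries, a final partial month is counted only if the stepped date still falls on or before the end date.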

Another option using interval:

df %>% 
     transmute(name,  out = map2(start_date, end_date,
          ~ tibble(date = seq(.x, .y, by = 'months'), year = year(date)) %>%
               group_by(year) %>%
               summarise(number_months = interval(floor_date(first(date), 'month'), 
                   ceiling_date(last(date), 'month')) %/% months(1)) )) %>%
     unnest(c(out))
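
As a rough illustration of the interval arithmetic used above, the following sketch works through the 2004 part of row B by hand. The two dates are taken from the monthly sequence for that row; `first_date`, `last_date`, `span`, and `n_months` are hypothetical names introduced only for this example.

```r
library(lubridate)

# First and last monthly dates that fall in 2004 for row B
first_date <- as.Date("2004-01-12")
last_date  <- as.Date("2004-07-12")

# Widen the span to whole calendar months, then count the whole months in it
span <- interval(floor_date(first_date, "month"),    # 2004-01-01
                 ceiling_date(last_date, "month"))   # 2004-08-01
n_months <- span %/% months(1)
print(n_months)  # 7
```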

Or in base R (extending @rawr's solution):

do.call(rbind, Map(function(x, y, z) 
 cbind(name = z, stack(table(format(seq(x, y, by = 'months'), 
  '%Y')))), df$start_date, df$end_date, df$name))

Or, as @rawr noted, replacing stack with data.frame also works:

do.call(rbind, Map(function(x, y, z) 
     cbind(name = z, data.frame(table(format(seq(x, y, by = 'months'), 
       '%Y')))), df$start_date, df$end_date, df$name))