当范围变化时按月从 df 中提取开始和结束日期

Extract start and end dates from df by month when range varies

我有一个大型数据集,其开始日期和结束日期有时在一个月内,但通常跨越一个月或一年以上。最终,我想统计每个ID每个月的入住天数。

这是示例数据:

ID = c(50:55)
ENTRY = as.Date(c("11/6/2011", "04/08/2012", "10/9/2012",
              "23/10/2012", "15/11/2012", "23/11/2012"), "%d/%m/%Y")
EXIT = as.Date(c("11/7/2011", "06/09/2012", "24/9/2012",
              "31/12/2012", "18/11/2012", "04/01/2013"), "%d/%m/%Y")
Occupancy <- data.frame(ID, ENTRY, EXIT)

ID      ENTRY       EXIT
50 2011-06-11 2011-07-11
51 2012-08-04 2012-09-06
52 2012-09-10 2012-09-24
53 2012-10-23 2012-12-31
54 2012-11-15 2012-11-18
55 2012-11-23 2013-01-04 

这就是我要创建的内容:

ID  ENTRY   EXIT
50  6/11/2011   6/30/2011
50  7/1/2011    7/11/2011
51  8/4/2012    8/31/2012
51  9/1/2012    9/6/2012
:
55  11/23/2012  11/30/2012
55  12/1/2012   12/31/2012
55  1/1/2013    1/4/2013

如有任何建议,我们将不胜感激!

希望对您有所帮助!
它会为您提供最终结果 - 即每个 ID 每个月的占用天数。

ID = c(50:55)
ENTRY = as.Date(c("11/6/2011", "04/08/2012", "10/9/2012",
                  "23/10/2012", "15/11/2012", "23/11/2012"), "%d/%m/%Y")
EXIT = as.Date(c("11/7/2011", "06/09/2012", "24/9/2012",
                 "31/12/2012", "18/11/2012", "04/01/2013"), "%d/%m/%Y")
Occupancy <- data.frame(ID, ENTRY, EXIT)

library(zoo)
library(dplyr)
monthList <- mapply(function(x,y) as.yearmon(seq(x,y, "day")), ENTRY, EXIT)
OccupancyDf <- monthList %>% lapply(table) %>% lapply(as.list) %>% lapply(data.frame) %>% rbind_all()
OccupancyDf$ID <- Occupancy$ID
OccupancyDf[is.na(OccupancyDf)] <- 0
OccupancyDf

输出为:

Jun.2011 Jul.2011 Aug.2012 Sep.2012 Oct.2012 Nov.2012 Dec.2012 Jan.2013    ID
      20       11        0        0        0        0        0        0    50
       0        0       28        6        0        0        0        0    51
       0        0        0       15        0        0        0        0    52
       0        0        0        0        9       30       31        0    53
       0        0        0        0        0        4        0        0    54
       0        0        0        0        0        8       31        4    55


如果它解决了您的问题,请不要忘记告诉我们:)

这是获取您显示的输出的方法

以下函数将接受单行数据框(ENTRYEXIT)和 return 每个月分解的数据框。

custom.dates <- function(a,ts) {
                 if (ts > 0) {
                    newdates <- lapply(1:ts, function(x)  a$ENTRY + period(x,"month"))
                    new.entry <- lapply(1:ts, function(x) { ymd(paste(year(newdates[[x]]), month(newdates[[x]]), "01", sep="-")) } )
                    newdates <- lapply((ts-1):0, function(x) a$ENTRY + period(x,"month"))
                    new.exit <- lapply(ts:1, function(x) { ymd(paste(year(newdates[[x]]), month(newdates[[x]]), days_in_month(month(newdates[[x]])), sep="-")) } )
                  df <- data.frame(ENTRY=sort(c(a$ENTRY,new.entry)), EXIT=sort(c(a$EXIT,new.exit))) 
              return(df)
                 } else {
                    return(a)
                 }
            }

使用tidyverse

library(tidyverse)
result <- Occupancy %>%
        mutate(monthspan = (year(EXIT)*12 + month(EXIT)) - (year(ENTRY)*12 + month(ENTRY)) ) %>%
        nest(monthspan, ENTRY, EXIT) %>%
        mutate(data =  map(data, ~custom.dates(select(.x, -monthspan), .x$monthspan))) %>%
        unnest(data)

输出

     ID      ENTRY       EXIT
 1    50 2011-06-11 2011-06-30
 2    50 2011-07-01 2011-07-11
 3    51 2012-08-04 2012-08-31
 4    51 2012-09-01 2012-09-06
 5    52 2012-09-10 2012-09-24
 6    53 2012-10-23 2012-10-31
 7    53 2012-11-01 2012-11-30
 8    53 2012-12-01 2012-12-31
 9    54 2012-11-15 2012-11-18
10    55 2012-11-23 2012-11-30
11    55 2012-12-01 2012-12-31
12    55 2013-01-01 2013-01-04