使用日期索引和列格式化数据框以将行值合并到每个月类别中

Format Data Frame with Date Index and Column to Consolidate Row Values into Each Month Category

我有一个 data.frame,它有一个日期索引,它在第一列中被复制,并且有许多不同的列和包含数据的相应行。因为包含数据(按日​​期编制索引)的行可能会有所不同,具体取决于是否收集了相应的数据,有时可能会有一行(日期)包含许多空白并且只有几列的值。我想将这些行折叠成一个月和一年。

即一行可以显示昨天的数据,但不是今天所有列的数据,因为它可能尚未收集。只是看起来美观混乱,宁愿只说 "June-2020" 并折叠它们以删除 NA。

这是 data.frame 的输入:

structure(list(Date = structure(c(18292, 18320, 18321, 18351, 
18352, 18382, 18413, 18427, 18428), tzone = "UTC", tclass = "Date", class = "Date"), 
    `M-o-M Change in Median Rent - AMH GA` = c(0, NA, 0, NA, 
    0, 0, 0, NA, 0), `Median Advertised Rent - AMH GA` = c(1695, 
    NA, 1695, NA, 1695, 1695, 1695, NA, 1695), `Median of Daily Rental Listings Available - AMH GA` = c(438, 
    NA, 430.5, NA, 450, 458, 385, NA, 331), `M-o-M Change in Median Rent - AMH Charlotte` = c(0, 
    NA, 0, NA, 0, 0.0272727272727273, 0, NA, 0.0324483775811208
    ), `Median Advertised Rent - AMH Charlotte` = c(1650, NA, 
    1650, NA, 1650, 1695, 1695, NA, 1750), `Median of Daily Rental Listings Available - AMH Charlotte` = c(244, 
    NA, 257, NA, 256, 270, 227, NA, 220), `M-o-M Change in Median Rent - AMH Dallas` = c(0, 
    NA, 0, NA, 0, 0, 0.0306406685236769, NA, 0), `Median Advertised Rent - AMH Dallas` = c(1795, 
    NA, 1795, NA, 1795, 1795, 1850, NA, 1850), `Median of Daily Rental Listings Available - AMH Dallas` = c(148, 
    NA, 150, NA, 166, 152.5, 131, NA, 135), `M-o-M Change in Median Rent - AMH Houston` = c(0, 
    NA, 0.0272727272727273, NA, 0, 0, 0, NA, 0), `Median Advertised Rent - AMH Houston` = c(1650, 
    NA, 1695, NA, 1695, 1695, 1695, NA, 1695), `Median of Daily Rental Listings Available - AMH Houston` = c(222, 
    NA, 223, NA, 228, 237.5, 203, NA, 189), `M-o-M Change in Median Rent - AMH Jacksonville` = c(0, 
    0.00681818181818183, NA, NA, -0.00677200902934538, 0, 0.0272727272727273, 
    NA, 0), `Median Advertised Rent - AMH Jacksonville` = c(1650, 
    1661.25, NA, NA, 1650, 1650, 1695, NA, 1695), `Median of Daily Rental Listings Available - AMH Jacksonville` = c(164, 
    179.5, NA, NA, 188, 195.5, 185, NA, 174), `M-o-M Change in Median Rent - AMH NC` = c(0, 
    NA, 0.0344827586206897, NA, 0, 0, 0.0272727272727273, NA, 
    0), `Median Advertised Rent - AMH NC` = c(1595, NA, 1650, 
    NA, 1650, 1650, 1695, NA, 1695), `Median of Daily Rental Listings Available - AMH NC` = c(365, 
    NA, 387, NA, 405, 447, 344, NA, 323), `M-o-M Change in Median Rent - AMH NV` = c(0, 
    NA, 0, 0, NA, -0.0265486725663717, 0, NA, 0.0272727272727273
    ), `Median Advertised Rent - AMH NV` = c(1695, NA, 1695, 
    1695, NA, 1650, 1650, NA, 1695), `Median of Daily Rental Listings Available - AMH NV` = c(62, 
    NA, 59, 70, NA, 71, 63, NA, 58), `M-o-M Change in Median Rent - AMH Orlando` = c(0, 
    0.0112676056338028, NA, NA, 0, 0, 0, NA, 0.0306406685236769
    ), `Median Advertised Rent - AMH Orlando` = c(1775, 1795, 
    NA, NA, 1795, 1795, 1795, NA, 1850), `Median of Daily Rental Listings Available - AMH Orlando` = c(82, 
    91.5, NA, NA, 106, 119, 117, NA, 105), `M-o-M Change in Median Rent - AMH Phoenix` = c(0, 
    NA, 0.0216172938350681, NA, 0.0258620689655173, -0.0252100840336135, 
    0.0344827586206897, 0, NA), `Median Advertised Rent - AMH Phoenix` = c(1561.25, 
    NA, 1595, NA, 1636.25, 1595, 1650, 1650, NA), `Median of Daily Rental Listings Available - AMH Phoenix` = c(130, 
    NA, 127, NA, 129, 131, 97, 85, NA), `M-o-M Change in Median Rent - AMH Raleigh` = c(0, 
    NA, 0.0290322580645161, NA, 0, 0, 0, NA, 0.0344827586206897
    ), `Median Advertised Rent - AMH Raleigh` = c(1550, NA, 1595, 
    NA, 1595, 1595, 1595, NA, 1650), `Median of Daily Rental Listings Available - AMH Raleigh` = c(91, 
    NA, 104, NA, 114, 142, 90, NA, 81), `M-o-M Change in Median Rent - AMH SoFla` = c(0, 
    -0.00869565217391299, NA, NA, 0.0233918128654971, -0.0314285714285715, 
    0.0162241887905605, NA, 0.0159651669085632), `Median Advertised Rent - AMH SoFla` = c(1725, 
    1710, NA, NA, 1750, 1695, 1722.5, NA, 1750), `Median of Daily Rental Listings Available - AMH SoFla` = c(11, 
    14, NA, NA, 11, 10, 7, NA, 7), `M-o-M Change in Median Rent - AMH Winston-Salem/Greensboro` = c(0, 
    NA, 0.0290322580645161, NA, 0, 0, 0, NA, 0), `Median Advertised Rent - AMH Winston-Salem/Greensboro` = c(1550, 
    NA, 1595, NA, 1595, 1595, 1595, NA, 1595), `Median of Daily Rental Listings Available - AMH Winston-Salem/Greensboro` = c(66, 
    NA, 68, NA, 72, 73, 71, NA, 63)), class = "data.frame", row.names = c("2020-01-31", 
"2020-02-28", "2020-02-29", "2020-03-30", "2020-03-31", "2020-04-30", 
"2020-05-31", "2020-06-14", "2020-06-15"))

以下是我尝试过的两种方法的示例:

test <- AMH_final_Monthly3 %>% mutate(month= month(Date), year=year(Date))
test2 <- AMH_final_Monthly3 %>%
   collapse_by("monthly") %>%
   dplyr::group_by(Date, add = TRUE)

test3 <- as.yearmon(AMH_final_Monthly3)

感谢帮助!

像下面这样的东西?我的设置是荷兰语,因此有奇怪的月份名称。但是我使用 tsibble 中的 yearmon 函数,通过 yearmonth 变量摆脱日期列组,并使用 sum 汇总所有变量。现在您还可以使用 firstlast 或您想要的任何其他函数。

AMH_final_Monthly3 %>% 
  mutate(yearmonth = tsibble::yearmonth(Date)) %>% 
  select(-Date) %>% 
  group_by(yearmonth) %>% 
  summarise_all(.funs = "sum", na.rm = TRUE)

# A tibble: 6 x 37
  yearmonth `M-o-M Change i~ `Median Adverti~ `Median of Dail~ `M-o-M Change i~ `Median Adverti~ `Median of Dail~ `M-o-M Change i~
      <mth>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
1  2020 jan                0             1695             438            0                  1650              244           0     
2  2020 feb                0             1695             430.           0                  1650              257           0     
3  2020 mrt                0             1695             450            0                  1650              256           0     
4  2020 apr                0             1695             458            0.0273             1695              270           0     
5  2020 mei                0             1695             385            0                  1695              227           0.0306
6  2020 jun                0             1695             331            0.0324             1750              220           0     
# ... with 29 more variables: