如何最好地将季度数据转化为月度数据

How best to transform quarterly data into monthly

下面是示例数据。我以这样的形式接收数据。每行是一个季度,然后月份是其中的列。尝试逐月进行计算,但我认为我要转换数据框才能做到这一点。我在想我会做一个 pivot_longer 但没有在网上看到任何类似的东西。下面是想要的结果

 year<-c(2018,2018,2018,2018,2019,2019,2019,2019,2020,2020,2020,2020)
 qtr<-c(1,2,3,4,1,2,3,4,1,2,3,4)
 avgemp <-c(3,5,7,9,11,13,15,17,19,21,23,25)
 month1emp<-c(2,4,6,8,10,12,14,16,18,20,22,24)
 month2emp<-c(3,5,7,9,11,13,15,17,19,21,23,25)
 month3emp<-c(4,6,8,10,12,14,16,18,20,22,24,26)

 sample<-data.frame(year,qtr,month1emp,month2emp,month3emp)

 
 Desired Result 

 year   qtr   month   employment
2018    1       1           2
2018    1       2           3
2018    1       3           4
2018    2       4           4
2018    2       4           5
2018    2       4           6

等等。在 2019 年,月份值将重新开始并从 1 变为 12。

我们可以在 'month' 列上使用 pivot_longer,指定 names_pattern 来捕获数字 ((\d+)),然后是 emp 'month' 和 .value

library(dplyr)
library(tidyr)
sample %>% 
    pivot_longer(cols = starts_with('month'), 
     names_to = c("month", ".value"), names_pattern  = ".*(\d+)(emp)")%>% 
    rename(employment = emp)

-输出

# A tibble: 36 x 4
    year   qtr month employment
   <dbl> <dbl> <chr>      <dbl>
 1  2018     1 1              2
 2  2018     1 2              3
 3  2018     1 3              4
 4  2018     2 1              4
 5  2018     2 2              5
 6  2018     2 3              6
 7  2018     3 1              6
 8  2018     3 2              7
 9  2018     3 3              8
10  2018     4 1              8
# … with 26 more rows

如果我们需要根据 'qtr' 值增加 'month'

sample %>% 
     pivot_longer(cols = starts_with('month'), 
      names_to = c("month", ".value"), names_pattern  = ".*(\d+)(emp)")%>% 
     rename(employment = emp) %>% 
     mutate(month = as.integer(month) + c(0, 3, 6, 9)[qtr])
# A tibble: 36 x 4
    year   qtr month employment
   <dbl> <dbl> <dbl>      <dbl>
 1  2018     1     1          2
 2  2018     1     2          3
 3  2018     1     3          4
 4  2018     2     4          4
 5  2018     2     5          5
 6  2018     2     6          6
 7  2018     3     7          6
 8  2018     3     8          7
 9  2018     3     9          8
10  2018     4    10          8
# … with 26 more rows

基础 R 解决方案:

# Create a vector of boolean values,
# denoting whether or not the columns should
# be unpivoted: unpivot_cols => boolean vector
unpivot_cols <- startsWith(
  names(df), 
  "month"
)

# Reshape the data.frame, calculate 
# the month value: rshpd_df => data.frame
rshpd_df <- transform(
  reshape(
    df,
    direction = "long",
    varying = names(df)[unpivot_cols],
    ids = NULL,
    timevar = "month",
    times = seq_len(sum(unpivot_cols)),
    v.names = "employment",
    new.row.names = seq_len(
      nrow(df) * ncol(df)
    )
  ),
  month = ((12 / 4) * (qtr - 1)) + month
)

# Order the data.frame by year and month: 
# ordered_df => data.frame
ordered_df <- with(
  rshpd_df, 
  rshpd_df[order(year, month),]  
)