Expanding data by month

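I have the following data frame with columns for the address, the year and month of cleaning, latitude and longitude. It lists the month in which each address was cleaned.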

df = data.frame(address = c("1 ex St", "2 ex St"), 
               year = (c(2011,2011)),
               month = c("February","April"),
               latitude = c(341.32,343.3),
               longitude =c(432.3, 343.6))

So the data look like this:

  address   year   month    latitude   longitude
  1 ex St   2011   February 341.32     432.3
  2 ex St   2011   April    343.30     343.6

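Each row represents a specific address and the month in which that address was cleaned. I want to 'expand' the data so that each address gets 12 rows, one for every month of 2011. I also want to add a dummy variable indicating whether the lot has been cleaned by that month (1 from the cleaning month onward, 0 before it). So the data should look like this: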

  address   year   month    latitude   longitude cleaned
  1 ex St   2011   January  341.32     432.3     0
  1 ex St   2011   February 341.32     432.3     1
  1 ex St   2011   March    341.32     432.3     1
  1 ex St   2011   April    341.32     432.3     1
  1 ex St   2011   May      341.32     432.3     1
  1 ex St   2011   June     341.32     432.3     1
  1 ex St   2011   July     341.32     432.3     1
  1 ex St   2011   August   341.32     432.3     1
  1 ex St   2011   September 341.32     432.3     1
  1 ex St   2011   October  341.32     432.3     1
  1 ex St   2011   November 341.32     432.3     1
  1 ex St   2011   December 341.32     432.3     1
  2 ex St   2011   January  343.30     343.6     0
  2 ex St   2011   February 343.30     343.6     0
  2 ex St   2011   March    343.30     343.6     0
  2 ex St   2011   April    343.30     343.6     1
  2 ex St   2011   May      343.30     343.6     1
  2 ex St   2011   June     343.30     343.6     1
  2 ex St   2011   July     343.30     343.6     1
  2 ex St   2011   August   343.30     343.6     1
  2 ex St   2011   September 343.30     343.6     1
  2 ex St   2011   October  343.30     343.6     1
  2 ex St   2011   November 343.30     343.6     1
  2 ex St   2011   December 343.30     343.6     1
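For the expansion itself, base R may be enough. merge() with by = NULL returns the Cartesian product of two data frames, so crossing df with the built-in month.name vector yields 12 rows per address, and the dummy falls out of comparing month positions with match(). This is a minimal sketch, assuming one year per address and full English month names as in the sample data; clean_month is a helper column name introduced here for illustration.

```r
# Sample data from the question: one row per address, month = month it was cleaned
df <- data.frame(address   = c("1 ex St", "2 ex St"),
                 year      = c(2011, 2011),
                 month     = c("February", "April"),
                 latitude  = c(341.32, 343.3),
                 longitude = c(432.3, 343.6),
                 stringsAsFactors = FALSE)

# Keep the cleaning month under its own (hypothetical) name before expanding
names(df)[names(df) == "month"] <- "clean_month"

# Cross join: merge() with by = NULL pairs every address row with all 12 months
months <- data.frame(month = month.name, stringsAsFactors = FALSE)
expanded <- merge(df, months, by = NULL)

# Dummy: 1 from the cleaning month onward, 0 before it (compares calendar positions)
expanded$cleaned <- as.integer(
  match(expanded$month, month.name) >= match(expanded$clean_month, month.name)
)

# Order by address, then calendar month; drop the helper column
expanded <- expanded[order(expanded$address, match(expanded$month, month.name)), ]
expanded$clean_month <- NULL
rownames(expanded) <- NULL
expanded <- expanded[, c("address", "year", "month", "latitude", "longitude", "cleaned")]
print(expanded)
```

Comparing match() positions sidesteps the alphabetical order of month names, which is also why the final order() sorts on match() rather than on the month strings.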

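Is there a package or function that would let me expand my data by month like this? I've looked at melt and the reshape packages, but they don't seem to fit my case. I'm not necessarily looking for a full answer, just some guidance on which tools to use!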

EDIT: I used the answer below, but the cleaned column is still wrong. Here is the output:

       month address year latitude longitude cleaned
1    January 1 ex St 2011   341.32     432.3       0
2   February 1 ex St 2011   341.32     432.3       1
3      March 1 ex St 2011   341.32     432.3       0
4      April 1 ex St 2011   341.32     432.3       1
5        May 1 ex St 2011   341.32     432.3       0
6       June 1 ex St 2011   341.32     432.3       0
7       July 1 ex St 2011   341.32     432.3       0
8     August 1 ex St 2011   341.32     432.3       0
9  September 1 ex St 2011   341.32     432.3       1
10   October 1 ex St 2011   341.32     432.3       1
11  November 1 ex St 2011   341.32     432.3       0
12  December 1 ex St 2011   341.32     432.3       1
13   January 2 ex St 2011    343.3     343.6       1
14  February 2 ex St 2011    343.3     343.6       1
15     March 2 ex St 2011    343.3     343.6       0
16     April 2 ex St 2011    343.3     343.6       0
17       May 2 ex St 2011    343.3     343.6       1
18      June 2 ex St 2011    343.3     343.6       0
19      July 2 ex St 2011    343.3     343.6       1
20    August 2 ex St 2011    343.3     343.6       0
21 September 2 ex St 2011    343.3     343.6       0
22   October 2 ex St 2011    343.3     343.6       1
23  November 2 ex St 2011    343.3     343.6       1
24  December 2 ex St 2011    343.3     343.6       0

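I suspect na.locf() isn't working because my cleaned column was sampled from 0 and 1, so it contains no NAs for na.locf() to fill. As a result, the cleaned column is just a random sample of 0s and 1s. Is there another function or strategy I can use so that the 0s and 1s correspond to the months before and after the address was cleaned?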

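Split by address, merge each piece with all 12 months, and order by month. Fill the NAs down, create the dummy cleaned column from the rows that are still NA (the months before the cleaning month), then fill the remaining NAs up. The result comes out ordered by address and month: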

library(zoo) # na.locf() to fill NAs

do.call(rbind,
        lapply(split(df, df$address), function(i) {
          d <- merge(i, data.frame(month = month.name), all.y = TRUE)
          # convert to factor, then order by month, so it runs Jan, Feb, Mar, etc.
          d$month <- factor(d$month, levels = month.name)
          d <- d[ order(d$month), ]
          # NA fill down; na.rm = FALSE keeps the leading months that are still NA
          d <- na.locf(d, na.rm = FALSE)
          # Make cleaned column: months before the cleaning month are still NA
          d$cleaned <- ifelse(is.na(d$address), 0, 1)
          # NA fill up
          d <- na.locf(d, fromLast = TRUE)
          d
        }))

#                month address year latitude longitude cleaned
# 1 ex St.5    January 1 ex St 2011   341.32     432.3      0
# 1 ex St.2   February 1 ex St 2011   341.32     432.3      1
# 1 ex St.8      March 1 ex St 2011   341.32     432.3      1
# 1 ex St.1      April 1 ex St 2011   341.32     432.3      1
# 1 ex St.9        May 1 ex St 2011   341.32     432.3      1
# 1 ex St.7       June 1 ex St 2011   341.32     432.3      1
# 1 ex St.6       July 1 ex St 2011   341.32     432.3      1
# 1 ex St.3     August 1 ex St 2011   341.32     432.3      1
# 1 ex St.12 September 1 ex St 2011   341.32     432.3      1
# 1 ex St.11   October 1 ex St 2011   341.32     432.3      1
# 1 ex St.10  November 1 ex St 2011   341.32     432.3      1
# 1 ex St.4   December 1 ex St 2011   341.32     432.3      1
# 2 ex St.5    January 2 ex St 2011    343.3     343.6      0
# 2 ex St.2   February 2 ex St 2011    343.3     343.6      0
# 2 ex St.8      March 2 ex St 2011    343.3     343.6      0
# 2 ex St.1      April 2 ex St 2011    343.3     343.6      1
# 2 ex St.9        May 2 ex St 2011    343.3     343.6      1
# 2 ex St.7       June 2 ex St 2011    343.3     343.6      1
# 2 ex St.6       July 2 ex St 2011    343.3     343.6      1
# 2 ex St.3     August 2 ex St 2011    343.3     343.6      1
# 2 ex St.12 September 2 ex St 2011    343.3     343.6      1
# 2 ex St.11   October 2 ex St 2011    343.3     343.6      1
# 2 ex St.10  November 2 ex St 2011    343.3     343.6      1
# 2 ex St.4   December 2 ex St 2011    343.3     343.6      1
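The key step in the answer is that, after filling down, only the months before the cleaning month are still NA. The same 0/1 flag can be sketched in base R with a cumulative count of non-missing values, assuming one address's rows are already in calendar order; the address vector below is a hypothetical stand-in for the merged column.

```r
# One address after merging with all months and ordering Jan..Dec:
# only the cleaning month (April here) carries the address value
address <- c(NA, NA, NA, "2 ex St", rep(NA, 8))

# cumsum() counts non-missing values seen so far; it becomes positive
# at the cleaning month and stays positive for every later month
cleaned <- as.integer(cumsum(!is.na(address)) > 0)
print(cleaned)
# [1] 0 0 0 1 1 1 1 1 1 1 1 1
```

As with the ifelse() line in the answer, the flag has to be computed before the upward fill, since no NAs remain after it.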