Expanding data by month
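I have the following data frame with 5 columns: address, year, month, latitude, and longitude. It is a list of the months in which a given address was cleaned.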
df <- data.frame(address = c("1 ex St", "2 ex St"),
                 year = c(2011, 2011),
                 month = c("February", "April"),
                 latitude = c(341.32, 343.3),
                 longitude = c(432.3, 343.6))
So the data looks like this:
address year month latitude longitude
1 ex St 2011 February 341.32 432.3
2 ex St 2011 April 343.30 343.6
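Right now each row represents a particular address and the particular month in which that address was cleaned. I want to 'expand' the data so that each entry in the address column is broken out into 12 rows, one for each month of 2011. I also want to add a dummy variable indicating whether the lot has already been cleaned. So the data should look like this: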
address year month latitude longitude cleaned
1 ex St 2011 January 341.32 432.3 0
1 ex St 2011 February 341.32 432.3 1
1 ex St 2011 March 341.32 432.3 1
1 ex St 2011 April 341.32 432.3 1
1 ex St 2011 May 341.32 432.3 1
1 ex St 2011 June 341.32 432.3 1
1 ex St 2011 July 341.32 432.3 1
1 ex St 2011 August 341.32 432.3 1
1 ex St 2011 September 341.32 432.3 1
1 ex St 2011 October 341.32 432.3 1
1 ex St 2011 November 341.32 432.3 1
1 ex St 2011 December 341.32 432.3 1
2 ex St 2011 January 343.30 343.6 0
2 ex St 2011 February 343.30 343.6 0
2 ex St 2011 March 343.30 343.6 0
2 ex St 2011 April 343.30 343.6 1
2 ex St 2011 May 343.30 343.6 1
2 ex St 2011 June 343.30 343.6 1
2 ex St 2011 July 343.30 343.6 1
2 ex St 2011 August 343.30 343.6 1
2 ex St 2011 September 343.30 343.6 1
2 ex St 2011 October 343.30 343.6 1
2 ex St 2011 November 343.30 343.6 1
2 ex St 2011 December 343.30 343.6 1
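Is there a package or function that would let me expand my data by month in this way? I have looked at melt and the reshape package, but they don't seem to apply to my situation. I'm not necessarily looking for an answer, just some guidance on what tools to use!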
Edit: I used the answer below, but the cleaned column is still wrong. Here is the output.
month address year latitude longitude cleaned
1 January 1 ex St 2011 341.32 432.3 0
2 February 1 ex St 2011 341.32 432.3 1
3 March 1 ex St 2011 341.32 432.3 0
4 April 1 ex St 2011 341.32 432.3 1
5 May 1 ex St 2011 341.32 432.3 0
6 June 1 ex St 2011 341.32 432.3 0
7 July 1 ex St 2011 341.32 432.3 0
8 August 1 ex St 2011 341.32 432.3 0
9 September 1 ex St 2011 341.32 432.3 1
10 October 1 ex St 2011 341.32 432.3 1
11 November 1 ex St 2011 341.32 432.3 0
12 December 1 ex St 2011 341.32 432.3 1
13 January 2 ex St 2011 343.3 343.6 1
14 February 2 ex St 2011 343.3 343.6 1
15 March 2 ex St 2011 343.3 343.6 0
16 April 2 ex St 2011 343.3 343.6 0
17 May 2 ex St 2011 343.3 343.6 1
18 June 2 ex St 2011 343.3 343.6 0
19 July 2 ex St 2011 343.3 343.6 1
20 August 2 ex St 2011 343.3 343.6 0
21 September 2 ex St 2011 343.3 343.6 0
22 October 2 ex St 2011 343.3 343.6 1
23 November 2 ex St 2011 343.3 343.6 1
24 December 2 ex St 2011 343.3 343.6 0
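I suspect the na.locf() function is not working because the cleaned column was sampled from 0 and 1, so there are no NAs in it to change. So right now the cleaned column is just a random sample of 0s and 1s. Is there another function/strategy I could use to get the 1s and 0s to correspond to before and after the address was cleaned?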
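Split by address, merge with all months, and create the dummy cleaned column. Then fill the NAs with existing values. The result is ordered by address and month name: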
library(zoo) # na.locf() to fill NAs
do.call(rbind,
        lapply(split(df, df$address), function(i) {
          # merge with all 12 months; months without a cleaning become NA rows
          d <- merge(i, data.frame(month = month.name), all.y = TRUE)
          # convert to factor, then order by month, so it runs Jan, Feb, Mar, etc.
          d$month <- factor(d$month, levels = month.name)
          d <- d[order(d$month), ]
          # NA fill down
          d <- na.locf(d)
          # make cleaned column: rows that are still NA precede the cleaning month
          d$cleaned <- ifelse(is.na(d$address), 0, 1)
          # NA fill up
          na.locf(d, fromLast = TRUE)
        }))
# month address year latitude longitude cleaned
# 1 ex St.5 January 1 ex St 2011 341.32 432.3 0
# 1 ex St.2 February 1 ex St 2011 341.32 432.3 1
# 1 ex St.8 March 1 ex St 2011 341.32 432.3 1
# 1 ex St.1 April 1 ex St 2011 341.32 432.3 1
# 1 ex St.9 May 1 ex St 2011 341.32 432.3 1
# 1 ex St.7 June 1 ex St 2011 341.32 432.3 1
# 1 ex St.6 July 1 ex St 2011 341.32 432.3 1
# 1 ex St.3 August 1 ex St 2011 341.32 432.3 1
# 1 ex St.12 September 1 ex St 2011 341.32 432.3 1
# 1 ex St.11 October 1 ex St 2011 341.32 432.3 1
# 1 ex St.10 November 1 ex St 2011 341.32 432.3 1
# 1 ex St.4 December 1 ex St 2011 341.32 432.3 1
# 2 ex St.5 January 2 ex St 2011 343.3 343.6 0
# 2 ex St.2 February 2 ex St 2011 343.3 343.6 0
# 2 ex St.8 March 2 ex St 2011 343.3 343.6 0
# 2 ex St.1 April 2 ex St 2011 343.3 343.6 1
# 2 ex St.9 May 2 ex St 2011 343.3 343.6 1
# 2 ex St.7 June 2 ex St 2011 343.3 343.6 1
# 2 ex St.6 July 2 ex St 2011 343.3 343.6 1
# 2 ex St.3 August 2 ex St 2011 343.3 343.6 1
# 2 ex St.12 September 2 ex St 2011 343.3 343.6 1
# 2 ex St.11 October 2 ex St 2011 343.3 343.6 1
# 2 ex St.10 November 2 ex St 2011 343.3 343.6 1
# 2 ex St.4 December 2 ex St 2011 343.3 343.6 1
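If you would rather not depend on zoo, roughly the same result can be sketched in base R by repeating each row 12 times and comparing month positions. This is only a sketch under two assumptions that are not stated in the question: each address appears exactly once in df, and all records fall in a single year. The names expanded and cleaned_month are made up for the illustration.

```r
# Example input, copied from the question.
df <- data.frame(address = c("1 ex St", "2 ex St"),
                 year = c(2011, 2011),
                 month = c("February", "April"),
                 latitude = c(341.32, 343.3),
                 longitude = c(432.3, 343.6),
                 stringsAsFactors = FALSE)

# Repeat every address row 12 times, once per calendar month.
# Assumption: one row (one cleaning month) per address.
expanded <- df[rep(seq_len(nrow(df)), each = 12), ]

# Position (1-12) of the month in which each address was cleaned.
cleaned_month <- match(expanded$month, month.name)

# Overwrite month with the full January..December sequence for each address.
expanded$month <- rep(month.name, times = nrow(df))

# cleaned is 1 from the cleaning month onwards and 0 before it.
expanded$cleaned <- as.integer(match(expanded$month, month.name) >= cleaned_month)

# Drop the row names left over from the row repetition.
rownames(expanded) <- NULL
expanded
```

Because nothing is filled in after the fact, there are no NAs to carry forward, and latitude and longitude stay numeric instead of being converted along the way.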