具有日期差异的 sqldf 查询
sqldf query with date difference
我正在尝试对以下两个数据库(摘录)的颜色和日期差异执行(简化!)查询:
A B
A.COL A.TIME B.COL B.TIME
1 blue 2009-01-31 1 blue 2007-01-31
2 blue 2009-02-28 2 blue 2008-12-31
3 blue 2009-03-31 3 blue 2009-02-28
4 blue 2009-04-30 4 blue 2009-04-30
5 blue 2009-05-31 5 blue 2009-06-30
6 blue 2009-06-30 6 blue 2016-08-31
7 blue 2016-03-31
8 blue 2016-04-30
9 red ...
10 red ...
我想做的是:根据COL和TIME的差异合并表,即两次之间的差异不能大于或小于2个月(或者换句话说介于-2之间和 +2,取决于从哪个日期开始)。
# For example starting with observation 1 from A, that would imply 2 matches:
2009-01-31 matched to 2008-12-31 (diff = 1)
2009-01-31 matched to 2009-02-28 (diff = -1)
# for obs 2 from A, that would imply
2009-02-28 matched to 2008-12-31 (diff = 2)
2009-02-28 matched to 2009-02-28 (diff = 0)
2009-02-28 matched to 2009-04-30 (diff = -2)
等
我在考虑某种日期差异函数,要么来自 lubridate
,这在少于 30 天的月份的情况下是有问题的,有时会产生 NA,或者 as.yearmon
来自 zoo
,至少可以正确计算差异。但是,我无法将其正确实现到 sqldf
(错误:语句错误:"as" 附近:语法错误)。原因似乎是不能将每个 R 函数与 sqldf 一起使用。
任何想法如何在 R 中完成?我也在寻找一种优雅的方法来相互减去月份。 lubridate 存在这个问题:
Add/subtract 6 months (bond time) in R using lubridate, but here was one proposed way how to accomplish it with zoo
: Get the difference between dates in terms of weeks, months, quarters, and years
获取数据(感谢下方的@bouncyball 提供代码):
A <- read.table(
text = "
A.COL A.TIME
blue 2009-01-31
blue 2009-02-28
blue 2009-03-31
blue 2009-04-30
blue 2009-05-31
blue 2009-06-30
blue 2016-03-31
blue 2016-04-30
", header = T, stringsAsFactors = FALSE)
B <- read.table(
text = "
B.COL B.TIME
blue 2007-01-31
blue 2008-12-31
blue 2009-02-28
blue 2009-04-30
blue 2009-06-30
blue 2016-08-31
", stringsAsFactors = FALSE, header = T)
这是一个使用 this SO post 和 plyr
包中的函数的解决方案:
library(plyr)
# turn a date into a 'monthnumber' relative to an origin
monnb <- function(d) {
lt <- as.POSIXlt(as.Date(d, origin="1900-01-01"))
lt$year*12 + lt$mon
}
# compute a month difference as a difference between two monnb's
mondf <- function(d1, d2) { monnb(d2) - monnb(d1) }
# iterate over rows of A looking for matches in B
adply(A, 1, function(x)
B[x$A.COL == B$B.COL &
abs(mondf(as.Date(x$A.TIME), as.Date(B$B.TIME))) <= 2,]
)
# A.COL A.TIME B.COL B.TIME
# 1 blue 2009-01-31 blue 2008-12-31
# 2 blue 2009-01-31 blue 2009-02-28
# 3 blue 2009-02-28 blue 2008-12-31
# 4 blue 2009-02-28 blue 2009-02-28
# 5 blue 2009-02-28 blue 2009-04-30
# ....
编辑:data.table
实施
library(data.table)
merge_AB <- data.table(merge(A,B, by.x = 'A.COL', by.y = 'B.COL'))
merge_AB[,DateDiff := abs(mondf(A.TIME, B.TIME))
][DateDiff <= 2]
# A.COL A.TIME B.TIME DateDiff
# 1: blue 2009-01-31 2008-12-31 1
# 2: blue 2009-01-31 2009-02-28 1
# 3: blue 2009-02-28 2008-12-31 2
# 4: blue 2009-02-28 2009-02-28 0
# 5: blue 2009-02-28 2009-04-30 2
# ...
数据
A <- read.table(
text = "
A.COL A.TIME
blue 2009-01-31
blue 2009-02-28
blue 2009-03-31
blue 2009-04-30
blue 2009-05-31
blue 2009-06-30
blue 2016-03-31
blue 2016-04-30
", header = T, stringsAsFactors = FALSE)
B <- read.table(
text = "
B.COL B.TIME
blue 2007-01-31
blue 2008-12-31
blue 2009-02-28
blue 2009-04-30
blue 2009-06-30
blue 2016-08-31
", stringsAsFactors = FALSE, header = T)
我正在尝试对以下两个数据库(摘录)的颜色和日期差异执行(简化!)查询:
A B
A.COL A.TIME B.COL B.TIME
1 blue 2009-01-31 1 blue 2007-01-31
2 blue 2009-02-28 2 blue 2008-12-31
3 blue 2009-03-31 3 blue 2009-02-28
4 blue 2009-04-30 4 blue 2009-04-30
5 blue 2009-05-31 5 blue 2009-06-30
6 blue 2009-06-30 6 blue 2016-08-31
7 blue 2016-03-31
8 blue 2016-04-30
9 red ...
10 red ...
我想做的是:根据COL和TIME的差异合并表,即两次之间的差异不能大于或小于2个月(或者换句话说介于-2之间和 +2,取决于从哪个日期开始)。
# For example starting with observation 1 from A, that would imply 2 matches:
2009-01-31 matched to 2008-12-31 (diff = 1)
2009-01-31 matched to 2009-02-28 (diff = -1)
# for obs 2 from A, that would imply
2009-02-28 matched to 2008-12-31 (diff = 2)
2009-02-28 matched to 2009-02-28 (diff = 0)
2009-02-28 matched to 2009-04-30 (diff = -2)
等
我在考虑某种日期差异函数,要么来自 lubridate
,这在少于 30 天的月份的情况下是有问题的,有时会产生 NA,或者 as.yearmon
来自 zoo
,至少可以正确计算差异。但是,我无法将其正确实现到 sqldf
(错误:语句错误:"as" 附近:语法错误)。原因似乎是不能将每个 R 函数与 sqldf 一起使用。
任何想法如何在 R 中完成?我也在寻找一种优雅的方法来相互减去月份。 lubridate 存在这个问题:
Add/subtract 6 months (bond time) in R using lubridate, but here was one proposed way how to accomplish it with zoo
: Get the difference between dates in terms of weeks, months, quarters, and years
获取数据(感谢下方的@bouncyball 提供代码):
A <- read.table(
text = "
A.COL A.TIME
blue 2009-01-31
blue 2009-02-28
blue 2009-03-31
blue 2009-04-30
blue 2009-05-31
blue 2009-06-30
blue 2016-03-31
blue 2016-04-30
", header = T, stringsAsFactors = FALSE)
B <- read.table(
text = "
B.COL B.TIME
blue 2007-01-31
blue 2008-12-31
blue 2009-02-28
blue 2009-04-30
blue 2009-06-30
blue 2016-08-31
", stringsAsFactors = FALSE, header = T)
这是一个使用 this SO post 和 plyr
包中的函数的解决方案:
library(plyr)
# turn a date into a 'monthnumber' relative to an origin
monnb <- function(d) {
lt <- as.POSIXlt(as.Date(d, origin="1900-01-01"))
lt$year*12 + lt$mon
}
# compute a month difference as a difference between two monnb's
mondf <- function(d1, d2) { monnb(d2) - monnb(d1) }
# iterate over rows of A looking for matches in B
adply(A, 1, function(x)
B[x$A.COL == B$B.COL &
abs(mondf(as.Date(x$A.TIME), as.Date(B$B.TIME))) <= 2,]
)
# A.COL A.TIME B.COL B.TIME
# 1 blue 2009-01-31 blue 2008-12-31
# 2 blue 2009-01-31 blue 2009-02-28
# 3 blue 2009-02-28 blue 2008-12-31
# 4 blue 2009-02-28 blue 2009-02-28
# 5 blue 2009-02-28 blue 2009-04-30
# ....
编辑:data.table
实施
library(data.table)
merge_AB <- data.table(merge(A,B, by.x = 'A.COL', by.y = 'B.COL'))
merge_AB[,DateDiff := abs(mondf(A.TIME, B.TIME))
][DateDiff <= 2]
# A.COL A.TIME B.TIME DateDiff
# 1: blue 2009-01-31 2008-12-31 1
# 2: blue 2009-01-31 2009-02-28 1
# 3: blue 2009-02-28 2008-12-31 2
# 4: blue 2009-02-28 2009-02-28 0
# 5: blue 2009-02-28 2009-04-30 2
# ...
数据
A <- read.table(
text = "
A.COL A.TIME
blue 2009-01-31
blue 2009-02-28
blue 2009-03-31
blue 2009-04-30
blue 2009-05-31
blue 2009-06-30
blue 2016-03-31
blue 2016-04-30
", header = T, stringsAsFactors = FALSE)
B <- read.table(
text = "
B.COL B.TIME
blue 2007-01-31
blue 2008-12-31
blue 2009-02-28
blue 2009-04-30
blue 2009-06-30
blue 2016-08-31
", stringsAsFactors = FALSE, header = T)