R: call a variable inside sqldf
I need to loop over an sqldf statement, and to do that I need to reference the loop variable inside the sqldf code.
My table "data" might look like this:
data <- read.table(text ="
loaddate DaysRange DaysRangeNext
1 2014-03-16 30 30
2 2014-03-16 0 0
3 2014-03-16 0 0
4 2014-03-16 60 NA
5 2014-04-16 30 30
6 2014-04-16 0 30
"
,header = TRUE)
Then I convert loaddate to a Date:
data$loaddate<-as.Date(as.character(data$loaddate), format='%Y-%m-%d')
Suppose I have a vector "loaddates":
loaddates<- unique(sort(data$loaddate))
I need to run the following code for each load date:
for (i in loaddates) {
sqldf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM data AS D
WHERE D.loaddate = i
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
") }
But I get the following error:
Error in sqliteSendQuery(con, statement, bind.data) : error in
statement: no such column: i
Is there a way to keep the variable's value and use it inside the loop?
Thanks.
Edit:
I tried:
sqldf(
strwrap(sprintf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM data AS D
WHERE D.LoadDate = '%s'
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
",i),simplify=TRUE,width=1000000))
But I got:
[1] loaddate DaysRange DaysRangeNext clientes
<0 rows> (or 0-length row.names)
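The variable i is not substituted into the query as written; SQLite just sees the literal text "i". You need sprintf to insert its value. (I also wasn't sure whether the line breaks matter, so to be safe I remove them below. sqldf may not need that, in which case just drop the strwrap.)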
# let's assume i (one value taken from loaddates) is the following:
i <- 'something'
One way to get the query in the form you need, i.e. with no line breaks and with i replaced by the load date value you need:
strwrap(sprintf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM data AS D
WHERE D.loaddate = '%s'
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
",i),simplify=TRUE,width=1000000)
This outputs:
[1] "SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext, COUNT(*) AS clientes FROM data AS D WHERE D.loaddate = 'something' GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext ORDER BY D.DaysRange, D.DaysRangeNext"
That is the single line you need, with no line breaks and no unsubstituted variable i.
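The string-building step can also be isolated and inspected without touching the database. The following is only a sketch in base R: build_query() is a hypothetical helper name, not part of sqldf, and it assumes loaddate is stored as text so that the quoted '%s' comparison makes sense. Pasting values into SQL like this is reasonable only for values you control.

```r
# build_query() is a hypothetical helper for illustration; it is not an sqldf function.
build_query <- function(load_date) {
  template <- "
    SELECT D.loaddate, D.DaysRange, D.DaysRangeNext,
           COUNT(*) AS clientes
    FROM data AS D
    WHERE D.loaddate = '%s'
    GROUP BY D.loaddate, D.DaysRange, D.DaysRangeNext
    ORDER BY D.DaysRange, D.DaysRangeNext
  "
  # sprintf() fills in %s; gsub() collapses every run of whitespace
  # (including newlines) to one space; trimws() drops the outer blanks.
  trimws(gsub("\\s+", " ", sprintf(template, load_date)))
}

query <- build_query("2014-03-16")
print(query)
```

Printing the string before handing it to sqldf makes it easy to see exactly what SQLite will receive.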
In your loop it would be:
for (i in loaddates) {
strwrap(sprintf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM data AS D
WHERE D.loaddate = '%s'
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
",i),simplify=TRUE,width=1000000)
}
Using your dataset:
library(sqldf)
data <- read.table(text ="
loaddate DaysRange DaysRangeNext
1 2014-03-16 30 30
2 2014-03-16 0 0
3 2014-03-16 0 0
4 2014-03-16 60 NA
5 2014-04-16 30 30
6 2014-04-16 0 30
"
,header = TRUE, stringsAsFactors = FALSE)
loaddates<- unique(sort(data$loaddate))
for (i in loaddates) {
print(sqldf(
strwrap(sprintf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM data AS D
WHERE D.loaddate = '%s'
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
",i),simplify=TRUE,width=1000000) ))
}
Output:
loaddate DaysRange DaysRangeNext clientes
1 2014-03-16 0 0 2
2 2014-03-16 30 30 1
3 2014-03-16 60 NA 1
loaddate DaysRange DaysRangeNext clientes
1 2014-04-16 0 30 1
2 2014-04-16 30 30 1
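Note that the example above keeps loaddate as character, whereas the question converts it with as.Date. That difference matters inside a for loop, because for() iterates over the underlying values and drops the Date class, so i becomes a plain number (days since 1970-01-01). Whether this explains the empty result in the question is not confirmed here; the base R sketch below only shows the behaviour, and the variable names are illustrative.

```r
loaddates <- as.Date(c("2014-03-16", "2014-04-16"))

# Iterating directly over a Date vector: for() drops the class attribute,
# so each i is a bare number rather than a Date.
classes_direct <- character(0)
for (i in loaddates) {
  classes_direct <- c(classes_direct, class(i))
}
print(classes_direct)

# Iterating by index keeps each element a Date, so format() yields ISO text
# that can be substituted into a query comparing against a text column.
formatted <- character(0)
for (k in seq_along(loaddates)) {
  d <- loaddates[k]
  formatted <- c(formatted, format(d, "%Y-%m-%d"))
}
print(formatted)
```

If the column is a Date, looping over seq_along(loaddates) or over as.character(loaddates) keeps control over which representation ends up in the SQL text.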
You can make this work by building the SQL statement inside the loop but outside the function call.
for (i in loaddates) {
statement = paste( " SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM data AS D
WHERE D.loaddate = ", i,
"GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext " )
print(sqldf(statement))  # results are not auto-printed inside a for loop
}
First create a new data frame, then join against it:
num_Pcode <- as.numeric("3550")
df_Pcode_0 <- as.data.frame(num_Pcode)
df_Pcode_0
...
returns num_Pcode.
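fn$sqldf allows $ to be used within the SQL statement to interpolate R variables. See Example 5 on the sqldf GitHub home page, and more examples at the bottom of the ?fn help page. If we don't need names on the output, we can reduce setNames(loaddates, loaddates) to just loaddates.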
Map(function(i)
fn$sqldf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
COUNT(*) AS clientes
FROM data AS D
WHERE D.loaddate = $i
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
"), setNames(loaddates, loaddates))
giving:
$`2014-03-16`
loaddate DaysRange DaysRangeNext clientes
1 2014-03-16 0 0 2
2 2014-03-16 30 30 1
3 2014-03-16 60 NA 1
$`2014-04-16`
loaddate DaysRange DaysRangeNext clientes
1 2014-04-16 0 30 1
2 2014-04-16 30 30 1
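As a variation on the same idea, here is a sketch that assumes loaddate is kept as a character column (as in the stringsAsFactors example earlier in the thread), in which case the interpolated value has to be wrapped in single quotes in the SQL text. It assumes the sqldf package is installed; the results name is illustrative.

```r
library(sqldf)  # sqldf depends on gsubfn, which provides the fn$ prefix

# Same rows as in the question, with loaddate kept as character.
data <- data.frame(
  loaddate      = c("2014-03-16", "2014-03-16", "2014-03-16",
                    "2014-03-16", "2014-04-16", "2014-04-16"),
  DaysRange     = c(30, 0, 0, 60, 30, 0),
  DaysRangeNext = c(30, 0, 0, NA, 30, 30),
  stringsAsFactors = FALSE
)
loaddates <- sort(unique(data$loaddate))

# One result data frame per load date, collected in a named list.
results <- lapply(setNames(loaddates, loaddates), function(i) {
  fn$sqldf("
    SELECT D.loaddate, D.DaysRange, D.DaysRangeNext,
           COUNT(*) AS clientes
    FROM data AS D
    WHERE D.loaddate = '$i'
    GROUP BY D.loaddate, D.DaysRange, D.DaysRangeNext
    ORDER BY D.DaysRange, D.DaysRangeNext
  ")
})
print(results)
```

Collecting the results in a list rather than printing them inside the loop keeps them available for later steps, for example do.call(rbind, results).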