sqldf中的R调用变量

R call variable inside sqldf

我需要对 sqldf 语句进行循环,为此我需要在 sqldf 代码中调用循环变量:

我的 table "data",可能是:

data <- read.table(text ="
    loaddate DaysRange DaysRangeNext
1 2014-03-16        30            30
2 2014-03-16         0             0
3 2014-03-16         0             0
4 2014-03-16        60            NA
5 2014-04-16        30            30
6 2014-04-16         0            30
"
,header = TRUE)

然后我将 loaddate 格式化为日期:

data$loaddate<-as.Date(as.character(data$loaddate), format='%Y-%m-%d')

假设我有一个向量 "loaddates":

loaddates<- unique(sort(data$loaddate))

我需要为每个加载日期运行以下代码:

for (i in loaddates) {

sqldf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext, 
COUNT(*) AS clientes
FROM data AS D
WHERE D.loaddate = i
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
")        }

但是我得到以下错误:

Error in sqliteSendQuery(con, statement, bind.data) : error in statement: no such column: i

有没有办法保留变量值并在循环中使用它?

谢谢。

版本:

我试过了:

sqldf(
strwrap(sprintf("
SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext, 
COUNT(*) AS clientes 
FROM data AS D
WHERE D.LoadDate = '%s'
GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext
",i),simplify=TRUE,width=1000000))

但是我得到了:

> [1] loaddate      DaysRange     DaysRangeNext clientes      <0 rows>
> (or 0-length row.names)

变量 i 不会在查询中原样替换。您需要 sprintf 为其赋值。 (我也不知道您是否需要考虑换行符,只是为了确保我在下面提供了它。也许您不需要 sqldf;在这种情况下只需删除 strwrap)。

#let's assume loaddates is the following:
loaddates <- 'something'

一种根据需要获取查询的方法,即没有断行并且i采用您需要的加载日期值:

strwrap(sprintf("
                SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext, 
                COUNT(*) AS clientes
                FROM deuda AS D
                WHERE D.loaddate = '%s'
                GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
                ORDER BY D.DaysRange, D.DaysRangeNext
                ",i),simplify=TRUE,width=1000000)

这将输出:

[1] "SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext, COUNT(*) AS clientes FROM deuda AS D WHERE D.CodEmp = 'TGG' and D.loaddate = something GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext ORDER BY D.DaysRange, D.DaysRangeNext"

这是您需要的一行,没有换行符或变量 i 未分配。

在你的循环中它应该是:

for (i in loaddates) {

strwrap(sprintf("
                SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext, 
                COUNT(*) AS clientes
                FROM deuda AS D
                WHERE D.loaddate = '%s'
                GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
                ORDER BY D.DaysRange, D.DaysRangeNext
                ",i),simplify=TRUE,width=1000000)

}

使用您的数据集:

library(sqldf)
data <- read.table(text ="
    loaddate DaysRange DaysRangeNext
1 2014-03-16        30            30
2 2014-03-16         0             0
3 2014-03-16         0             0
4 2014-03-16        60            NA
5 2014-04-16        30            30
6 2014-04-16         0            30
"
                   ,header = TRUE,stringsAsFactors=F)

loaddates<- unique(sort(data$loaddate))

for (i in loaddates) {

  print(sqldf(
  strwrap(sprintf("
                SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext, 
                COUNT(*) AS clientes
                FROM data AS D
                WHERE D.loaddate = '%s'
                GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
                ORDER BY D.DaysRange, D.DaysRangeNext
                ",i),simplify=TRUE,width=1000000) ))
}

输出:

    loaddate DaysRange DaysRangeNext clientes
1 2014-03-16         0             0        2
2 2014-03-16        30            30        1
3 2014-03-16        60            NA        1
    loaddate DaysRange DaysRangeNext clientes
1 2014-04-16         0            30        1
2 2014-04-16        30            30        1

您可以通过在循环内但在函数调用之外定义 SQL 语句来实现此功能。

for (i in loaddates) {

statement = paste( " SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext,
              COUNT(*) AS clientes
              FROM data AS D
              WHERE D.loaddate = ", i,
"GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
ORDER BY D.DaysRange, D.DaysRangeNext " )

sqldf(statement)
}

首先创建一个新的数据框。然后加入它:

num_Pcode <- as.numeric("3550")
df_Pcode_0 <- as.data.frame(num_Pcode)
df_Pcode_0
...

returns num_Pcode.

fn$sqldf 允许在 sql 语句中使用 $ 来插入 R 变量。请参阅 sqldf github 主页上的示例 5,并在帮助页面 ?fn 的底部查看更多示例。如果我们不需要输出名称,我们可以将 setNames(loaddates, loaddates) 减少到 loaddates

Map(function(i)
  fn$sqldf("
    SELECT D.LoadDate,D.DaysRange, D.DaysRangeNext, 
    COUNT(*) AS clientes
    FROM data AS D
    WHERE D.loaddate = $i
    GROUP BY D.LoadDate,D.DaysRange, D.DaysRangeNext
    ORDER BY D.DaysRange, D.DaysRangeNext
  "), setNames(loaddates, loaddates))

给予:

$`2014-03-16`
    loaddate DaysRange DaysRangeNext clientes
1 2014-03-16         0             0        2
2 2014-03-16        30            30        1
3 2014-03-16        60            NA        1

$`2014-04-16`
    loaddate DaysRange DaysRangeNext clientes
1 2014-04-16         0            30        1
2 2014-04-16        30            30        1