SQL query loses information
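I'm using RODBC to run SQL queries in R.
The query gives me data in the format
Date(POSIXct); var1:var29
%Y-%m-%d %H:%M:%S; numeric_values
The problem is that once the query result exceeds a certain size, around 6960 obs and 29-30 variables including the date, the data passed to R starts to look like this:
Date(POSIXct); var1:var30
%Y-%m-%d; numeric_values
So I lose the "%H:%M:%S" information, and I have no idea why. If I reduce the number of variables, I can extend the time span before this happens.
The server runs T-SQL on Windows Server 2007 (I believe).
Example of the SQL call from R: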
sqlQuery(database, "SELECT [datetime], [0] as SYS, [1] as NO1, [2] as NO2, [7] as NO3, [9] as NO4, [19] as NO5, [5] as DK1,[6] as DK2, [25] as SE1,
[26] as SE2,[27] as SE3, [28] as SE4, [4] as FIN, [13] as DE, [14] as NL, [16] as FR, [15] as CH, [17] as AT, [20] as EE,
[36] as LT, [45] as LV, [42] as SI, [50] as IT, [44] as ES, [43] as BE, [74] as HU, [75] as CZ, [41] as UK
From
(
SELECT [area_id],[pris],[datetime]
FROM [BigData].[dbo].[Prices]
WHERE area_id in (0,1,2,7,9,19,5,6,25,26,27,28,4,13,14,16,15,17, 20, 36, 45, 42, 50, 44, 43, 74, 75, 41)
AND [datetime]>= cast(GETDATE()-290 as date)
AND [datetime]< cast(GETDATE()+0 as date)
) p
PIVOT(SUM([pris])
FOR [area_id] IN
([0], [1], [2], [7], [9], [19], [5],[6], [25],[26],[27], [28], [4], [13], [14], [16], [15], [17], [20],
[36], [45], [42], [50], [44], [43], [74], [75], [41]))
AS pvt
ORDER BY [datetime] asc ") -> prices
Solution #1
You can use the answer from kristang: call sqlQuery with the as.is option, fetch the timestamps as strings, and convert the column in R with as.POSIXct.
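A minimal sketch of Solution #1 follows. The RODBC call appears only as a comment because it needs a live connection. The data frame `prices` holds made-up sample values standing in for what the query would return, and the explicit `format` and `tz = "UTC"` arguments are assumptions about how the timestamps are stored, not something stated in the question.

```r
# With a live connection the call would look roughly like this (not run here):
#   prices <- sqlQuery(database, "SELECT ...", as.is = TRUE)
# Made-up sample standing in for the result: the timestamps arrive as text.
prices <- data.frame(
  datetime = c("2015-01-01 00:00:00", "2015-01-01 01:00:00", "2015-01-01 02:00:00"),
  SYS = c(28.1, 27.4, 26.9),
  stringsAsFactors = FALSE
)

# Convert the text column explicitly. The format string and the time zone
# are assumptions; adjust them to match the server.
prices$datetime <- as.POSIXct(prices$datetime,
                              format = "%Y-%m-%d %H:%M:%S",
                              tz = "UTC")

# Print the column with an explicit format to confirm the time of day survived.
print(format(prices$datetime, "%Y-%m-%d %H:%M:%S"))
```

Passing the format and time zone explicitly keeps the conversion from depending on R's format guessing or on the local time zone of the machine running the script.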
Solution #2
But I think a more efficient solution is to fetch the datetime value as a numeric type through an SQL expression (example for SQL Server):
df1 <- sqlQuery(database, "select convert(float, my_date)*3600*24 as my_date from ...")
and then convert it from numeric to POSIXct:
df1$my_date <- as.POSIXct(df1$my_date, origin = "1900-01-01", tz = "UTC")
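Because POSIXct is numeric under the hood, fetching and converting the value this way runs faster than a plain sqlQuery with RODBC. RODBC converts every timestamp from a text string (see how as.POSIXct is used in the sqlGetResults function). So this solution makes sense even when RODBC does return full timestamps.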
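The numeric conversion can be checked without a database. In the sketch below, the values in `df1` are hypothetical stand-ins for what `convert(float, my_date)*3600*24` would return, namely seconds since 1900-01-01. The `round()` call is an added precaution against floating-point error and is not part of the original answer.

```r
# Hypothetical values standing in for the query result:
# days since 1900-01-01 (with a fractional part for the time of day),
# multiplied by the number of seconds in a day.
df1 <- data.frame(my_date = c(42003.0, 42003.5, 42003.75) * 3600 * 24)

# round() guards against a value landing just below a whole second,
# which would otherwise be truncated when the time is formatted.
df1$my_date <- as.POSIXct(round(df1$my_date), origin = "1900-01-01", tz = "UTC")

print(format(df1$my_date, "%Y-%m-%d %H:%M:%S"))
```

Day 42003 counted from 1900-01-01 is 2015-01-01, so the three sample values correspond to midnight, noon, and 18:00 on that date.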
P.S. If you really prefer converting from text, see fastPOSIXct in the fasttime package.
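A small sketch of the text route with fasttime, assuming the package is installed and the timestamps are in UTC (fastPOSIXct always reads its input as UTC). The sample strings are made up, and the base R fallback is included only so the snippet still runs without the package.

```r
# Made-up timestamps in the "YYYY-MM-DD HH:MM:SS" layout.
stamps <- c("2015-01-01 00:00:00", "2015-01-01 12:30:00")

if (requireNamespace("fasttime", quietly = TRUE)) {
  # fastPOSIXct reads the text as UTC; tz only sets the display time zone.
  parsed <- fasttime::fastPOSIXct(stamps, tz = "UTC")
} else {
  # Base R fallback so the sketch still runs when fasttime is not installed.
  parsed <- as.POSIXct(stamps, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
}

print(format(parsed, "%Y-%m-%d %H:%M:%S"))
```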