SQLite sqlite3_step() hangs with big database
I'm writing a small Objective-C library that works with an embedded SQLite database.
The SQLite version I'm using is 3.7.13 (checked with SELECT sqlite_version()).
My query is:
SELECT ROUND(AVG(difference), 5) as distance
FROM (
    SELECT (
        SELECT A.timestamp - B.timestamp
        FROM ExampleTable as B
        WHERE B.timestamp = (
            SELECT MAX(timestamp)
            FROM ExampleTable as C
            WHERE C.timestamp < A.timestamp
        )
    ) as difference
    FROM ExampleTable as A
    ORDER BY timestamp)
Basically it outputs the average difference between consecutive timestamps when the rows are sorted by timestamp.
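A small pure-Python reference can help cross-check what the query should return. This is a sketch that assumes distinct timestamps (with duplicates, the correlated-subquery version gives each duplicate row its own non-zero difference, so results can differ); the function names are hypothetical. It also illustrates that, for distinct timestamps, the consecutive gaps telescope to max - min, so the average equals (MAX - MIN) / (COUNT - 1).

```python
def average_gap(timestamps):
    """Average of the gaps between consecutive sorted timestamps.

    Assumes distinct timestamps. The smallest one has no predecessor,
    so n values give n - 1 gaps (in the SQL, the first row's difference
    is NULL and AVG skips it).
    """
    ordered = sorted(timestamps)
    if len(ordered) < 2:
        return None  # SQL AVG over no non-NULL values is NULL
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return round(sum(gaps) / len(gaps), 5)


def average_gap_closed_form(timestamps):
    """Same value for distinct timestamps: the gaps sum to max - min."""
    if len(timestamps) < 2:
        return None
    return round((max(timestamps) - min(timestamps)) / (len(timestamps) - 1), 5)


sample = [7, 1, 4, 2]
print(average_gap(sample))              # gaps 1, 2, 3 -> 2.0
print(average_gap_closed_form(sample))  # (7 - 1) / 3 -> 2.0
```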
I tried the query on a sample database with 35k rows and it ran in about 100 ms. So far so good.
Then I tried the query on another sample database with 100k rows, and it hangs in sqlite3_step() at 100% CPU usage.
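Since I can't step into sqlite3_step() with the debugger, is there another way to find out where the function is hanging, or a debug log that would show what the problem is?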
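I also ran the other queries in my library against the 100k-row database without any problems, but it's also true that those are all simple queries with no subqueries. Maybe that's the problem?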
Thanks
Update
Here is the EXPLAIN QUERY PLAN output, as requested:
"1","0","0","SCAN TABLE ExampleTable AS A"
"1","0","0","EXECUTE CORRELATED SCALAR SUBQUERY 2"
"2","0","0","SCAN TABLE ExampleTable AS B"
"2","0","0","EXECUTE CORRELATED SCALAR SUBQUERY 3"
"3","0","0","SEARCH TABLE ExampleTable AS C"
"1","0","0","USE TEMP B-TREE FOR ORDER BY"
"0","0","0","SCAN SUBQUERY 1"
Looking up rows by timestamp value can be optimized with an index on this column:
CREATE INDEX whatever ON ExampleTable(timestamp);
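EXPLAIN QUERY PLAN can also be run from code (the same prefix works through sqlite3_prepare_v2() and sqlite3_step() in Objective-C), which shows what a slow statement will do without stepping into sqlite3_step(). In the plan from the question, the SCAN of B sits inside a correlated subquery that runs for every row of A, so the cost likely grows much faster than the row count. The sketch below uses Python's built-in sqlite3 module as a stand-in; the one-column ExampleTable schema and the sample rows are assumptions. It prints the plan of the original query, then compares the plan of the innermost MAX lookup (with a literal in place of A.timestamp) before and after creating the suggested index. Plan wording differs between SQLite versions, so only the presence of an index is worth checking.

```python
import sqlite3

# The question's query; the table schema used below is an assumption.
ORIGINAL_QUERY = """
SELECT ROUND(AVG(difference), 5) AS distance
FROM (
    SELECT (
        SELECT A.timestamp - B.timestamp
        FROM ExampleTable AS B
        WHERE B.timestamp = (
            SELECT MAX(timestamp)
            FROM ExampleTable AS C
            WHERE C.timestamp < A.timestamp
        )
    ) AS difference
    FROM ExampleTable AS A
    ORDER BY timestamp)
"""

# Innermost lookup of that query, with a literal standing in for A.timestamp.
INNER_LOOKUP = "SELECT MAX(timestamp) FROM ExampleTable AS C WHERE C.timestamp < 4"


def plan_details(conn, sql):
    """Return the human-readable detail text (last column) of each plan row."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ExampleTable (timestamp REAL)")  # assumed schema
conn.executemany("INSERT INTO ExampleTable VALUES (?)", [(t,) for t in (1, 2, 4, 7)])

print("full query, no index:", plan_details(conn, ORIGINAL_QUERY))
before = plan_details(conn, INNER_LOOKUP)

conn.execute("CREATE INDEX whatever ON ExampleTable(timestamp)")
after = plan_details(conn, INNER_LOOKUP)

print("inner lookup before index:", before)
print("inner lookup after index: ", after)
```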
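Also, this query is inefficient: the ORDER BY does not affect the values being averaged, and the timestamp values found in B and C are always the same, so you can drop one of them: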
SELECT ROUND(AVG(difference), 5) AS distance
FROM (
    SELECT timestamp -
           (SELECT MAX(timestamp)
            FROM ExampleTable AS B
            WHERE timestamp < A.timestamp)
           AS difference
    FROM ExampleTable AS A)
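To confirm that dropping the extra table reference does not change the result, both queries can be run against the same data. A sketch under the same assumptions (Python's sqlite3 module, a hypothetical one-column schema, the index from above); the timestamps are the squares 0, 1, 4, ..., 39601, whose consecutive gaps 1, 3, ..., 397 average exactly 199.

```python
import sqlite3

ORIGINAL = """
SELECT ROUND(AVG(difference), 5) AS distance
FROM (
    SELECT (
        SELECT A.timestamp - B.timestamp
        FROM ExampleTable AS B
        WHERE B.timestamp = (
            SELECT MAX(timestamp)
            FROM ExampleTable AS C
            WHERE C.timestamp < A.timestamp
        )
    ) AS difference
    FROM ExampleTable AS A
    ORDER BY timestamp)
"""

# In the inner SELECT, the unqualified "timestamp" resolves to B.
SIMPLIFIED = """
SELECT ROUND(AVG(difference), 5) AS distance
FROM (
    SELECT timestamp -
           (SELECT MAX(timestamp)
            FROM ExampleTable AS B
            WHERE timestamp < A.timestamp)
           AS difference
    FROM ExampleTable AS A)
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ExampleTable (timestamp REAL)")  # assumed schema
# Squares 0, 1, 4, ..., 39601: consecutive gaps are 1, 3, ..., 397 (mean 199).
conn.executemany(
    "INSERT INTO ExampleTable VALUES (?)", [(i * i,) for i in range(200)]
)
conn.execute("CREATE INDEX whatever ON ExampleTable(timestamp)")

original = conn.execute(ORIGINAL).fetchone()[0]
simplified = conn.execute(SIMPLIFIED).fetchone()[0]
print(original, simplified)  # 199.0 199.0
```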
I ended up with this solution:
CREATE TABLE tmp AS SELECT timestamp FROM ExampleTable ORDER BY timestamp;
SELECT ROUND(AVG(difference), 5)
FROM (
    SELECT (
        SELECT A.timestamp - B.timestamp
        FROM tmp as B
        WHERE B.rowid = A.rowid-1
    ) as difference
    FROM tmp as A
    ORDER BY timestamp);
DROP TABLE tmp;
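The tmp-table approach can be sketched the same way (Python's sqlite3 module, an assumed one-column schema, a hypothetical function name). Each B row is found by a rowid lookup instead of a scan, which is presumably why it holds up on large tables. It relies on CREATE TABLE ... AS SELECT ... ORDER BY inserting rows, and so assigning rowids, in sorted order; SQLite behaves this way in practice, but treat that as an assumption. The final ORDER BY is left out because, as noted above, it does not affect the average.

```python
import sqlite3


def average_gap_via_tmp(conn):
    """Hypothetical helper sketching the tmp-table approach."""
    conn.execute("DROP TABLE IF EXISTS tmp")
    # Copy the timestamps in sorted order; rowids 1..n follow insertion order.
    conn.execute(
        "CREATE TABLE tmp AS SELECT timestamp FROM ExampleTable ORDER BY timestamp"
    )
    try:
        # fetchall() runs the statement to completion, so nothing is still
        # reading tmp when it is dropped below.
        [(value,)] = conn.execute(
            """
            SELECT ROUND(AVG(difference), 5)
            FROM (
                SELECT (SELECT A.timestamp - B.timestamp
                        FROM tmp AS B
                        WHERE B.rowid = A.rowid - 1) AS difference
                FROM tmp AS A)
            """
        ).fetchall()
    finally:
        conn.execute("DROP TABLE tmp")  # drop the helper table, not ExampleTable
    return value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ExampleTable (timestamp REAL)")  # assumed schema
conn.executemany(
    "INSERT INTO ExampleTable VALUES (?)", [(i * i,) for i in range(200)]
)
print(average_gap_via_tmp(conn))  # consecutive squares differ by 1, 3, ..., 397
```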
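Actually, I went a step further: I use this strategy only for large tables (more than 40k rows), because the other strategy (the single query) performs better on "small" tables.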
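The size-based switch described above might look roughly like this. It is only a sketch: the 40,000-row threshold comes from the text, but the function name is hypothetical, the one-column schema is assumed, and using the answer's simplified query as the small-table path is also an assumption, since the question does not say which single query was kept.

```python
import sqlite3

# Threshold from the text; the best cut-off likely depends on data and hardware.
LARGE_TABLE_THRESHOLD = 40_000

# Assumed small-table path: the simplified single query from the answer.
SINGLE_QUERY = """
SELECT ROUND(AVG(difference), 5)
FROM (
    SELECT timestamp -
           (SELECT MAX(timestamp)
            FROM ExampleTable AS B
            WHERE timestamp < A.timestamp)
           AS difference
    FROM ExampleTable AS A)
"""

TMP_TABLE_QUERY = """
SELECT ROUND(AVG(difference), 5)
FROM (
    SELECT (SELECT A.timestamp - B.timestamp
            FROM tmp AS B
            WHERE B.rowid = A.rowid - 1) AS difference
    FROM tmp AS A)
"""


def average_timestamp_gap(conn, threshold=LARGE_TABLE_THRESHOLD):
    """Hypothetical helper: pick a strategy based on the table's row count."""
    [(row_count,)] = conn.execute("SELECT COUNT(*) FROM ExampleTable").fetchall()
    if row_count <= threshold:
        [(value,)] = conn.execute(SINGLE_QUERY).fetchall()
        return value
    # Large table: copy timestamps in sorted order, then pair rows by rowid.
    conn.execute("DROP TABLE IF EXISTS tmp")
    conn.execute(
        "CREATE TABLE tmp AS SELECT timestamp FROM ExampleTable ORDER BY timestamp"
    )
    try:
        [(value,)] = conn.execute(TMP_TABLE_QUERY).fetchall()
    finally:
        conn.execute("DROP TABLE tmp")
    return value


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ExampleTable (timestamp REAL)")  # assumed schema
conn.executemany(
    "INSERT INTO ExampleTable VALUES (?)", [(i * i,) for i in range(200)]
)
print(average_timestamp_gap(conn))               # 200 rows: single-query path
print(average_timestamp_gap(conn, threshold=0))  # forces the tmp-table path
```

Passing a lower threshold is a cheap way to exercise both paths on the same small data set and confirm they agree.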