Querying case-insensitive columns by SQL in Tarantool
We know that a string index in Tarantool can be made case-insensitive by specifying the collation option: collation = "unicode_ci". For example:
t = box.schema.create_space("test")
t:format({{name = "id", type = "number"}, {name = "col1", type = "string"}})
t:create_index('primary')
t:create_index("col1_idx", {parts = {{field = "col1", type = "string", collation = "unicode_ci"}}})
t:insert{1, "aaa"}
t:insert{2, "bbb"}
t:insert{3, "ccc"}
Now we can run a case-insensitive query:
tarantool> t.index.col1_idx:select("AAA")
---
- - [1, 'aaa']
...
But how do we do this with SQL? This doesn't work:
tarantool> box.execute("select * from \"test\" where \"col1\" = 'AAA'")
---
- metadata:
  - name: id
    type: number
  - name: col1
    type: string
  rows: []
...
Nor does this:
tarantool> box.execute("select * from \"test\" indexed by \"col1_idx\" where \"col1\" = 'AAA'")
---
- metadata:
  - name: id
    type: number
  - name: col1
    type: string
  rows: []
...
There is a dirty trick with poor performance (a full scan). We don't want that, do we?
tarantool> box.execute("select * from \"test\" indexed by \"col1_idx\" where upper(\"col1\") = 'AAA'")
---
- metadata:
  - name: id
    type: number
  - name: col1
    type: string
  rows:
  - [1, 'aaa']
...
Finally, there is one more workaround:
tarantool> box.execute("select * from \"test\" where \"col1\" = 'AAA' collate \"unicode_ci\"")
---
- metadata:
  - name: id
    type: number
  - name: col1
    type: string
  rows:
  - [1, 'aaa']
...
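But the question is: does it use the index? It would also work without an index...
You can check the query plan to find out whether a particular index is used. To get the query plan, just add the 'EXPLAIN QUERY PLAN ' prefix to the original query. For example: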
tarantool> box.execute("explain query plan select * from \"test\" where \"col1\" = 'AAA' collate \"unicode_ci\"")
---
- metadata:
  - name: selectid
    type: integer
  - name: order
    type: integer
  - name: from
    type: integer
  - name: detail
    type: text
  rows:
  - [0, 0, 0, 'SEARCH TABLE test USING COVERING INDEX col1_idx (col1=?) (~1 row)']
...
So the answer is 'yes': the index is used in this case.
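The same check can be scripted. The following is only a sketch: plan_uses_index is a hypothetical helper (not part of Tarantool), and it assumes that the plan's detail text is the fourth column, as in the output above, and that box.execute returns nil plus an error on failure, which may differ between Tarantool versions.

```lua
-- Hypothetical helper: returns true if any row of the EXPLAIN QUERY PLAN
-- output for `sql` mentions `index_name` in its detail text.
local function plan_uses_index(sql, index_name)
    local res, err = box.execute("explain query plan " .. sql)
    if res == nil then
        error(err)
    end
    for _, row in ipairs(res.rows) do
        -- Assumption: columns are selectid, order, from, detail.
        local detail = row[4]
        -- Plain-text search (4th argument true), so the name is not treated as a pattern.
        if type(detail) == "string" and detail:find(index_name, 1, true) then
            return true
        end
    end
    return false
end

-- Usage: check the query with the explicit collate clause.
print(plan_uses_index(
    [[select * from "test" where "col1" = 'AAA' collate "unicode_ci"]],
    "col1_idx"))
```

The plain-text match on the detail string is deliberately crude; it only tells you that the index name appears in the plan, not how it is used.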
As for the other example:
box.execute("select * from \"test\" indexed by \"col1_idx\" where \"col1\" = 'AAA'")
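Unfortunately, the collation used in this comparison is binary, because the index's collation is ignored. In SQL, only the column's collation is taken into account during a comparison. This limitation will be resolved as soon as the corresponding issue is closed.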
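Since SQL looks at the column's collation, one option that may be worth trying is to declare the collation in the space format as well as on the index. This is a sketch under assumptions: the space name test_ci is made up for illustration, and it relies on the format field option collation being available in your Tarantool version.

```lua
-- Sketch: put the collation on the column itself, not only on the index.
local t = box.schema.create_space("test_ci")  -- hypothetical space name
t:format({
    {name = "id", type = "number"},
    -- Assumption: the format accepts a per-field collation option.
    {name = "col1", type = "string", collation = "unicode_ci"},
})
t:create_index("primary")
t:create_index("col1_idx", {parts = {{field = "col1", type = "string", collation = "unicode_ci"}}})
t:insert{1, "aaa"}

-- With the column collation declared, a plain comparison is expected
-- to match case-insensitively without an explicit collate clause.
box.execute([[select * from "test_ci" where "col1" = 'AAA']])
```

It is worth running EXPLAIN QUERY PLAN on this query too, to confirm that col1_idx is actually chosen.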