Querying case-insensitive columns by SQL in Tarantool

We know that a string index in Tarantool can be made case-insensitive by specifying the collation option: collation = "unicode_ci". For example:

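box.cfg{}  -- start the database; a repeated empty box.cfg{} changes nothing
t = box.schema.create_space("test")
t:format({{name = "id", type = "number"}, {name = "col1", type = "string"}})
t:create_index('primary')  -- primary key on the first field (id)
-- secondary index on col1 with a case-insensitive Unicode collation
t:create_index("col1_idx", {parts = {{field = "col1", type = "string", collation = "unicode_ci"}}})
t:insert{1, "aaa"}
t:insert{2, "bbb"}
t:insert{3, "ccc"}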

Now we can run case-insensitive queries:

tarantool> t.index.col1_idx:select("AAA")
---
- - [1, 'aaa']
...

But how do we do the same with SQL? This doesn't work:

tarantool> box.execute("select * from \"test\" where \"col1\" = 'AAA'")
---
- metadata:
  - name: id
    type: number
  - name: col1
    type: string
  rows: []
...

Neither does this:

tarantool> box.execute("select * from \"test\" indexed by \"col1_idx\" where \"col1\" = 'AAA'")
---
- metadata:
  - name: id
    type: number
  - name: col1
    type: string
  rows: []
...

There is a dirty trick, but it performs poorly because it does a full scan. We don't want that, do we?

tarantool> box.execute("select * from \"test\" indexed by \"col1_idx\" where upper(\"col1\") = 'AAA'")
---
- metadata:
  - name: id
    type: number
  - name: col1
    type: string
  rows:
  - [1, 'aaa']
...

Finally, here is a workaround that does return the row:

tarantool> box.execute("select * from \"test\" where \"col1\" = 'AAA' collate \"unicode_ci\"")
---
- metadata:
  - name: id
    type: number
  - name: col1
    type: string
  rows:
  - [1, 'aaa']
...

But the question is: does it use the index? The query would work without an index, too...

You can check the query plan to see whether a particular index is used. To get the query plan, just prefix the original query with 'EXPLAIN QUERY PLAN '. For example:

tarantool> box.execute("explain query plan select * from \"test\" where \"col1\" = 'AAA' collate \"unicode_ci\"")
---
- metadata:
  - name: selectid
    type: integer
  - name: order
    type: integer
  - name: from
    type: integer
  - name: detail
    type: text
  rows:
  - [0, 0, 0, 'SEARCH TABLE test USING COVERING INDEX col1_idx (col1=?) (~1 row)']
...

So the answer is 'yes': in this case the index is used.
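If you want to make this check in a script instead of reading the plan by eye, a small helper can walk the plan rows. This is a minimal sketch, assuming the result shape shown above (a rows list whose 4th column is detail) and SQLite-style wording in detail, where "SEARCH ... INDEX <name>" means a lookup through that index. The function name uses_index_lookup is hypothetical and is not part of Tarantool's API.

```lua
-- Hypothetical helper (not part of Tarantool): returns true if the result of
-- an "EXPLAIN QUERY PLAN ..." statement contains a SEARCH step that goes
-- through the index named index_name.
local function uses_index_lookup(plan, index_name)
    for _, row in ipairs(plan.rows or {}) do
        -- columns are selectid, order, from, detail; detail is the 4th one
        local detail = row[4]
        if type(detail) == "string" then
            -- split the detail text into whitespace-separated words
            local words = {}
            for word in detail:gmatch("%S+") do
                words[#words + 1] = word
            end
            -- only SEARCH steps count; a SCAN step reads the whole table or index
            if words[1] == "SEARCH" then
                for i = 1, #words - 1 do
                    if words[i] == "INDEX" and words[i + 1] == index_name then
                        return true
                    end
                end
            end
        end
    end
    return false
end
```

In the Tarantool console it could be called as uses_index_lookup(box.execute([[explain query plan select * from "test" where "col1" = 'AAA' collate "unicode_ci"]]), "col1_idx"), which returns true for the plan shown above. Checking for SEARCH rather than only the index name is deliberate: a query forced through an index with indexed by can still read every entry (as with the upper() trick), and SQLite-style plans are expected to report that as a SCAN step instead.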
Now consider another example, the same query that returned no rows above:

box.execute("select * from \"test\" indexed by \"col1_idx\" where \"col1\" = 'AAA'")

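Unfortunately, the comparison in this query uses binary collation, because the index's collation is ignored: in SQL, only the column's collation is taken into account when comparing values. This limitation will be lifted as soon as the corresponding issue is closed.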