Query execution time on Couchbase is too long

Question

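I am new to Couchbase and am running some N1QL queries, but they take a long time (9 minutes). My data has 200,000 documents with nested types. There are 6,000,000 nested elements in total, spread across the 200,000 documents, so the UNNEST operation is important. A sample of my data: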

{"p_partkey": 2, "lineorder": [{"customer": [{"c_city": "INDONESIA1"}], "lo_supplycost": 54120, "orderdate": [{"d_weeknuminyear": 19}], "supplier": [{"s_phone": "16-789-973-6601|"}], "commitdate": [{"d_year": 1993}], "lo_tax": 7}, {"customer": [{...

One of the queries I am running is:

SELECT SUM(l.lo_extendedprice*l.lo_discount*0.01) as revenue
from part p UNNEST p.lineorder l UNNEST l.orderdate o 
where o.d_year=1993 and l.lo_discount between 1 and 3 and l.lo_quantity<25;

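The data has the fields shown above, but the query takes 9 minutes to run. I am using only my own computer, so there is a single node. The machine has 16 GB of RAM, the cluster RAM quota is 3.2 GB, and there is a single bucket with 3 GB. The total size of my data is 2.45 GB. I sized the cluster and the bucket using the calculation described here: http://docs.couchbase.com/admin/admin/Concepts/bp-sizingGuidelines.html. Am I doing something wrong, or is this time normal for this amount of data?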

I have now created the following indexes:

CREATE INDEX idx_discount ON part( DISTINCT ARRAY l.lo_discount FOR l IN lineorder END );

CREATE INDEX idx_quantity ON part( DISTINCT ARRAY l.lo_quantity FOR l IN lineorder END );

CREATE INDEX idx_year ON part( DISTINCT ARRAY o.d_year FOR o IN ( DISTINCT ARRAY l.orderdate FOR l IN lineorder END ) END );

But the database does not use them.

An example query is:

SELECT SUM(l.lo_extendedprice*l.lo_discount*0.01) as revenue
from part p UNNEST p.lineorder l UNNEST l.orderdate o 
where o.d_year=1993 and l.lo_discount between 1 and 3 and l.lo_quantity<25;

As another example, I created this index:

CREATE INDEX teste3 ON `part` (DISTINCT ARRAY l.lo_quantity FOR l IN lineorder END );

and ran this query:

select l.lo_quantity from part as p UNNEST p.lineorder l where l.lo_quantity>20 limit 3

Because I dropped the primary index, the query does not run. It returns the error: "No primary index on keyspace part. Use CREATE PRIMARY INDEX to create one."
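If the goal is only to get the query running again while the array index is being investigated, recreating the primary index is one option. The following is a minimal sketch that assumes the bucket is named part, as in the question. It does not make the query faster, since a primary scan still has to fetch every document.

```sql
-- Sketch: recreate the primary index so that queries which do not
-- qualify for any secondary index can still fall back to a full scan.
CREATE PRIMARY INDEX ON `part`;

-- List the indexes defined on the bucket and check that they are online.
SELECT name, state, index_key
FROM system:indexes
WHERE keyspace_id = "part";
```

If teste3 appears in the listing with a state other than online, that alone could explain why it is not being picked.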

Answer 1

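You can use Couchbase 4.5 (GA coming soon), which adds array indexing. Array indexes can be used with UNNEST. They let you index individual elements of an array, including arrays nested inside other arrays.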

You can create the following indexes and then use EXPLAIN to make sure there is an IndexScan that uses the index you want.

CREATE INDEX idx_discount ON part( DISTINCT ARRAY l.lo_discount FOR l IN lineorder END );

CREATE INDEX idx_quantity ON part( DISTINCT ARRAY l.lo_quantity FOR l IN lineorder END );

CREATE INDEX idx_year ON part( DISTINCT ARRAY ( DISTINCT ARRAY o.d_year FOR o IN l.orderdate END ) FOR l IN lineorder END );
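One way to check the plan is to prefix the query from the question with EXPLAIN. This is a sketch that assumes the three indexes above have been built. The exact operator names in the output depend on the server version, so treat the closing comment as something to look for rather than as guaranteed output.

```sql
-- Prefix the original query with EXPLAIN to see the plan without running it.
-- The UNNEST aliases (l, o) are the same names used in the index definitions.
EXPLAIN
SELECT SUM(l.lo_extendedprice * l.lo_discount * 0.01) AS revenue
FROM part p
UNNEST p.lineorder l
UNNEST l.orderdate o
WHERE o.d_year = 1993
  AND l.lo_discount BETWEEN 1 AND 3
  AND l.lo_quantity < 25;
-- In the plan, look for an index scan operator that names idx_year,
-- idx_discount, or idx_quantity. A PrimaryScan means no array index was chosen.
```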

Answer 2

After reading this blog post: http://blog.couchbase.com/2016/may/1.making-most-of-your-arrays..-with-covering-array-indexes-and-more I found the problem:

If you create an index like this:

CREATE INDEX iflight_day 
       ON `travel-sample` ( DISTINCT ARRAY v.flight FOR v IN schedule END );

you must use the same variable name in the query, in this case the letter 'v':

SELECT v.day from `travel-sample` as t UNNEST t.schedule v where v.flight="LY104";

The same applies to deeper nesting levels:

CREATE INDEX inested ON `travel-sample`
( DISTINCT ARRAY (DISTINCT ARRAY y.flight FOR y IN x.special_flights END) FOR x IN schedule END);

In this case you must use 'y' and 'x':

SELECT x.day from `travel-sample` as t UNNEST t.schedule x UNNEST x.special_flights y where y.flight="AI444";

Now everything works.
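Applied to the part bucket from the question, the same rule would look roughly like the sketch below. It assumes the naming rule behaves as described in this answer. The index definition is repeated from the question for reference only (skip it if idx_quantity already exists), and the alias x in the last query is hypothetical, used only to show a mismatch.

```sql
-- Index definition from the question: the array variable is named l.
CREATE INDEX idx_quantity ON part
  ( DISTINCT ARRAY l.lo_quantity FOR l IN lineorder END );

-- Matching alias: UNNEST binds p.lineorder to l, the same name as in the
-- index, so this query is a candidate for idx_quantity.
SELECT l.lo_quantity
FROM part AS p UNNEST p.lineorder l
WHERE l.lo_quantity > 20
LIMIT 3;

-- Mismatched alias (hypothetical x): under the rule described in this answer,
-- this query would not be matched to idx_quantity, although the predicate
-- is equivalent.
SELECT x.lo_quantity
FROM part AS p UNNEST p.lineorder x
WHERE x.lo_quantity > 20
LIMIT 3;
```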

But another problem appears when I query like this:

SELECT * from `travel-sample` as t UNNEST t.schedule x UNNEST x.special_flights y 
where x.day=7 and y.flight="AI444";

The query uses only the index on day, which was created in the same way as above:

CREATE INDEX day 
       ON `travel-sample` ( DISTINCT ARRAY y.day FOR y IN schedule END );

Only one index is ever used: sometimes 'day', sometimes 'inested'.
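To compare the two plans, one option is to name the index explicitly with a USE INDEX hint and inspect each plan with EXPLAIN. This is a sketch only. It assumes the inested and day indexes above exist, and whether the planner honors the hint for array indexes in a given server version is an assumption that should be verified.

```sql
-- Ask the planner to consider only inested for this query.
EXPLAIN
SELECT *
FROM `travel-sample` AS t USE INDEX (inested USING GSI)
UNNEST t.schedule x
UNNEST x.special_flights y
WHERE x.day = 7 AND y.flight = "AI444";

-- Same query, hinting the day index instead, to compare the two plans.
EXPLAIN
SELECT *
FROM `travel-sample` AS t USE INDEX (`day` USING GSI)
UNNEST t.schedule x
UNNEST x.special_flights y
WHERE x.day = 7 AND y.flight = "AI444";
```

Comparing the two outputs shows which predicate each index can serve and which one is left to be applied as a filter after the fetch.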
