Couchbase N1QL queries are generally slow

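I have been using Couchbase for a while, but I have never really experienced how fast it is supposed to be. It is quite slow, and I wonder which setting I am missing.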

I have a root server with the following specs:

I am running Couchbase 4.5.1-2844 Community Edition (build-2844) with 7.05 GB of RAM allocated.

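The bucket has 1 data node, uses 4.93 GB, and holds 3,093,889 documents. The bucket type is "Couchbase" and the cache metadata setting is "Value Ejection". Replicas are disabled. Disk I/O optimization is set to Low. Flush is not enabled.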

All 3 million documents look similar to this one:

{
  "discovered_by": 0,
  "color": "FFBA00",
  "updated_at": "2018-01-18T21:40:17.361Z",
  "replier": 0,
  "message": "Irgendwas los hier in Luckenwalde?",
  "children": "",
  "view_count": 0,
  "post_own": "FFBA00",
  "user_handle": "oj",
  "vote_count": [
    {
      "timestamp": "2018-01-19 09:48:48",
      "votes": 0
    }
  ],
  "child_count": 3,
  "share_count": 0,
  "oj_replied": false,
  "location": {
    "loc_coordinates": {
      "lat": 0,
      "lng": 0
    },
    "loc_accuracy": 0,
    "country": "",
    "name": "Luckenwalde",
    "city": ""
  },
  "tags": [],
  "post_id": "59aef043f087270016dc5836",
  "got_thanks": false,
  "image_headers": "",
  "cities": [
    "Luckenwalde"
  ],
  "pin_count": 0,
  "distance": "friend",
  "image_approved": false,
  "created_at": "2017-09-05T18:43:15.904Z",
  "image_url": ""
}

A query might look like this:

select COUNT(*) from sauger where color = 'FFBA00'

Without an index, it cannot be executed through the Couchbase web console (it times out), but with an index

CREATE INDEX color ON sauger(color)

the result takes up to 16 seconds at first, and after a few attempts it takes 2 to 3 seconds each time.

There are 6 different color strings (such as "FFBA00"), and the query returns 466920 (roughly one sixth of all documents).
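As a rough sanity check on these numbers, the selectivity of the filter and the implied counting rate can be worked out directly. The snippet uses only figures quoted in the question; the variable names are mine.

```python
# Figures quoted in the question.
total_docs = 3_093_889
matching_docs = 466_920

# Fraction of documents matching color = 'FFBA00'.
selectivity = matching_docs / total_docs
print(f"selectivity: {selectivity:.1%}")

# Index entries counted per second for the warm timings of 2 to 3 seconds.
rates = {seconds: matching_docs / seconds for seconds in (2, 3)}
for seconds, rate in rates.items():
    print(f"{seconds} s -> {rate:,.0f} entries/s")
```

The match rate comes out closer to 15% than to a full sixth, and the warm timings correspond to somewhere between about 150,000 and 235,000 index entries counted per second.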

EXPLAIN on the query above gives me this:

[
  {
    "plan": {
      "#operator": "Sequence",
      "~children": [
        {
          "#operator": "IndexCountScan",
          "covers": [
            "cover ((`sauger`.`color`))",
            "cover ((meta(`sauger`).`id`))"
          ],
          "index": "color",
          "index_id": "cc3524c6d5a8ef94",
          "keyspace": "sauger",
          "namespace": "default",
          "spans": [
            {
              "Range": {
                "High": [
                  "\"FFBA00\""
                ],
                "Inclusion": 3,
                "Low": [
                  "\"FFBA00\""
                ]
              }
            }
          ],
          "using": "gsi"
        },
        {
          "#operator": "IndexCountProject",
          "result_terms": [
            {
              "expr": "count(*)"
            }
          ]
        }
      ]
    },
    "text": "select COUNT(*) from sauger where color = 'FFBA00'"
  }
]
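To read a plan like this mechanically, one option is to walk the JSON and list every "#operator" it contains. The sketch below is only an illustration: the helper name collect_operators is made up, and the plan is an abbreviated copy of the EXPLAIN output above.

```python
import json


def collect_operators(node):
    """Recursively collect every "#operator" value in an EXPLAIN plan."""
    found = []
    if isinstance(node, dict):
        if "#operator" in node:
            found.append(node["#operator"])
        for value in node.values():
            found.extend(collect_operators(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_operators(item))
    return found


# Abbreviated copy of the plan shown above.
explain_output = json.loads("""
[{"plan": {"#operator": "Sequence", "~children": [
    {"#operator": "IndexCountScan", "index": "color", "using": "gsi"},
    {"#operator": "IndexCountProject"}
]}}]
""")

operators = collect_operators(explain_output)
print(operators)
# A PrimaryScan here would mean the query fell back to the primary index.
print("uses primary scan:", "PrimaryScan" in operators)
```

For this plan the list contains no PrimaryScan, which matches the reading that the count is served by the color index.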

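Everything seems to be set up correctly, yet such a simple query still takes a long time (nothing else is writing to or reading from the database, and the server it runs on is otherwise completely idle).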

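Make sure you don't have a primary index. It will consume a lot of the index service's memory. Your statement that the query times out without the index makes me think there is a primary index; otherwise the query would fail immediately.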
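One way to check is to look at the system:indexes catalog and pick out the entries flagged as primary. The sketch below assumes the rows have already been fetched and flattened into dictionaries with name, keyspace_id, and is_primary fields, which is how I recall the catalog exposing them, so verify this against your version. The sample rows and the helper name are made up for illustration.

```python
def drop_statements_for_primary(rows, keyspace):
    """Build DROP INDEX statements for primary indexes on one keyspace."""
    statements = []
    for row in rows:
        if row.get("keyspace_id") == keyspace and row.get("is_primary"):
            statements.append(f"DROP INDEX `{keyspace}`.`{row['name']}`;")
    return statements


# Hypothetical catalog rows; real ones would come from system:indexes.
sample_rows = [
    {"name": "#primary", "keyspace_id": "sauger", "is_primary": True},
    {"name": "color", "keyspace_id": "sauger"},
]

for statement in drop_statements_for_primary(sample_rows, "sauger"):
    print(statement)
```

The helper only builds the statements and does not run them, so they can be reviewed before anything is dropped.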

Edit: Adding more details about primary indexes from the Indexing Best Practices blog post:
  1. Avoid Primary Keys in Production

Unexpected full primary scans are possible and any possibility of such occurrences should be removed by avoiding primary indexes altogether in Production. N1QL Index Selection is a rule based system for now that checks for a possible index that will satisfy the query, and if there is no such, then it resorts to using the Primary Index. Primary index has all the keys of the documents, and hence query will fetch all keys from the primary index and then hop to Data Service to fetch the documents and then apply filters. As you can see, this is a very expensive operation and should be avoided at all costs.

If there are no Primary Indexes created, and the query is not able to find a matching index to serve the query, then the Query Service errors with the following message. This is helpful and should help you in creating the required Secondary index suitably:

“No index available on keyspace travel-sample that matches your query. Use CREATE INDEX or CREATE PRIMARY INDEX to create an index, or check that your expected index is online.”
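The fallback behaviour described in the quoted passage can be pictured with a toy model. This is a deliberately simplified sketch and not Couchbase's actual planner: the function choose_index, the dictionary shapes, and the exception class are all hypothetical and only mirror the three outcomes the blog post describes.

```python
class NoIndexAvailable(Exception):
    """Raised when neither a matching secondary nor a primary index exists."""


def choose_index(indexes, filter_field):
    """Toy rule-based selection: secondary index first, then primary, else error.

    `indexes` is a list of dicts with a "name", a list of "keys",
    and an optional "is_primary" flag (all hypothetical shapes).
    """
    for index in indexes:
        keys = index.get("keys", [])
        # A secondary index qualifies when its leading key is the filtered field.
        if not index.get("is_primary") and keys and keys[0] == filter_field:
            return index["name"], "index scan"
    for index in indexes:
        if index.get("is_primary"):
            # Fallback: scan all document keys, fetch documents, then filter.
            return index["name"], "full primary scan + fetch"
    raise NoIndexAvailable(f"No index available that matches filter on {filter_field}")


with_primary = [{"name": "#primary", "is_primary": True},
                {"name": "color", "keys": ["color"]}]
print(choose_index(with_primary, "color"))
print(choose_index(with_primary, "user_handle"))

try:
    choose_index([{"name": "color", "keys": ["color"]}], "user_handle")
except NoIndexAvailable as err:
    print("error:", err)
```

With the primary index present, a filter on an unindexed field silently falls back to the expensive full scan; without it, the same query fails fast, which is the behaviour the blog post recommends for production.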