RethinkDB: Create Index on field in nested array (running into big data scenario)

Question

Here is an example document:

{
    "id": 12345,
    "links": [
        {
            "url": "http://something.com",
            "created": 1234567890987
        },
        {
            "url": "http://somethingelse.com",
            "created": 1234567891548
        },
        {
            "url": "http://somethingweird.com",
            "created": 1234567898555
        }
    ]
}

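The created field is just a unix timestamp (in milliseconds). I want to be able to run indexed queries against the created field contained in each item of the links array. I'm not sure how to do this (or whether it's even possible). For example, this query doesn't even finish, because there are too many documents in the table (around 7 million):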

r.db('test').table('very_large_table')
  .filter(function(row) {
    return row('links').filter(function(link) {
        return link('created').ge(1425293715379) 
    }).isEmpty().not()
  })
  .count()
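For intuition about why this never finishes, here is a plain-JavaScript sketch (not ReQL, and not how RethinkDB executes queries internally) of what the filter computes: with no index to use, every link of every document has to be inspected, so the work grows with the total number of links across all ~7 million documents. The sample data and the function name `countDocsWithLinkSince` are hypothetical.

```javascript
// Hypothetical in-memory stand-in for very_large_table.
const docs = [
  { id: 1, links: [{ url: "http://a.com", created: 100 }, { url: "http://b.com", created: 300 }] },
  { id: 2, links: [{ url: "http://c.com", created: 150 }] },
  { id: 3, links: [] },
];

// Mirrors the ReQL filter: keep a document if at least one link has created >= since.
// Every document and every link is visited, which is what a full table scan costs.
function countDocsWithLinkSince(table, since) {
  let count = 0;
  for (const doc of table) {
    if (doc.links.some((link) => link.created >= since)) {
      count += 1;
    }
  }
  return count;
}

console.log(countDocsWithLinkSince(docs, 120)); // 2
```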

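EDIT: Because of the size of the dataset, I've given up on real-time queries in favor of an aggregation strategy. Instead of trying to query this data on demand, we now use a message queue and data-aggregation jobs to condense it, so that it is already processed and queries are super fast. Thanks again for everyone's help!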

Answer 1

To clarify the problem: this is a performance issue. The query works, but the number of documents in the database makes it too slow to run.

I think you have two options: try to optimize the query, or change the schema of your documents.

http://rethinkdb.com/docs/secondary-indexes/javascript/

1. Optimize your query

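The way you've written the query seems to work, but it could be more efficient. In your example, you iterate over all the links in every document and run a .ge on each one. Instead, you could take the .max or .min of each document's links and compare only that value with .ge; for an "at or after" check like yours, the maximum is the one that matters. I'm fairly sure this would be faster, but not sure it would be fast enough, since it still reads every document in the table.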

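r.db('test').table('rethink_question_timestamp_index')
  .hasFields('links')
  // Skip documents whose links array is empty: max() fails on an empty sequence
  .filter(function (row) { return row('links').isEmpty().not() })
  // Reduce each document to the created timestamp of its newest link
  .map(function (row){ return row('links').max('created')('created') })
  .filter(r.row.ge(1425293715379))
  .count()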
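A plain-JavaScript sketch of the same idea, again on hypothetical in-memory data with hypothetical helper names: each document is reduced to its largest `created` value before a single comparison. This saves per-link comparisons but, like the ReQL version, still touches every document, so it remains a full scan. Documents with an empty `links` array are skipped because they have no maximum to compare (as far as I know, ReQL's `max` also raises an error on an empty sequence).

```javascript
// Hypothetical sample documents.
const docs = [
  { id: 1, links: [{ created: 100 }, { created: 300 }] },
  { id: 2, links: [{ created: 150 }] },
  { id: 3, links: [] },
];

// Largest created timestamp of a document, or null when it has no links.
function maxCreated(doc) {
  if (!doc.links || doc.links.length === 0) return null;
  return Math.max(...doc.links.map((link) => link.created));
}

// Count documents whose newest link is at or after `since`.
// One comparison per document, but every document is still read.
function countByMaxCreated(table, since) {
  return table
    .map(maxCreated)
    .filter((max) => max !== null && max >= since)
    .length;
}

console.log(countByMaxCreated(docs, 120)); // 2
```

Because a document has some link with `created >= t` exactly when its largest `created` is `>= t`, this returns the same count as the original filter.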

2. Change the schema

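If, instead of embedding the links in your documents, you created a separate links table and inserted each link with a one-to-many relationship to the documents in very_large_table, you could create an index on the created field, which would speed up queries on links, and then use a .join to connect the links with their parent documents.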
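A rough plain-JavaScript model of that schema, with hypothetical names (`parents`, `linksTable`, `parentId`, `createdIndex`, `lowerBound`) and one hypothetical extra parent document: each link becomes its own row pointing at its parent, a sorted array stands in for a secondary index on `created`, a binary search finds the start of the range, and each matching link is then paired with its parent by id, which is the role a join (for example RethinkDB's `eqJoin`) would play.

```javascript
// Hypothetical parent table keyed by id, and a separate links table.
const parents = new Map([
  [12345, { id: 12345 }],
  [67890, { id: 67890 }],
]);
const linksTable = [
  { parentId: 12345, url: "http://something.com", created: 1234567890987 },
  { parentId: 12345, url: "http://somethingelse.com", created: 1234567891548 },
  { parentId: 12345, url: "http://somethingweird.com", created: 1234567898555 },
  { parentId: 67890, url: "http://example.org", created: 1234567000000 },
];

// Stand-in for a secondary index on `created`: link rows sorted by that field.
const createdIndex = [...linksTable].sort((a, b) => a.created - b.created);

// First position whose created >= since (lower bound via binary search).
function lowerBound(sorted, since) {
  let lo = 0;
  let hi = sorted.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (sorted[mid].created < since) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Range scan on the index, then pair each link with its parent document.
function linksSinceWithParent(since) {
  return createdIndex
    .slice(lowerBound(createdIndex, since))
    .map((link) => ({ link, parent: parents.get(link.parentId) }));
}

console.log(linksSinceWithParent(1234567891000).length); // 2
```

Only the index entries at or after the lower bound are visited, instead of every document.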

From the RethinkDB website:

"Create a new secondary index on a table. Secondary indexes improve the speed of many read queries at the slight cost of increased storage space and decreased write performance."

UPDATE

@AtnNn is right. Creating a secondary index on the sub-property is the way to go!

http://rethinkdb.com/docs/secondary-indexes/javascript/

Answer 2

You can create a multi index on the created field like this:

r.db('test').table('very_large_table')
 .indexCreate('links_created', r.row('links')('created'), {multi:true})
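Conceptually, `{multi: true}` makes an index function that returns an array produce one index entry per element, rather than a single entry for the whole array. A small plain-JavaScript model of that idea (the helper `buildMultiIndex` and the second document are hypothetical, not part of any RethinkDB driver):

```javascript
// Build (key, primaryKey) entries the way a multi index does:
// one entry for every element of the array returned by the index function.
function buildMultiIndex(table, indexFn) {
  const entries = [];
  for (const doc of table) {
    for (const key of indexFn(doc)) {
      entries.push({ key, id: doc.id });
    }
  }
  // Secondary indexes are ordered by key, so keep the entries sorted.
  entries.sort((a, b) => a.key - b.key);
  return entries;
}

const table = [
  { id: 12345, links: [{ created: 1234567890987 }, { created: 1234567891548 }, { created: 1234567898555 }] },
  { id: 67890, links: [{ created: 1234567000000 }] },
];

// Analogue of r.row('links')('created'): the array of created values.
const linksCreated = buildMultiIndex(table, (doc) => doc.links.map((l) => l.created));
console.log(linksCreated.length); // 4
```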

And use the index like this:

r.db('test').table('very_large_table')
 .between(1425293715379, null, {index:'links_created'})
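Because every matching link contributes its own index entry, a range query on a multi index can, as I understand it, return the same document once per matching link, so if you want to count documents rather than links you may need `.distinct()` before `.count()`. Also, as far as I know, later RethinkDB releases write the open upper bound as `r.maxval` instead of `null`. The plain-JavaScript model below (hypothetical data; `between` here is a hypothetical helper, not the ReQL command) shows the range scan and the de-duplication:

```javascript
// Sorted multi-index entries for links_created: one per link, so document
// 12345 (three links) appears three times. Hypothetical sample data.
const linksCreated = [
  { key: 1234567000000, id: 67890 },
  { key: 1234567890987, id: 12345 },
  { key: 1234567891548, id: 12345 },
  { key: 1234567898555, id: 12345 },
];

// Model of between(lower, <open upper bound>): binary-search the first key >= lower
// and return every entry from there to the end of the index.
function between(entries, lower) {
  let lo = 0;
  let hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].key < lower) lo = mid + 1;
    else hi = mid;
  }
  return entries.slice(lo);
}

const hits = between(linksCreated, 1234567891000);
// Collapse repeated primary keys so each document is counted once.
const distinctIds = [...new Set(hits.map((e) => e.id))];
console.log(hits.length, distinctIds.length); // 2 1
```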

See the documentation here: http://rethinkdb.com/docs/secondary-indexes/python/
