RethinkDB

Question

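I'm trying to write the best query to find all documents that lack a specific field. Is there a better way to do this than the examples I've listed below?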

// Get the ids of all documents missing "location"
r.db("mydb").table("mytable").filter({location: null},{default: true}).pluck("id")

// Get a count of all documents missing "location"
r.db("mydb").table("mytable").filter({location: null},{default: true}).count()

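Right now, these queries take about 300-400 ms on a table with roughly 40k documents, which seems rather slow. Also, in this particular case, the "location" attribute holds a latitude/longitude and has a geospatial index.

Is there any way to accomplish this? Thanks!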

Answer 1

Naive suggestion

You can use the hasFields method along with the not method to filter out unwanted documents:

r.db("mydb").table("mytable")
  .filter(function (row) {
    return row.hasFields({ location: true }).not()
  })

This may or may not be faster, but it's worth a try.
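To make explicit which documents such a filter selects, here is a small in-memory sketch in plain JavaScript. It assumes, as the RethinkDB documentation describes, that hasFields treats a field whose value is null as absent. The function name isMissingLocation and the sample documents are made up for illustration, and nothing here talks to a database.

```javascript
// Plain-JavaScript model of row.hasFields("location").not().
// Assumption: hasFields counts a field as present only if the key
// exists and its value is non-null.
function isMissingLocation(doc) {
  return doc.location === undefined || doc.location === null;
}

// Hypothetical sample documents.
const sampleDocs = [
  { id: 1, location: { lat: 37.77, lng: -122.42 } },
  { id: 2 },
  { id: 3, location: null },
];

const missingIds = sampleDocs.filter(isMissingLocation).map((doc) => doc.id);
console.log(missingIds); // [ 2, 3 ]
```

Under that assumption, a document with an explicit null location and a document with no location key at all are both reported as missing.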

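Using a secondary index

Ideally, you'd want a way to make location a secondary index and then use getAll or between, since queries that use an index are generally faster than a full table scan. One way to work around this is to give every row in the table a value of false for its location if it doesn't have one. You would then create a secondary index on location. Finally, you can query the table with getAll as much as you like!

Adding a location property to all documents without a location

To do this, you first need to set location: false on all rows without a location. You can do this as follows: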

r.db("mydb").table("mytable")
  .filter(function (row) {
    return row.hasFields({ location: true }).not()
  })
  .update({
    location: false
  })

After this, you will need to find a way to insert location: false every time you add a document without a location.
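One possible way to do that is to normalize documents in application code before they are inserted. The sketch below is only an illustration: withDefaultLocation is a hypothetical helper, not part of the RethinkDB driver, and it assumes that a null location should be treated the same as a missing one.

```javascript
// Hypothetical helper: give every document an explicit `location`
// so that it ends up in the secondary index described in this answer.
function withDefaultLocation(doc) {
  const hasLocation = doc.location !== undefined && doc.location !== null;
  // Return a copy rather than mutating the caller's object.
  return hasLocation ? doc : { ...doc, location: false };
}

// Assumed usage with the official driver (r and conn come from your app):
// r.db("mydb").table("mytable").insert(withDefaultLocation(newDoc)).run(conn)
```

Every write path has to go through the helper, otherwise documents without a location will again be left out of the index.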

Creating a secondary index for the table

Now that all documents have a location field, we can create a secondary index on location.

r.db("mydb").table("mytable")
 .indexCreate('location')

Keep in mind that you only need to add { location: false } and create the index once.

Using getAll

Now we can just use getAll to query documents using the location index.

r.db("mydb").table("mytable")
 .getAll(false, { index: 'location' })

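This will probably be faster than the query above.

Using a secondary index (function)

You can also create a secondary index as a function. Basically, you create a function and then query the results of that function using getAll. This is probably easier and more straightforward than what I proposed before.

Creating the index

Here it is: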

r.db("mydb").table("mytable")
 .indexCreate('has_location', function (x) {
   return x.hasFields('location');
 })

Using getAll

Here it is:

r.db("mydb").table("mytable")
 .getAll(false, { index: 'has_location' })
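As a rough mental model of why a function index helps, the following plain-JavaScript sketch groups document ids by the result of an index function, so that finding all documents where the function returned false is a single keyed lookup instead of a scan. This is a conceptual illustration only and not how RethinkDB implements indexes; buildFunctionIndex, hasLocation, and the sample documents are hypothetical names introduced here.

```javascript
// Conceptual model only; this is not how RethinkDB stores indexes.
// The index function is evaluated once per document and the results
// are grouped, so lookups need not scan every document.
function buildFunctionIndex(docs, indexFn) {
  const index = new Map();
  for (const doc of docs) {
    const key = indexFn(doc);
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(doc.id);
  }
  return index;
}

// Mirrors x.hasFields('location'), assuming "present" means
// the key exists and its value is non-null.
const hasLocation = (doc) => doc.location !== undefined && doc.location !== null;

// Hypothetical sample documents.
const docs = [
  { id: "a", location: { lat: 51.5, lng: -0.12 } },
  { id: "b" },
  { id: "c" },
];

const hasLocationIndex = buildFunctionIndex(docs, hasLocation);

// Analogue of getAll(false, { index: 'has_location' }): one keyed lookup.
const idsWithoutLocation = hasLocationIndex.get(false) || [];
console.log(idsWithoutLocation); // [ 'b', 'c' ]
```

The cost of evaluating the function moves to write time, which is the trade-off a secondary index makes in exchange for cheaper reads.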

RethinkDB - Find documents with missing field

query-optimization